From the course: Python Data Analysis
Loading name datasets
- [Instructor] You can download the Social Security name dataset from the Social Security website. We'll use the national data file, a ZIP archive containing one file for each year since 1880. I have already unpacked it into the names directory in this chapter's exercise files, but you could also unzip the archive in Python with the zipfile standard library module. Jupyter lets us browse the contents of the names directory. So what do these files look like? We open one of them in read mode and print the first few lines. It's a very simple comma-separated format: name, sex, and the number of babies born that year with that name. pandas read_csv should handle this without trouble, except that by default it uses the first record, Olivia, as the column names. Instead, we'll set the column names explicitly. Better. Next, we'll load all the tables and concatenate them into a single data frame. To avoid confusing data from different years, we can prepare the individual data frames by…
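The steps described in this lesson (unzipping, peeking at a file, reading it with explicit column names, and concatenating all years) can be sketched in Python roughly as follows. This is a minimal sketch under assumptions not stated in the transcript: the extracted files are assumed to be named like yob1880.txt, the column labels name, sex, and number are hypothetical, and tagging each frame with a year column via DataFrame.assign is just one way to keep years apart, not necessarily the approach the course takes next.

```python
import zipfile
from itertools import islice
from pathlib import Path

import pandas as pd

# Hypothetical column labels; the files themselves have no header row.
COLUMNS = ["name", "sex", "number"]


def unzip_archive(zip_path, dest_dir):
    """Extract every member of a ZIP archive into dest_dir (created if missing)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)


def peek(path, n=5):
    """Open a text file in read mode and return its first n lines."""
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def load_year(path, year):
    """Read one year's file into a DataFrame.

    header=None stops pandas from treating the first record as column names;
    names= supplies our own labels instead.
    """
    df = pd.read_csv(path, header=None, names=COLUMNS)
    # Assumption: add a year column so rows from different years stay distinguishable.
    return df.assign(year=year)


def load_all(names_dir):
    """Load every yobYYYY.txt file (assumed naming) and concatenate them."""
    frames = []
    for path in sorted(Path(names_dir).glob("yob*.txt")):
        year = int(path.stem[3:])  # "yob1880" -> 1880, assuming this file naming
        frames.append(load_year(path, year))
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Assuming the names directory from the exercise files, load_all("names") would return one long data frame with name, sex, number, and year columns.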