From the course: Python Data Analysis

Loading name datasets

- [Instructor] You can download the Social Security name dataset from the Social Security Administration's website. We'll use the national data file, a Zip archive containing one file for each year since 1880. I have already unpacked the files in your exercise files, in this chapter's directory under names. However, you could also unzip the archive in Python using the zipfile standard library module. Jupyter lets us browse the contents of the names directory. Let's see what these files are like. We open one of them in read mode and print out the first few lines. It's a very simple comma-separated format: name, sex, and the number of babies born that year with that name. The pandas read_csv function shouldn't have any problem with it, except that by default the CSV reader uses the first record, Olivia, to name the columns. It is better to set the column names explicitly. We will load all the tables and concatenate them into a single DataFrame. To avoid confusing data from different years, we can prepare the individual data frames by…
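The steps described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the instructor's notebook: the sample rows and counts are made up, the `yobYYYY.txt` file naming and the column names `name`, `sex`, `number` are assumed, and tagging each table with a `year` column is just one possible way to keep the years apart, which may differ from what the course goes on to do.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Made-up sample rows standing in for the real files (counts are NOT real data).
SAMPLE = {
    2019: "Olivia,F,100\nEmma,F,90\nLiam,M,110\n",
    2020: "Olivia,F,95\nLiam,M,105\n",
}


def build_sample_archive(zip_path):
    """Write a tiny zip archive with one file per year (assumed yobYYYY.txt naming)."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for year, text in SAMPLE.items():
            archive.writestr(f"yob{year}.txt", text)


def load_names(names_dir):
    """Read every per-year file with explicit column names and concatenate them."""
    frames = []
    for path in sorted(Path(names_dir).glob("yob*.txt")):
        year = int(path.stem[3:])  # "yob2019" -> 2019 (assumed naming)
        frame = pd.read_csv(path, names=["name", "sex", "number"])
        # Assumption: a year column is one way to tell the tables apart.
        frames.append(frame.assign(year=year))
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "names.zip"
    build_sample_archive(zip_path)

    # Unpack the archive with the zipfile standard library module.
    names_dir = Path(tmp) / "names"
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(names_dir)

    # Without names=, read_csv consumes the first record as the header row.
    naive = pd.read_csv(names_dir / "yob2019.txt")

    # With explicit names, every record is kept as data.
    allyears = load_names(names_dir)

print(list(naive.columns))
print(allyears)
```

Passing `names=` tells `read_csv` that the file has no header row, so the first record stays in the data instead of becoming the column labels.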
