From the course: Python Data Analysis

Summarizing and visualizing categorical data

- [Instructor] We move on to categorical variables, those that take a value from a finite, discrete set. How do we describe variation in categorical variables? Well, of course, with tables. We switch from Gapminder to the Whickham dataset, discussed by Kaplan in his excellent textbook "Statistical Modeling." The table records interviews with women in Whickham, England, in 1973, who were asked if they were smokers. The interviews were followed up 20 years later, when it was recorded whether the women were still alive. The categorical variables in this case, smoker and outcome, are both binary, yes or no. Here are the first five rows of the dataset. Using value_counts, we can tally the explanatory and response variables separately. Smoker is the explanatory variable; outcome is the response. This doesn't tell us much, other than that all the groups are represented fairly well: smokers and non-smokers, women who survived for 20 years and those who didn't. If we want to see the values as…
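The separate tallies described above might look roughly like the following in pandas. This is only a sketch: the six rows are invented for illustration and are not the real Whickham data, the DataFrame name `toy` is hypothetical, and the labels "Yes"/"No" and "Alive"/"Dead" are assumed stand-ins for the two binary variables.

```python
import pandas as pd

# Toy stand-in for the Whickham table. These rows are invented for
# illustration only; they are NOT the real survey records.
toy = pd.DataFrame({
    "smoker":  ["Yes", "No", "No", "Yes", "No", "Yes"],
    "outcome": ["Alive", "Alive", "Dead", "Dead", "Alive", "Alive"],
})

# Look at the first five rows, as in the video.
print(toy.head())

# Tally each categorical variable on its own.
smoker_counts = toy["smoker"].value_counts()    # explanatory variable
outcome_counts = toy["outcome"].value_counts()  # response variable

print(smoker_counts)
print(outcome_counts)
```

Each call to `value_counts` returns a Series indexed by category, so the two tallies say nothing yet about how smoking and outcome relate to each other.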
