From the course: Python Data Analysis
Cleaning data - Python Tutorial
Cleaning data
- [Instructor] In addition to simple ASCII tables, Pandas supports many other formats: JSON, used in web applications; HTML and XML, which we may scrape directly from a website; Microsoft Excel spreadsheets; HDF, the hierarchical format for scientific data; the very efficient binary formats from the Apache Software Foundation, such as Feather and Parquet; proprietary statistical software formats such as SAS, Stata, and SPSS; SQL databases; and finally, the internal binary Python format, Pickle. In some cases, you need to install other packages to support that functionality. I've indicated those packages in this table, but Pandas itself will tell you if you need them. This list is not exhaustive. Other formats are supported by third-party packages, so it's always worth Googling or asking your favorite large language model. In this video, we'll concentrate on a few useful formats, but our considerations will apply more generally. Let's first talk about saving. Say we did the work of…
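To make the point about formats and optional packages concrete, here is a minimal sketch rather than the course's own code. The toy table, the file names, and the SQL table name `toy` are all invented for illustration. It round-trips a small DataFrame through three formats that should work with a plain Pandas install (JSON, Pickle, and SQLite through the standard library), then tries Parquet, which needs an optional engine such as pyarrow. If that engine is missing, Pandas raises an ImportError that names what to install.

```python
import io
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical toy table, invented for this example only.
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1.5, 2.5, 3.5]})

# JSON: no extra packages needed. Wrap the text in StringIO for reading.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

with tempfile.TemporaryDirectory() as tmp:
    # Pickle: Python's internal binary format, also built in.
    pickle_path = os.path.join(tmp, "toy.pkl")
    df.to_pickle(pickle_path)
    from_pickle = pd.read_pickle(pickle_path)

    # Parquet: requires an optional engine (pyarrow or fastparquet).
    # Without one, Pandas raises ImportError with an explanatory message.
    try:
        df.to_parquet(os.path.join(tmp, "toy.parquet"))
        parquet_ok = True
    except ImportError as err:
        parquet_ok = False
        print(f"Parquet support is missing: {err}")

# SQL: an in-memory SQLite database from the standard library.
conn = sqlite3.connect(":memory:")
try:
    df.to_sql("toy", conn, index=False)
    from_sql = pd.read_sql("SELECT * FROM toy", conn)
finally:
    conn.close()
```

Most formats follow the same pattern: a `to_*` method on the DataFrame for writing and a matching `pd.read_*` function for reading. Whether `parquet_ok` ends up true depends entirely on which optional packages are installed in your environment.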