From the course: ETL in Python and SQL

Data quality checks and validation with SQL

- [Instructor] In the last video, we loaded our transformed data into our ElephantSQL database. We also learned how to create a SQLAlchemy engine, which enables us to connect to our database, and we ran the load command with pandas. In this video, we'll query the data we have just loaded and learn about data quality checks and why they're important for ensuring the accuracy of your loaded data. As data moves from one place to another, it's important to ensure its quality, completeness, and correctness. As your data pipeline becomes more complex, you will need tests that confirm the data has not been truncated, lost, or corrupted during the ETL process. These checks and tests can be done in several ways. One of the most common is counting the number of rows and columns in the source versus the destination, accounting for all the transformations that have taken place. Another good data accuracy check is looking for nulls, empty rows, duplicates, and…
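The checks described so far (row and column counts, nulls, and duplicates) could be sketched roughly as follows. This is a minimal illustration, not the course's own code: it assumes an in-memory SQLite database as a stand-in for the ElephantSQL database, and the `orders` table, its columns, the sample rows, and the `run_quality_checks` helper are all hypothetical.

```python
import sqlite3

# Hypothetical source rows standing in for the transformed data (id, name, amount).
source_rows = [
    (1, "alice", 10.0),
    (2, "bob", 20.5),
    (3, "carol", 7.25),
]
expected_columns = 3  # number of columns we expect after transformation

# Assumption: in-memory SQLite stands in for the real destination database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", source_rows)
conn.commit()


def run_quality_checks(conn, table, expected_rows, expected_cols, key):
    """Run basic count, null, and duplicate checks with plain SQL.

    `table` and `key` are interpolated into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    cur = conn.cursor()

    # Row count in the destination, to compare with the source.
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    # Column names from SQLite's table metadata (index 1 is the column name).
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]

    # Rows whose key column is NULL.
    null_keys = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]

    # Number of key values that appear more than once.
    duplicate_keys = cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]

    return {
        "row_count": row_count,
        "row_count_matches": row_count == expected_rows,
        "column_count": len(columns),
        "column_count_matches": len(columns) == expected_cols,
        "null_keys": null_keys,
        "duplicate_keys": duplicate_keys,
    }


report = run_quality_checks(
    conn, "orders", len(source_rows), expected_columns, key="id"
)
print(report)
```

Against a PostgreSQL database, the same `COUNT(*)`, `IS NULL`, and `GROUP BY ... HAVING` queries should carry over, but the column lookup would need something other than SQLite's `PRAGMA table_info`, for example a query against `information_schema.columns`.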
