From the course: ETL in Python and SQL
Data quality checks and validation with SQL
- [Instructor] In the last video, we loaded our transformed data into our ElephantSQL database. We also learned how to create a SQLAlchemy engine, which enables us to connect to our database, and we ran the load command with pandas. In this video, we'll query the data we have just loaded and learn about data quality checks and why they're important for ensuring the accuracy of your loaded data. As data moves from one place to another, it's important to ensure its quality, completeness, and correctness. As your data pipeline becomes more complex, you will need tests that confirm the data has not been truncated, lost, or corrupted during the ETL process. These checks and tests can be done in several ways. One of the most common is counting the number of rows and columns in the source versus the destination, accounting for all the transformations that have taken place. Another good data accuracy check is looking for nulls, empty rows, duplicates, and…
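The checks mentioned so far (row and column counts, nulls, and duplicates) can be sketched roughly as follows. This is a minimal illustration, not the instructor's code: it swaps in Python's built-in sqlite3 module with an in-memory database so that it runs without an ElephantSQL account or a SQLAlchemy engine, and the table name `customers`, the column names, the sample rows, and the helper `run_quality_checks` are all hypothetical.

```python
import sqlite3


def run_quality_checks(conn, table, key_column, expected_rows):
    """Return simple data quality metrics for one loaded table.

    The table and column names are interpolated into the SQL text,
    so pass only trusted identifiers (hypothetical helper).
    """
    cur = conn.cursor()

    # Row count in the destination, to compare against the source.
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    # Column count; PRAGMA table_info is specific to SQLite.
    column_count = len(cur.execute(f"PRAGMA table_info({table})").fetchall())

    # Rows whose key column is NULL.
    null_keys = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]

    # Distinct non-null key values that appear more than once.
    duplicate_keys = cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key_column} FROM {table} "
        f"WHERE {key_column} IS NOT NULL "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]

    return {
        "row_count": row_count,
        "row_count_matches": row_count == expected_rows,
        "column_count": column_count,
        "null_keys": null_keys,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical "source" rows: one duplicated id and one missing id.
source_rows = [(1, "Ada"), (2, "Grace"), (2, "Grace"), (None, "Linus")]

# Stand-in for the destination database (assumption: in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", source_rows)
conn.commit()

report = run_quality_checks(
    conn, "customers", key_column="id", expected_rows=len(source_rows)
)
print(report)
```

Against a PostgreSQL database such as the one used in the course, the counting queries should carry over largely unchanged, but the column count would need a different source (for example `information_schema.columns`), because `PRAGMA table_info` exists only in SQLite.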
Contents
- Introduction to data warehouses and data lakes (5m 1s)
- Loading data into relational databases (8m 1s)
- Data quality checks and validation with SQL (3m 27s)
- Challenge: Transform the data and remove duplicates and nulls (40s)
- Solution: Transform the data and remove duplicates and nulls (2m 37s)