From the course: Data Quality: Core Concepts
Data lineage
- [Instructor] The next critical tool for managing data quality is data lineage. "Data lineage essentially communicates a data's origin, what has happened to it," and how it moves from source to source, to provide visibility into your data system. Now, the definition didn't make much sense to me at first until I saw an actual representation of it. Here's an example of what data lineage is. This is a completely made-up example. But you can see you have your beginning sources, it highlights how the data travels to a transformation stage, and then how the sources are merged together. Now, data lineage fits within the data lifecycle wherever data is being moved. Typically, you'll have it within data pipelines that show, "Hey, we're replicating from A to B," or within analytical databases to identify the transformations. Now, data lineage could be in the data creation and transactional phase, but many times the software engineers aren't too focused on the data aspect of it. So oftentimes, you'll…
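
The made-up example described above (beginning sources that travel to a transformation stage and are then merged) can be sketched as a small graph in Python. This is only an illustrative sketch: the dataset names (`crm_export`, `web_orders`, and so on), the `LineageNode` class, and both helper functions are hypothetical and do not come from the course or from any particular lineage tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LineageNode:
    """One dataset, what happened to produce it, and where it came from."""
    name: str
    operation: str                      # e.g. "source", "transform", "merge"
    inputs: List[str] = field(default_factory=list)  # upstream dataset names


# Hypothetical lineage: two sources, one transformation each, then a merge.
LINEAGE: Dict[str, LineageNode] = {
    "crm_export": LineageNode("crm_export", "source"),
    "web_orders": LineageNode("web_orders", "source"),
    "crm_cleaned": LineageNode("crm_cleaned", "transform", ["crm_export"]),
    "orders_cleaned": LineageNode("orders_cleaned", "transform", ["web_orders"]),
    "customer_orders": LineageNode(
        "customer_orders", "merge", ["crm_cleaned", "orders_cleaned"]
    ),
}


def trace_origins(name: str, lineage: Dict[str, LineageNode]) -> Set[str]:
    """Walk upstream until reaching datasets with no inputs (the origins)."""
    node = lineage[name]
    if not node.inputs:
        return {name}
    origins: Set[str] = set()
    for upstream in node.inputs:
        origins |= trace_origins(upstream, lineage)
    return origins


def describe_path(name: str, lineage: Dict[str, LineageNode], depth: int = 0) -> List[str]:
    """Return an indented, top-down listing of everything feeding a dataset."""
    node = lineage[name]
    lines = ["  " * depth + f"{node.name} ({node.operation})"]
    for upstream in node.inputs:
        lines.extend(describe_path(upstream, lineage, depth + 1))
    return lines


if __name__ == "__main__":
    print(sorted(trace_origins("customer_orders", LINEAGE)))
    print("\n".join(describe_path("customer_orders", LINEAGE)))
```

Calling `trace_origins` on the merged dataset answers the "where did this data come from" question, while `describe_path` lists each step that touched it along the way, which is the visibility the definition refers to.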