From the course: Data Quality: Core Concepts
Data contracts
- [Instructor] Finally, we have data contracts, which is an emerging category in the data space right now. I define data contracts as a data architecture pattern that extends software-driven collaboration to data teams and enhances data quality through human-in-the-loop reviews, similar to how tools such as GitHub have improved code quality for product teams. Data contracts sit upstream and are mainly focused on the transactional database and its replication into the analytical database. They work in four steps. First, they codify expectations of the data, such as schema, semantics, and profiling, as code in a contract spec, typically a YAML file. Second, they extract metadata from databases, data catalogs, and lineage. Third, they detect proposed changes to code and databases. Fourth, they compare the expectations in the contract spec against the collected metadata. Contracts enforce data quality through the CI/CD workflow: engineers make a pull request, it runs a set of tests, and then if the…
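To make the "compare the contract spec against the collected metadata" step concrete, here is a minimal Python sketch. Everything in it is a hypothetical assumption, not the format of any particular data contract tool: the `orders` table, the field names (`columns`, `nullable`, `allowed_values`, `sample_values`), and the functions `check_contract` and `ci_exit_code`. A real contract would typically be a YAML file, and the metadata would come from the database catalog or lineage system rather than from a literal dict.

```python
from typing import Dict, List

# Hypothetical contract spec, written as a dict so the example is self-contained.
# In practice this would typically be loaded from a YAML file.
CONTRACT = {
    "table": "orders",
    "columns": {
        "order_id": {"type": "bigint", "nullable": False},
        "status": {"type": "text", "nullable": False,
                   "allowed_values": ["pending", "shipped", "cancelled"]},
        "amount": {"type": "numeric", "nullable": False},
    },
}

# Hypothetical metadata collected for a proposed change (e.g. a schema migration
# in a pull request): amount changes type and nullability, and a new status appears.
PROPOSED = {
    "order_id": {"type": "bigint", "nullable": False},
    "status": {"type": "text", "nullable": False,
               "sample_values": ["pending", "refunded"]},
    "amount": {"type": "text", "nullable": True},
}


def check_contract(contract: Dict, metadata: Dict[str, Dict]) -> List[str]:
    """Compare contract expectations to collected metadata; return violations."""
    violations: List[str] = []
    for col, spec in contract["columns"].items():
        observed = metadata.get(col)
        if observed is None:
            # Schema expectation: every contracted column must still exist.
            violations.append(f"{col}: column missing")
            continue
        if observed["type"] != spec["type"]:
            # Schema expectation: the column type must not drift.
            violations.append(
                f"{col}: type {observed['type']!r} != contract {spec['type']!r}")
        if observed.get("nullable", False) and not spec["nullable"]:
            # Schema expectation: a NOT NULL column must not become nullable.
            violations.append(f"{col}: became nullable but contract requires NOT NULL")
        allowed = spec.get("allowed_values")
        if allowed is not None:
            # Semantic expectation: observed values must come from the allowed set.
            unexpected = sorted(set(observed.get("sample_values", [])) - set(allowed))
            if unexpected:
                violations.append(f"{col}: unexpected values {unexpected}")
    return violations


def ci_exit_code(violations: List[str]) -> int:
    """Map violations to a process exit code; nonzero fails the CI check."""
    return 1 if violations else 0


if __name__ == "__main__":
    problems = check_contract(CONTRACT, PROPOSED)
    for problem in problems:
        print("CONTRACT VIOLATION:", problem)
    print("exit code:", ci_exit_code(problems))
```

In a CI/CD pipeline, a nonzero exit code from a step like this would typically fail the pull request check. That failure is the point where the human-in-the-loop review happens before the change reaches the analytical database.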