From the course: Introduction to Data Engineering on AWS: Data Sourcing and Storage

Overview of data engineering

- [Instructor] The key to understanding what data engineering is lies in the engineering part. Engineers design and build things. Data engineers design and build pipelines that transform and transport data so that, by the time it reaches data scientists or other end users, it is in a highly usable state. These pipelines must take data from many disparate sources and collect it into a single warehouse that represents the data uniformly as a single source of truth. Sounds simple enough, but a lot of data literacy skills go into this role. Let's look at the data science hierarchy of needs. It depicts where data engineering falls on the roadmap to becoming a data science or AI-driven organization. It's worth noting that the bottom level, collect, is growing larger and larger, thereby driving the need for more data engineers. A data engineer's role sits at levels two and three. Data engineers work in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret. Their ultimate goal is to make data accessible, so that organizations can use it to evaluate and optimize their performance. The term data engineering evolved to describe a role that moved away from using traditional ETL tools and developed its own tools to handle increasing volumes of data. As big data grew, data engineering came to describe a kind of software engineering, one that includes business intelligence and analytics, that focuses deeply on data: for example, data infrastructure, data warehousing, data mining, data modeling, data crunching, and metadata management.
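
To make the pipeline idea concrete, here is a minimal sketch in Python. It is not taken from the course. It assumes two hypothetical sources (a CRM export in CSV and a web shop feed in JSON) that describe orders in different shapes, and it uses an in-memory SQLite database as a stand-in for a real warehouse. All names, fields, and sample values are invented for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs: the same kind of fact (an order) in two different shapes.
CRM_CSV = "customer_id,amount_usd,order_date\n17,19.99,2024-01-05\n42,5.00,2024-01-06\n"
SHOP_JSON = '[{"cust": "17", "total_cents": 1250, "ts": "2024-01-07T10:30:00"}]'


def extract_crm(text):
    """Parse CSV rows into uniform (customer_id, amount_usd, order_date) tuples."""
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["customer_id"]), float(row["amount_usd"]), row["order_date"]


def extract_shop(text):
    """Parse JSON records into the same tuple shape: cents become dollars,
    and the timestamp is trimmed to its date part."""
    for rec in json.loads(text):
        yield int(rec["cust"]), rec["total_cents"] / 100, rec["ts"][:10]


def load(conn, rows, source):
    """Insert uniform rows into the shared table, tagging where they came from."""
    conn.executemany(
        "INSERT INTO orders (customer_id, amount_usd, order_date, source) "
        "VALUES (?, ?, ?, ?)",
        [(cust, amount, date, source) for cust, amount, date in rows],
    )


def build_warehouse():
    """Run both extract/transform steps and load the results into one table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    conn.execute(
        "CREATE TABLE orders ("
        "customer_id INTEGER, amount_usd REAL, order_date TEXT, source TEXT)"
    )
    load(conn, extract_crm(CRM_CSV), "crm")
    load(conn, extract_shop(SHOP_JSON), "shop")
    conn.commit()
    return conn
```

Once `build_warehouse()` has run, an analyst can query the single `orders` table (for example, total spend per customer) without needing to know that the rows originally arrived as CSV and JSON with different field names and units.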
