From the course: Apache Spark Essential Training: Big Data Engineering

Data engineering functions

- [Presenter] What are the functions of data engineering? Data engineering consists of five functions, namely acquisition, transport, storage, processing, and serving. Data pipelines are built by combining several of these functions to deliver a given outcome. Let's start with data acquisition. The goal of data acquisition is to extract or receive data from a data source and publish it into a big data pipeline. While acquiring data, the focus is on understanding the format of the source data and converting it into a format suitable for downstream processing. We work with the interfaces provided by the source systems to extract data. Alternatively, we can expose interfaces through which clients can publish data into the pipeline. Security is also a major concern, especially if the data sources are outside the trust domain. Data reliability is important, especially when acquiring from remote sources. Finally, latency is a consideration for streaming data analytics. Data…
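
As a rough illustration of the acquisition step described above, the sketch below parses source data in one format (CSV) and converts it into another (JSON lines) for downstream processing, setting aside rows that fail a basic reliability check. It is not taken from the course: the function name `acquire`, the field names `sensor_id` and `reading`, and the sample data are all hypothetical, and a real pipeline would publish the records to a transport layer rather than return a list.

```python
import csv
import io
import json

# Hypothetical schema for the illustration; not from the course.
REQUIRED_FIELDS = ("sensor_id", "reading")


def acquire(csv_text):
    """Parse CSV source data and convert each valid row to a JSON line.

    Returns (published, rejected): JSON strings ready for downstream
    processing, and raw rows that failed validation.
    """
    published, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Reliability check: every required field must be present and non-empty.
        if any(not row.get(field) for field in REQUIRED_FIELDS):
            rejected.append(row)
            continue
        try:
            reading = float(row["reading"])
        except ValueError:
            # The reading is not numeric, so the row is set aside.
            rejected.append(row)
            continue
        record = {"sensor_id": row["sensor_id"], "reading": reading}
        published.append(json.dumps(record, sort_keys=True))
    return published, rejected


# Sample source data: the second row is missing its sensor_id.
sample = "sensor_id,reading\ns1,20.5\n,19.0\ns2,21\n"
published, rejected = acquire(sample)
for line in published:
    print(line)
```

Keeping rejected rows separate, rather than silently dropping them, is one way to make reliability problems at the source visible to whoever operates the pipeline.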
