From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep


Batch vs. streaming


- [Narrator] Data pipelines can be configured to consume data in batches at regular intervals or continuously in streams as data is generated. In this lesson, we're going to compare these two approaches.

Batch processing pipelines process and store data in large volumes, or batches. The raw data accumulates for a certain period or until a predetermined batch size has been reached. Then the data is processed and stored where it is made available for analysis or machine learning. An example would be daily sales reports that summarize the previous day's orders. Batch pipelines are suitable when the desired latency can be measured in hours, days, or longer. Batch processing pipelines running in the cloud can be optimized for cost efficiency by running during off-peak hours and shutting down the compute resources when the processing is complete.

Stream processing pipelines process data in a continuous, incremental sequence of small-sized data packets. It usually represents a series…
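To make the contrast concrete, here is a minimal sketch, not from the course, of the two consumption patterns. The record shape, function names, and batch size are all illustrative assumptions: the batch function accumulates records until a threshold is reached and then processes the whole buffer, while the streaming function emits a result incrementally for each record as it arrives.

```python
# Hypothetical order records; field names are illustrative, not from the course.
orders = [{"order_id": i, "amount": 10.0 + i} for i in range(1, 7)]

def batch_process(records, batch_size=3):
    """Accumulate records until batch_size is reached, then process each batch."""
    summaries = []
    buffer = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= batch_size:
            # Process the whole batch at once (here: total the amounts).
            summaries.append(sum(r["amount"] for r in buffer))
            buffer = []
    if buffer:  # flush any partial final batch
        summaries.append(sum(r["amount"] for r in buffer))
    return summaries

def stream_process(records):
    """Process each record incrementally as it arrives, keeping a running total."""
    running_total = 0.0
    for rec in records:
        running_total += rec["amount"]
        yield running_total  # each result is available immediately

print(batch_process(orders))         # → [36.0, 45.0]
print(list(stream_process(orders)))  # → [11.0, 23.0, 36.0, 50.0, 65.0, 81.0]
```

Note the latency difference this illustrates: the batch version produces no output until a full batch has accumulated, while the streaming version yields an updated result after every single record.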
