Batch vs. real-time processing - Apache Spark Tutorial

From the course: Apache Spark Essential Training: Big Data Engineering

Batch vs. real-time processing

- [Instructor] When building data pipelines, one of the key decisions to make is whether the pipeline will be batch or real-time. We start with batch processing. In batch processing, we process data in batches. A batch is a collection of records with a well-defined size or window. We can define batches either by the count of records or by a date/time range. In batch processing, the batch of input records is ready and complete before processing begins. This input does not change while processing is going on. This is a key differentiator between batch and real-time processing. Batch processing works on bounded datasets. The number of records in the batch does not change after the processing has started. Since batch processing usually waits for all the input for that batch to be ready, there is latency before processing begins. Also, it may take some time to process all the records. The results of processing are usually available at the end of processing, but intermediate results are also possible.…
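The two ways of defining a batch described above, by record count or by date/time range, can be sketched in plain Python. This is a minimal illustration and not code from the course: the function names `batches_by_count` and `batches_by_window`, the sample records, and the one-hour window are all assumptions made for the example. The input is a fixed list, which mirrors the point that a batch works on a bounded dataset that does not change during processing.

```python
from datetime import datetime, timedelta
from itertools import islice


def batches_by_count(records, size):
    """Yield consecutive batches of at most `size` records (hypothetical helper)."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batches_by_window(records, window):
    """Group (timestamp, value) records into fixed time windows (hypothetical helper).

    Returns a dict mapping each window's start time to the values in that window.
    """
    epoch = datetime(1970, 1, 1)
    grouped = {}
    for ts, value in records:
        # Integer number of whole windows between the epoch and this timestamp.
        index = (ts - epoch) // window
        start = epoch + index * window
        grouped.setdefault(start, []).append(value)
    return dict(sorted(grouped.items()))


# Illustrative, bounded input: the list is complete before processing starts.
records = [
    (datetime(2024, 1, 1, 0, 5), "a"),
    (datetime(2024, 1, 1, 0, 40), "b"),
    (datetime(2024, 1, 1, 1, 10), "c"),
    (datetime(2024, 1, 1, 1, 55), "d"),
    (datetime(2024, 1, 1, 2, 30), "e"),
]

count_batches = list(batches_by_count(records, 2))
hourly_batches = batches_by_window(records, timedelta(hours=1))

for start, values in hourly_batches.items():
    print(start, values)
```

With count-based batching the batch boundary depends only on how many records have arrived; with window-based batching it depends on each record's timestamp, so batch sizes can vary from window to window.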
