From the course: Apache Spark Essential Training: Big Data Engineering
Batch processing use case: Design
- [Instructor] What does the batch data engineering use case design look like? Let's consider three warehouse locations, namely London, New York, and Los Angeles. Each location has a MariaDB database called warehouse_stock. There is also a central data center to which the data needs to be uploaded. To store the uploaded data in the central data center, we need a file system that is scalable and provides concurrent access. We want to create a folder for each warehouse in this file system and grant the local warehouses permission to upload their respective files daily. We will choose a distributed file system like HDFS or S3 that provides concurrent access, scaling, and security. Files will be stored in Parquet format, which allows for parallel reads by Spark. Now, in each local data center, we will run a stock upload job built with Apache Spark. This job will run on a daily basis on a local Spark cluster. The job will read daily stock information from the…
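A minimal PySpark sketch of the daily stock upload job described above. The table name `daily_stock`, the date column `stock_date`, and the per-warehouse folder layout are assumptions for illustration; the transcript only specifies the database name (warehouse_stock), the Parquet output format, and the one-folder-per-warehouse convention.

```python
from datetime import date


def warehouse_output_path(base: str, warehouse: str, run_date: date) -> str:
    """Build the per-warehouse daily upload folder in the central store,
    e.g. s3://central-dc/warehouse_stock/london/2024-01-15 (layout assumed)."""
    return f"{base}/{warehouse.lower().replace(' ', '_')}/{run_date.isoformat()}"


def run_stock_upload(spark, jdbc_url: str, warehouse: str,
                     run_date: date, base: str) -> None:
    """Hypothetical daily job: read one day's stock rows from the local
    MariaDB warehouse_stock database and write them as Parquet files to
    the central distributed file system (HDFS or S3)."""
    stock_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)                      # e.g. jdbc:mariadb://localhost:3306/warehouse_stock
        .option("dbtable", "daily_stock")             # assumed table name
        .option("driver", "org.mariadb.jdbc.Driver")  # MariaDB Connector/J
        .load()
        .where(f"stock_date = '{run_date.isoformat()}'")  # assumed date column
    )
    # Parquet output enables parallel reads by downstream Spark jobs.
    stock_df.write.mode("overwrite").parquet(
        warehouse_output_path(base, warehouse, run_date)
    )
```

Each of the three warehouses would schedule this job daily on its local Spark cluster, passing its own JDBC URL and warehouse name so uploads land in separate folders of the central file system.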