From the course: Apache Spark Essential Training: Big Data Engineering
Batch processing use case: Design
- [Instructor] What does the batch data engineering use case design look like? Let's consider three warehouse locations, namely London, New York, and Los Angeles. Each location has a MariaDB database called warehouse_stock. There is also a central data center to which the data needs to be uploaded. To store the uploaded data in the central data center, we need a file system that is scalable and provides concurrent access. We want to create a folder for each warehouse in this file system and grant the local warehouses permission to upload their respective files daily. We will choose a distributed file system like HDFS or S3 that provides concurrent access, scaling, and security. Files will be stored in Parquet format, which allows for parallel reads by Spark. Now, in each local data center, we will run a stock upload job built with Apache Spark. This job will run on a daily basis on a local Spark cluster. The job will read daily stock information from the…
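A minimal PySpark sketch of the daily stock upload job described above. The table name `daily_stock`, the date column `stock_date`, and the per-warehouse folder layout are assumptions for illustration; the transcript only specifies the database name (warehouse_stock), the Parquet output format, and the one-folder-per-warehouse convention.

```python
from datetime import date


def warehouse_output_path(base: str, warehouse: str, run_date: date) -> str:
    """Build the per-warehouse daily upload folder in the central store,
    e.g. s3://central-dc/warehouse_stock/london/2024-01-15 (layout assumed)."""
    return f"{base}/{warehouse.lower().replace(' ', '_')}/{run_date.isoformat()}"


def run_stock_upload(spark, jdbc_url: str, warehouse: str,
                     run_date: date, base: str) -> None:
    """Hypothetical daily job: read one day's stock rows from the local
    MariaDB warehouse_stock database and write them as Parquet files to
    the central distributed file system (HDFS or S3)."""
    stock_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)                      # e.g. jdbc:mariadb://localhost:3306/warehouse_stock
        .option("dbtable", "daily_stock")             # assumed table name
        .option("driver", "org.mariadb.jdbc.Driver")  # MariaDB Connector/J
        .load()
        .where(f"stock_date = '{run_date.isoformat()}'")  # assumed date column
    )
    # Parquet output enables parallel reads by downstream Spark jobs.
    stock_df.write.mode("overwrite").parquet(
        warehouse_output_path(base, warehouse, run_date)
    )
```

Each of the three warehouses would schedule this job daily on its local Spark cluster, passing its own JDBC URL and warehouse name so uploads land in separate folders of the central file system.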