From the course: Apache Spark Essential Training: Big Data Engineering
Unlock the full course today
Join today to access over 24,500 courses taught by industry experts.
Real-time use case: Design - Apache Spark Tutorial
From the course: Apache Spark Essential Training: Big Data Engineering
Real-time use case: Design
- [Instructor] What does the design for the real-time website analytics use case look like? We have an e-commerce application that is running in the cloud data center. The application creates user visit records when the user exits the application and publishes them to a Kafka queue called spark.streaming.website.visits. It's possible that the application is located in multiple data centers across the globe. Even in such cases, the data is streamed into a single central Kafka queue. An Apache Spark job called Website Analytics runs and consumes the visit records in real time from the Kafka queue. On the data that is received, it will execute multiple actions. First, it computes five second summaries and inserts them into a MariaDB database called website_stats. A table called visit_stats is used to capture that information. Next, it maintains a running counter of the total duration by country using a Redis sorted set. Finally, it filters those visits, which ended in the shopping cart…
Practice while you learn with exercise files
Download the files the instructor uses to teach the course. Follow along and learn by watching, listening and practicing.