From the course: Apache Spark Essential Training: Big Data Engineering

Unlock the full course today

Join today to access over 24,500 courses taught by industry experts.

Real-time use case: Design

Real-time use case: Design

- [Instructor] What does the design for the real-time website analytics use case look like? We have an e-commerce application that is running in the cloud data center. The application creates user visit records when the user exits the application and publishes them to a Kafka queue called spark.streaming.website.visits. It's possible that the application is located in multiple data centers across the globe. Even in such cases, the data is streamed into a single central Kafka queue. An Apache Spark job called Website Analytics runs and consumes the visit records in real time from the Kafka queue. On the data that is received, it will execute multiple actions. First, it computes five second summaries and inserts them into a MariaDB database called website_stats. A table called visit_stats is used to capture that information. Next, it maintains a running counter of the total duration by country using a Redis sorted set. Finally, it filters those visits, which ended in the shopping cart…

Contents