From the course: Apache Spark Essential Training: Big Data Engineering
Executing the real-time pipeline
- [Instructor] The real-time pipeline writes to three different destinations, namely Kafka, MariaDB, and Redis. We need to monitor these destinations to make sure that they are updated while the pipeline is running. We will do so in this notebook. This code is available in 04_05_Spark_BDE_Execute_the_pipeline. We start with the Kafka topic that holds the abandoned carts and subscribe to that topic. Next, we connect to Redis to read the country stats. Finally, we connect to MariaDB to get the website stats. We get the total duration by last action using the summary stats SQL. Now we go into a monitoring loop. We first fetch the Kafka messages in the abandoned-carts topic and print them. Then we get the current scores in Redis and print them. Finally, we read the last-action stats from the MariaDB database and print them. We go to sleep for five seconds and repeat the loop. Let's run the demo now. We first start…
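To make the structure of that monitoring notebook concrete, here is a minimal sketch in plain Python, assuming the kafka-python, redis-py, and PyMySQL client libraries and services running on localhost. The topic name, Redis key, database credentials, and the table and column names in the summary SQL are illustrative assumptions, not the course's actual identifiers; substitute the names used in your copy of the exercise files.

import time
import pymysql
import redis
from kafka import KafkaConsumer

# Subscribe to the Kafka topic that receives abandoned-cart events
# (topic name is an assumption).
consumer = KafkaConsumer(
    "abandoned-carts",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

# Connect to Redis, where the pipeline keeps the running country stats
# (assumed here to be stored in a hash under the key "country-stats").
redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Connect to MariaDB, where the pipeline writes the website stats
# (credentials and database name are assumptions).
db = pymysql.connect(host="localhost", user="spark", password="spark",
                     database="website_stats")

# Summary stats SQL: total duration grouped by last action (assumed schema).
SUMMARY_SQL = """
    SELECT last_action, SUM(duration) AS total_duration
    FROM visit_stats
    GROUP BY last_action
"""

while True:
    # 1. Fetch any new abandoned-cart messages from Kafka and print them.
    batches = consumer.poll(timeout_ms=1000)
    for _, messages in batches.items():
        for msg in messages:
            print("Abandoned cart:", msg.value)

    # 2. Get the current scores from Redis and print them.
    print("Country stats:", redis_client.hgetall("country-stats"))

    # 3. Read the last-action stats from MariaDB and print them.
    with db.cursor() as cursor:
        cursor.execute(SUMMARY_SQL)
        for last_action, total_duration in cursor.fetchall():
            print(f"{last_action}: {total_duration}")
    db.commit()  # end the read transaction so the next pass sees fresh rows

    # 4. Sleep for five seconds and repeat the loop.
    time.sleep(5)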