From the course: Apache Spark Essential Training: Big Data Engineering
Building a scorecard
- [Instructor] Let's build a running scorecard for the long last actions that we published to Kafka in the previous video. The code for this job is available in the code notebook 06_04_Spark_BDE_scorecard_for_last_action. To help with managing the Redis sorted set, we also have the write_to_redis function. The Redis writer maintains a sorted set called long-last-action-stats on the Redis instance running on Docker. Here, it increments the count by one every time a last action occurs in the data stream. In the main job, we define a Spark session. Since we only have one value in the topic, last_action, we do not need to define a schema. We read the topic directly into the raw_last_action dataframe. From here, we extract the value as last_action into the last_action dataframe. To maintain the Redis sorted set, we use the writeStream method and call the Redis writer for each record. This will update and maintain the Redis sorted set. We use the start method to start the Spark pipeline. In order to review the…
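The pipeline described in the transcript could look roughly like the following. This is a minimal sketch, not the notebook's actual code. It assumes PySpark with the Kafka connector package and the redis-py client are installed. The Kafka topic name, the broker and Redis addresses, and the helper names `update_scorecard` and `start_scorecard_stream` are hypothetical. It also uses `foreachBatch` rather than a per-record `foreach`, as a simplification that keeps the Redis client on the driver.

```python
SCORECARD_KEY = "long-last-action-stats"  # sorted set name from the transcript


def update_scorecard(client, actions, key=SCORECARD_KEY):
    """Increment the sorted-set score by one for every last action seen."""
    for action in actions:
        # ZINCRBY creates the member with score 0 if it does not exist yet.
        client.zincrby(key, 1, action)


def write_to_redis(batch_df, batch_id, client):
    """foreachBatch handler: push one micro-batch of last actions to Redis."""
    actions = [row["last_action"] for row in batch_df.collect()]
    update_scorecard(client, actions)


def start_scorecard_stream(bootstrap="localhost:9092",   # assumed broker address
                           topic="long-last-action"):    # hypothetical topic name
    # Imported here so the helpers above stay usable without Spark or Redis.
    import redis
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ScorecardSketch").getOrCreate()
    client = redis.Redis(host="localhost", port=6379)    # assumed Redis on Docker

    # The topic carries a single value, so no schema is needed.
    raw_last_action = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap)
        .option("subscribe", topic)
        .load()
    )
    last_action = raw_last_action.selectExpr(
        "CAST(value AS STRING) AS last_action"
    )

    # start() launches the streaming query and returns its handle.
    return (
        last_action.writeStream
        .foreachBatch(lambda df, batch_id: write_to_redis(df, batch_id, client))
        .start()
    )
```

Calling `start_scorecard_stream().awaitTermination()` would keep the job running. Under these assumptions, the scorecard could then be inspected from the Redis CLI with `ZRANGE long-last-action-stats 0 -1 WITHSCORES`.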