From the course: Apache Spark Essential Training: Big Data Engineering

Building a scorecard

- [Instructor] Let's build a running scorecard for the long last actions published to Kafka in the previous video. The code for this job is available in the code 06_04_Spark_BDE_scorecard_for_last_action notebook. To help manage the Redis sorted set, we also have the write_to_redis function. The Redis writer maintains a sorted set called long-last-action-stats on the Redis instance running on Docker. It increments the count for a last action by one every time that action occurs in the data stream. In the main job, we define a Spark session. Since the topic carries only one value, last_action, we do not need to define a schema. We read the topic directly into the raw_last_action dataframe. From there, we extract the value as last_action into the last_action dataframe. To maintain the Redis sorted set, we use the writeStream method and call the Redis writer for each record. This keeps the Redis sorted set up to date. We use the start method to start the Spark pipeline. In order to review the…
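
The steps described above can be sketched roughly as follows. This is not the course notebook's code, only a minimal PySpark illustration under stated assumptions: the signature of write_to_redis, the RedisWriter class, the FakeRedis stand-in, and the start_scorecard_stream helper with its parameters are all hypothetical. Only the names long-last-action-stats, raw_last_action, and last_action come from the transcript. The Spark part assumes pyspark and the Kafka connector are available and is imported lazily, so the Redis logic can be tried without a cluster.

```python
from collections import defaultdict

SORTED_SET_KEY = "long-last-action-stats"  # sorted-set name from the transcript


def write_to_redis(client, last_action):
    """Add 1 to the score of one last_action member (assumed signature)."""
    # redis-py call shape: zincrby(name, amount, value)
    return client.zincrby(SORTED_SET_KEY, 1, last_action)


class FakeRedis:
    """Hypothetical in-memory stand-in, so the logic runs without a server."""

    def __init__(self):
        self.sets = defaultdict(lambda: defaultdict(float))

    def zincrby(self, name, amount, value):
        self.sets[name][value] += amount
        return self.sets[name][value]

    def scorecard(self, name):
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(self.sets[name].items(), key=lambda kv: (-kv[1], kv[0]))


class RedisWriter:
    """Writer object for writeStream.foreach: open / process / close."""

    def __init__(self, make_client):
        self.make_client = make_client  # factory, so each partition connects itself

    def open(self, partition_id, epoch_id):
        self.client = self.make_client()
        return True  # True tells Spark to process the rows of this partition

    def process(self, row):
        write_to_redis(self.client, row["last_action"])

    def close(self, error):
        pass


def start_scorecard_stream(bootstrap_servers, topic, make_client):
    """Sketch of the main job; server address and topic name are placeholders."""
    from pyspark.sql import SparkSession  # lazy import: only needed here

    spark = SparkSession.builder.appName("LastActionScorecard").getOrCreate()
    raw_last_action = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # The Kafka value arrives as bytes; cast it to a string column.
    last_action = raw_last_action.selectExpr("CAST(value AS STRING) AS last_action")
    return last_action.writeStream.foreach(RedisWriter(make_client)).start()
```

With a real Redis instance, make_client could be something like `lambda: redis.Redis(host="localhost", port=6379)`, again an assumption about the setup. Passing a factory rather than a live connection keeps the writer serializable, since each Spark partition opens its own client in open.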
