From the course: Apache Spark Essential Training: Big Data Engineering
Generating a visits data stream - Apache Spark Tutorial
Generating a visits data stream
- [Instructor] To execute the website visit pipeline, we first need a streaming data source. Let's create an example data stream. The code for this chapter is in the notebook code 0403 Spark BDE generate a visit stream. We already created the required MariaDB table and the Kafka topic for this chapter when we ran the setup prerequisite script earlier in the course. In this notebook, we simulate a data stream and publish it to the Kafka topic. We begin by defining a string serializer. Then we create a Kafka producer. The bootstrap server is set to the port on which our Kafka instance is listening. The key and value serializers are both set to the string serializer. We then define a sample list of countries and a list of last actions. Next, we generate 100 sample visit records. For each record, we pick a random country and last action from the predefined lists. We set the time to the current time. We also set the duration to a random value. We form a JSON record for the visit…
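The steps described in the transcript can be sketched roughly as follows. This is not the course notebook's code, only a minimal illustration in Python using the kafka-python client. The topic name, the broker address, the country and action values, the field names, the duration range, and the choice of the country as the message key are all assumptions made for the example. Encoding each record as JSON is also an assumption.

```python
import json
import random
from datetime import datetime

# Hypothetical values: the transcript does not give the topic name, the port,
# the countries, or the last actions.
TOPIC = "visits-topic"
BOOTSTRAP_SERVERS = "localhost:9092"
COUNTRIES = ["USA", "India", "Brazil", "Australia", "Russia"]
LAST_ACTIONS = ["Catalog", "FAQ", "Order", "ShoppingCart"]


def string_serializer(text):
    """Encode a string as UTF-8 bytes before it is handed to Kafka."""
    return text.encode("utf-8")


def make_visit(rng=random):
    """Build one visit record from random picks and the current time."""
    return {
        "country": rng.choice(COUNTRIES),          # assumed field name
        "lastAction": rng.choice(LAST_ACTIONS),    # assumed field name
        "visitDate": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "duration": rng.randint(1, 20),            # assumed range
    }


def publish_visits(producer, count=100):
    """Send `count` JSON-encoded visit records to TOPIC and return them."""
    sent = []
    for _ in range(count):
        visit = make_visit()
        # Using the country as the key is an assumption for this sketch.
        producer.send(TOPIC, key=visit["country"], value=json.dumps(visit))
        sent.append(visit)
    producer.flush()  # block until buffered messages are delivered
    return sent


def make_producer():
    """Create a kafka-python producer; requires a reachable broker."""
    from kafka import KafkaProducer  # imported lazily so the rest runs without it

    return KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        key_serializer=string_serializer,
        value_serializer=string_serializer,
    )
```

With a broker running and the topic created, calling `publish_visits(make_producer())` would publish 100 records. Passing the producer in as an argument also makes it easy to substitute a stand-in object when no Kafka instance is available.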