Generating a visits data stream - Apache Spark Tutorial

From the course: Apache Spark Essential Training: Big Data Engineering

Generating a visits data stream

- [Instructor] To execute the website visit pipeline, we first need a streaming data source. Let's create an example data stream. The code for this chapter is in the notebook code 0403 Spark BDE generate a visit stream. We have already created the required MariaDB table and the Kafka topic for this chapter when we ran the setup prerequisite script earlier in the course. In this notebook, we simulate a data stream and publish it into the Kafka topic. We begin by defining a string serializer. Then we create a Kafka producer. The bootstrap server is set to the port on which our Kafka instance is listening. The key and value serializers are set to the string serializer. We then define a sample list of countries and a list of last actions. We then generate a hundred sample visit records. For each record, we pick a random country and last action from the predefined lists. We set the time to the current time. We also set the duration to a random value. We form a JSON record for the visit…
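
The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration in Python, not the course notebook's code: the transcript does not show the notebook's language or library, so the use of the kafka-python package is an assumption, and the topic name, broker address, country and action values, field names, and duration range are all hypothetical placeholders.

```python
import json
import random
from datetime import datetime

# Hypothetical values: the transcript does not name the topic, the broker
# address, the countries, or the last actions.
TOPIC = "visits-example"
BOOTSTRAP_SERVERS = "localhost:9092"
COUNTRIES = ["USA", "India", "Brazil", "Germany"]
LAST_ACTIONS = ["Catalog", "FAQ", "Order", "ShoppingCart"]


def string_serializer(value):
    """Encode a Python string as UTF-8 bytes for Kafka."""
    return value.encode("utf-8")


def make_visit(rng=random):
    """Build one visit record as a JSON string (field names are assumed)."""
    visit = {
        "country": rng.choice(COUNTRIES),
        "lastAction": rng.choice(LAST_ACTIONS),
        "visitDate": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "duration": rng.randint(1, 20),  # assumed range; units not stated
    }
    return json.dumps(visit)


def publish_visits(count=100):
    """Publish `count` visit records; needs kafka-python and a running broker."""
    # Imported here so the helpers above can be used without kafka-python.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        key_serializer=string_serializer,
        value_serializer=string_serializer,
    )
    for _ in range(count):
        # Use the current time in milliseconds as a simple string key.
        key = str(int(datetime.now().timestamp() * 1000))
        producer.send(TOPIC, key=key, value=make_visit())
    producer.flush()  # block until all buffered records are sent
    producer.close()
```

Calling `make_visit()` on its own returns a single JSON string and needs no broker, which makes it easy to inspect the record shape before calling `publish_visits()` against a live Kafka instance.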
