From the course: Apache Spark Essential Training: Big Data Engineering

Project exercise requirements

- [Instructor] Having learned about data engineering with Apache Spark, let's apply it in an end-to-end project. This is an exercise for students to try on their own before referring to the solution provided in the chapter. There are always multiple ways to achieve the same end result, and the solution provided in this chapter is only one of them. Here is the problem statement for the exercise. In chapter four, in the streaming use case, we created a MariaDB table with five-second summaries of the last action performed by website users. We are going to use this table as the input and build a downstream pipeline that analyzes significant actions and maintains a running count table of those actions. Let's review the detailed requirements for the exercise. The task is to run a job every day and extract last action records with a total duration of more than 15 seconds within a five-second window. To make sure that you have executed the examples in chapter four…
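
The core rule in the requirements (keep only summaries whose total duration exceeds 15 seconds, then maintain a running count) can be sketched in plain Python before wiring it into Spark and MariaDB. This is only an illustration of the logic, not the chapter's solution: the field names `interval`, `last_action`, and `duration`, the sample rows, and the choice to keep the running count per action are all assumptions, since the table schema is not shown here.

```python
from collections import Counter

# Hypothetical five-second summary rows; the field names and values are
# assumptions for illustration, not the schema used in the course.
summaries = [
    {"interval": "2024-01-01 10:00:00", "last_action": "Login", "duration": 22},
    {"interval": "2024-01-01 10:00:05", "last_action": "Logout", "duration": 9},
    {"interval": "2024-01-01 10:00:10", "last_action": "Login", "duration": 17},
    {"interval": "2024-01-01 10:00:15", "last_action": "ShowProduct", "duration": 15},
]

THRESHOLD_SECONDS = 15  # "more than 15 seconds" from the requirements


def significant_actions(rows, threshold=THRESHOLD_SECONDS):
    """Keep only rows whose total duration is strictly above the threshold."""
    return [row for row in rows if row["duration"] > threshold]


def update_running_counts(running, rows):
    """Add one to the running count of each row's action (assumed per-action)."""
    for row in rows:
        running[row["last_action"]] += 1
    return running


# One simulated daily run: filter, then fold into the running counts.
running_counts = Counter()
update_running_counts(running_counts, significant_actions(summaries))
print(dict(running_counts))  # {'Login': 2}
```

The row with a duration of exactly 15 is excluded because the requirement says "more than 15 seconds". In an actual pipeline the filter would likely be a Spark filter over the table and the counts would be written back to a database table, but those details depend on the setup from chapter four.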
