From the course: Apache Spark Essential Training: Big Data Engineering


Course prerequisites

- [Instructor] Before we begin, I want to discuss the prerequisite skills students need to get the most out of this course. The focus of this course is helping students build data engineering pipelines with Apache Spark. It covers key design principles and best practices for building pipelines and demonstrates them with examples. Students are expected to be familiar with the basics of Apache Spark and able to set up, code, and deploy Spark applications. Familiarity with Apache Spark's structured streaming and SQL capabilities is also desirable. The example code is in Python, so familiarity with Python concepts, programming, and Jupyter notebooks is required. We will build pipelines using third-party data stores, namely Kafka, MariaDB, and Redis; familiarity with these data stores is also helpful. We will deploy and use these data stores with Docker, so knowledge of Docker operations is essential as well. Let's now get set up with the…
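As a point of reference, data stores like the ones mentioned above are commonly stood up locally with Docker Compose. The sketch below is illustrative only, not the course's actual setup: the service names, image tags, ports, and credentials are assumptions, and a real Kafka broker needs additional listener configuration (KRaft or ZooKeeper settings) beyond what is shown here.

```yaml
# Minimal local sketch of the three data stores used in the course.
# Names, ports, and credentials are placeholders, not the course's config.
services:
  kafka:
    image: bitnami/kafka:latest   # requires KRaft/listener env vars in practice
    ports:
      - "9092:9092"
  mariadb:
    image: mariadb:latest
    environment:
      MARIADB_ROOT_PASSWORD: example   # placeholder credential
    ports:
      - "3306:3306"
  redis:
    image: redis:latest
    ports:
      - "6379:6379"
```

With a compose file like this, `docker compose up -d` starts all three containers in the background, and `docker compose down` tears them down.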
