From the course: Apache Spark Essential Training: Big Data Engineering
Course prerequisites
- [Instructor] Before we begin this course, I want to discuss the prerequisite skills students need to get the most out of it. The focus of this course is helping students build data engineering pipelines with Apache Spark. It discusses key design principles and best practices for building pipelines and demonstrates them with examples. Students are expected to be familiar with the basics of Apache Spark and able to write and deploy Spark applications. Familiarity with Apache Spark's structured streaming and SQL capabilities is also desirable. The example code is in Python, so familiarity with Python programming and Jupyter notebooks is required. We will build pipelines using third-party data stores, namely Kafka, MariaDB, and Redis, so familiarity with these data stores is also helpful. We will deploy and use these data stores with Docker, so knowledge of basic Docker operations is essential. Let's now get set up with the…
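Since the transcript mentions deploying Kafka, MariaDB, and Redis with Docker, a minimal local setup might be sketched as a docker-compose file like the one below. The image tags, ports, and credentials here are illustrative assumptions, not the course's actual configuration.

```yaml
# Hypothetical docker-compose.yml sketch -- image tags, ports, and
# credentials are assumptions for illustration, not the course's setup.
version: "3.8"
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
    environment:
      # Single-node KRaft mode for local development.
      - KAFKA_ENABLE_KRAFT=yes
      - ALLOW_PLAINTEXT_LISTENER=yes
  mariadb:
    image: mariadb:latest
    ports:
      - "3306:3306"
    environment:
      - MARIADB_ROOT_PASSWORD=example  # illustrative password only
  redis:
    image: redis:latest
    ports:
      - "6379:6379"
```

With a file like this in place, `docker compose up -d` starts the three data stores and `docker ps` verifies that the containers are running.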