From the course: Apache Spark Essential Training: Big Data Engineering
Spark execution plan - Apache Spark Tutorial
- [Instructor] Spark execution plans play an important role in optimizing pipelines. When a job is submitted to Apache Spark, it first analyzes all the code given to it and comes up with an execution plan. Spark has an optimizer that analyzes the steps needed to process data and optimizes them for performance and resource utilization. Spark only executes code when an action like reduce or collect is performed. At that point, the optimizer kicks in and analyzes all the previous steps required to achieve this action. It then comes up with a physical execution plan. The optimizer looks for opportunities to reduce I/O, shuffling, and memory usage. If the data sources support parallel I/O, Spark accesses them directly from the executors and parallelizes these operations. This improves performance and reduces memory requirements on the driver. It is recommended to print and analyze execution plans to understand what Spark is doing under the hood and look for opportunities to optimize. Please…