From the course: Apache Spark Essential Training: Big Data Engineering
Spark analytics and ML
- [Instructor] Analytics and machine learning are two domains closely allied with data engineering. Apache Spark can also help extend data engineering pipelines to perform analytics and machine learning. Let's start with analytics. Spark supports Spark SQL, a simple yet powerful SQL interface to perform computations and aggregations. SQL is internally translated into distributed operations that can efficiently process large datasets. Spark SQL can be used on both batch and real-time streaming pipelines. Spark SQL syntax mimics standard SQL. It's simple to use and yet powerful enough to transform, filter, and aggregate data in a single statement. SQL-based analytics can be added to the same Spark pipeline that does data engineering, so implementation and deployment become easier. Data pipelines can be cascaded with analytics, where the output of one operation can be passed on to downstream operations for further analytics and processing. Results of analytics can either be persisted to databases or…