From the course: Apache Spark Essential Training: Big Data Engineering

Spark analytics and ML

- [Instructor] Analytics and machine learning are two domains allied with data engineering. Apache Spark can also help extend data engineering pipelines to perform analytics and machine learning. Let's start with analytics. Spark supports Spark SQL, a simple yet powerful SQL interface for performing computations and aggregations. SQL is internally translated into distributed operations that can efficiently process large datasets. Spark SQL can be used on both batch and real-time streaming pipelines. Spark SQL syntax mimics standard SQL. It's simple to use, yet powerful enough to transform, filter, and aggregate data in a single statement. SQL-based analytics can be added to the same Spark pipeline that does data engineering, so implementation and deployment become easier. Data pipelines can be cascaded with analytics, where the output of one operation can be passed on to downstream operations for further analytics and processing. Results of analytics can either be persisted to databases or…
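
The point about SQL being translated into distributed operations can be illustrated with a toy, Spark-free sketch in plain Python. It is not Spark's actual execution engine, whose planner is far more sophisticated; it only shows the general idea of splitting a filter-and-aggregate query into per-partition partial results that are then merged. The `partitions` data, the `sales` table name in the comment, and the helper functions `partial_aggregate` and `merge` are all hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows already split into
# partitions, standing in for data distributed across worker nodes.
partitions = [
    [("east", 10.0), ("west", 5.0), ("east", -1.0)],
    [("west", 7.5), ("east", 2.5)],
]


def partial_aggregate(rows):
    """Per-partition step: keep rows with amount > 0, then sum by region."""
    totals = defaultdict(float)
    for region, amount in rows:
        if amount > 0:
            totals[region] += amount
    return dict(totals)


def merge(partials):
    """Final step: combine the per-partition partial sums into one result."""
    result = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


# Roughly the work requested by a single SQL statement such as:
#   SELECT region, SUM(amount) FROM sales WHERE amount > 0 GROUP BY region
totals = merge(partial_aggregate(p) for p in partitions)
print(totals)
```

Each partition is filtered and summed independently, so in a real cluster that step could run in parallel on separate workers; only the small partial results need to be brought together for the final merge.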
