Spark analytics and ML - Apache Spark Tutorial

From the course: Apache Spark Essential Training: Big Data Engineering

Spark analytics and ML

- [Instructor] Analytics and machine learning are two domains closely allied with data engineering, and Apache Spark can help extend data engineering pipelines to perform both. Let's start with analytics. Spark supports Spark SQL, a simple yet powerful SQL interface for computations and aggregations. SQL is internally translated into distributed operations that can efficiently process large datasets. Spark SQL can be used in both batch and real-time streaming pipelines. Its syntax mimics standard SQL, so it is simple to use, yet a single statement can transform, filter, and aggregate data. SQL-based analytics can be added to the same Spark pipeline that does the data engineering, so implementation and deployment become easier. Data pipelines can be cascaded with analytics, where the output of one operation is passed on to downstream operations for further analytics and processing. Results of analytics can either be persisted to databases or…
