From the course: Introduction to Spark SQL and DataFrames


Exploratory data analysis with Spark SQL

- [Instructor] Now, we saw how we could get things like the count, the mean, and the standard deviation using the DataFrame API. Let's do that with Spark SQL. To do that, we'll call spark.sql, and then we'll give it a command. In this case it'll be SELECT; let's select the min of CPU utilization, the max of CPU utilization, and the standard deviation of CPU utilization. And we'll specify FROM utilization, because that's the table we registered with createOrReplaceTempView, and let's be sure to show this, because the result is a DataFrame. And so we have here our minimum CPU utilization is about 22%, the max is 100%, and the standard deviation is about 15, which is what we saw up above, so no surprises there. I'm going to copy this command and make it a little easier to read by making it multi-line. I'll use a backslash at the end of each line, we'll say FROM utilization, and now we want to specify our GROUP BY…
