From the course: Introduction to Spark SQL and DataFrames

Querying DataFrames with SQL

- [Instructor] Up to now, we've been using the Spark DataFrame API to work with DataFrames. Now we're going to switch gears and work with SQL. In particular, we're going to use Spark SQL to query DataFrames. As in previous videos, I've started with the data already loaded. Let's quickly go through the steps involved. First, I import a pyspark library that allows us to work with SQL. I create a Spark session global variable, which allows us to work with a distributed Spark session. Then I define a string that points to the directory that holds my data. Next, I create another string that points to the file I want to load, and I execute a Spark read command specifying the JSON format. Finally, I list the first 10 rows of this DataFrame, which I called df. Let me briefly explain some of the columns. In this DataFrame, we have utilization data about…
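The setup steps described here can be sketched roughly as follows. This is a minimal illustration, not the instructor's actual notebook: the directory and file names passed in, the app name, and the view name `utilization` are all hypothetical placeholders. The temporary view registration is an assumption about how the SQL querying proceeds, since the transcript is cut off before that step; it is the usual way to make a DataFrame addressable by name from Spark SQL.

```python
import os


def build_file_path(data_dir: str, file_name: str) -> str:
    """Join the data directory string and the file name string into one path."""
    return os.path.join(data_dir, file_name)


def load_and_query(data_dir: str, file_name: str):
    """Load a JSON file into a DataFrame and query it with Spark SQL.

    data_dir and file_name are placeholders; substitute your own location.
    """
    # Imported inside the function so the path helper above can be used
    # even on a machine where pyspark is not installed.
    from pyspark.sql import SparkSession

    # Create (or reuse) the Spark session. The app name is arbitrary.
    spark = SparkSession.builder.appName("sql_on_dataframes").getOrCreate()

    # Read the file, specifying the JSON format.
    df = spark.read.format("json").load(build_file_path(data_dir, file_name))

    # List the first 10 rows of the DataFrame.
    df.show(10)

    # Assumption: register the DataFrame as a temporary view so that SQL
    # statements can refer to it by name. "utilization" is a made-up name.
    df.createOrReplaceTempView("utilization")

    # spark.sql returns another DataFrame holding the query result.
    result = spark.sql("SELECT * FROM utilization LIMIT 10")
    result.show()
    return result
```

Calling `load_and_query` with your own directory and JSON file name runs the whole sequence. Because `spark.sql` returns a DataFrame, the result can be passed back to the DataFrame API for further work.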
