From the course: Introduction to Spark SQL and DataFrames
Querying DataFrames with SQL
- [Instructor] Up to now, we've been using the Spark DataFrame API to work with DataFrames. Now we're going to switch gears and work with SQL. In particular, we're going to use Spark SQL for working with DataFrames. As in previous videos, I've started with the data already loaded. Let's quickly go through the steps involved. First, I import the PySpark library that allows us to work with SQL. I create a Spark session as a global variable, which allows us to work with a distributed Spark session. Then I define a string that points to the directory that holds my data, and another string that points to the file I want to load. Next, I execute a Spark read command, specifying the JSON format. Finally, I list the first 10 rows of this DataFrame, which I called df. Let me briefly explain some of the columns. In this DataFrame, we have utilization data about…
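The loading steps described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the instructor's actual notebook: the directory `/path/to/data`, the file name `utilization.json`, the application name, and the helper names `data_file_path` and `main` are all hypothetical placeholders, since the transcript does not give them.

```python
import os

# Hypothetical locations: the transcript does not name the directory or file.
DATA_DIR = "/path/to/data"
FILE_NAME = "utilization.json"


def data_file_path(data_dir: str, file_name: str) -> str:
    """Combine the data directory string and the file name into one path."""
    return os.path.join(data_dir, file_name)


def main() -> None:
    # Imported inside the function so the path helper above can be used
    # even on a machine where PySpark is not installed.
    from pyspark.sql import SparkSession

    # Create (or reuse) a Spark session; the app name is an arbitrary label.
    spark = SparkSession.builder.appName("QueryingDataFramesWithSQL").getOrCreate()

    # Read the file, specifying the JSON format, into a DataFrame called df.
    df = spark.read.format("json").load(data_file_path(DATA_DIR, FILE_NAME))

    # List the first 10 rows of the DataFrame.
    df.show(10)

    spark.stop()


# Call main() in an environment where PySpark is installed and the
# placeholder path above has been replaced with a real JSON file.
```

In a notebook, the same statements would typically sit at the top level rather than inside a function, so that `spark` and `df` remain available to later cells.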