From the course: Scala Essential Training for Data Science (2017)

Joining DataFrames

- [Instructor] Another common operation is joining tables, and with Spark SQL we can join DataFrames. I'm going to pick up where I left off in the previous lesson, with my Scala REPL active here. I'm going to clear the screen, and, just as a refresher, I'm going to show the contents of a DataFrame called emps. There are the first 20 rows of emps. I also have another DataFrame that deals with countries and regions, and that's called df_cr. Let's show the contents of that one. Now, you'll notice that both the employees table and the country region table have a column called region ID. That means I can join these two DataFrames. Joining is really simple in Spark SQL. Let's clear the screen and take a quick look at how to do that. Let's create a new DataFrame; we'll simply call it joined. It's going to be a join on the DataFrame called emps, so I'm going to apply the join operation to it. And I'm going to tell Spark to join emps to the…
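The transcript cuts off before the join call is shown, so here is a minimal sketch of what a join on a shared column can look like. It is not the instructor's exact code. The rows, the column names other than the region ID, and the spelling `region_id` are hypothetical stand-ins; only the names `emps`, `df_cr`, and `joined` come from the lesson.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("join-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the lesson's DataFrames (made-up rows and columns).
val emps = Seq(
  (1, "Ana", 2),
  (2, "Raj", 1),
  (3, "Lee", 2)
).toDF("id", "last_name", "region_id")

val df_cr = Seq(
  (1, "East", "USA"),
  (2, "West", "Canada")
).toDF("region_id", "company_regions", "country")

// Inner join on the shared column. Passing the column name as a string
// keeps a single region_id column in the result instead of two.
val joined = emps.join(df_cr, "region_id")
joined.show()
```

Passing the column name as a string is convenient when both DataFrames use the same name for the key. When the key columns are named differently, a join expression such as `emps("region_id") === df_cr("region_id")` can be used instead, at the cost of both key columns appearing in the result.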
