From the course: Cloud Hadoop: Scaling Apache Spark
Run Spark job on AWS EMR - Apache Spark Tutorial
- [Lynn] Now, when you're verifying that your Spark cluster is working properly, there are two basic types of "Hello, World" jobs. The first is a pure distributed calculation: no data is involved, it just checks that all your nodes are working, and that one is calculating the digits of pi. That's the one we're going to use to test this cluster out. The other is word count, where you pull in some text (I'm using Shakespeare's plays), so you're testing the ingest of a text file, then the distributed processing (in our case, counting the words), and then you have an output. As I mentioned in a previous movie, the examples for all of this are on my GitHub. So if you go into "Use Spark" (in this case we're going to be using PySpark), you can see that here is our example, so you can just copy and paste it. And so here, if we want to go to our notebook,…
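The two sanity-check jobs Lynn describes can be sketched in plain Python. This is a minimal local illustration of the logic only, not the course's actual PySpark code from GitHub: on EMR the same work would be distributed across executors (for example with `sc.parallelize` for the pi sampling and the classic `flatMap`/`map`/`reduceByKey` pipeline for word count), and the sample text below is made up for the example.

```python
import random
from collections import Counter

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Monte Carlo pi estimate: the 'no data' job.
    Counts random points in the unit square that fall inside
    the quarter circle; a Spark version would parallelize the
    sampling across the cluster's nodes."""
    random.seed(seed)
    hits = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4.0 * hits / samples

def word_count(text: str) -> Counter:
    """Word count: the 'with data' job -- ingest text, then count
    words in a distributed pass. Done locally here with Counter."""
    return Counter(text.lower().split())

print(round(estimate_pi(200_000), 2))    # close to 3.14
print(word_count("to be or not to be"))  # 'to' and 'be' counted twice
```

Both checks exercise the same things the cluster versions do, just on one machine: the first verifies pure computation, the second verifies the ingest-process-output path.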
Contents
- Sign up for Databricks Community Edition (3m 29s)
- Add Hadoop libraries (2m 33s)
- Databricks AWS Community Edition (2m 22s)
- Load data into tables (1m 51s)
- Hadoop and Spark cluster on AWS EMR (7m 30s)
- Run Spark job on AWS EMR (4m 40s)
- Review batch architecture for ETL on AWS (2m 17s)