From the course: Cloud Hadoop: Scaling Apache Spark


Run Spark job on AWS EMR

- [Lynn] Now, when you're verifying that your Spark cluster is working properly, there are two types of basic "Hello, World" jobs. One is a purely distributed calculation with no data involved; it just checks that all your nodes are working, and that one is calculating the digits of pi. That's the one we're going to use to test this cluster out. The other is word count, where you pull in some text (I'm using Shakespeare plays, basically), and you're testing the ingest of that text file, the distributed processing (in our case, counting the words), and then the output. So, as I mentioned in a previous movie, the examples for all of this are on my GitHub. If you go into "Use Spark" (and in this case we're going to be using PySpark), you can see that here is our example, so you can just copy and paste it. And so here, if we want to go to our notebook,…
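The two smoke-test jobs described above can be sketched in plain Python. This is a local, single-process sketch of the logic Spark distributes, not the actual EMR job: on a cluster, Spark's built-in pi example splits the samples across executor tasks, and word count would be expressed with PySpark transformations (noted in the comments). The function names and sample counts here are illustrative choices, not from the course.

```python
# Local sketches of the two "Hello, World" Spark jobs:
# 1) estimating pi (distributed calculation, no input data)
# 2) word count (ingest text, process it, produce counts)
import random
from collections import Counter


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo pi estimate. Spark's example spreads the
    sampling loop across the cluster's worker nodes."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


def word_count(text: str) -> Counter:
    """Word count. The rough PySpark equivalent would be:
    rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    print(estimate_pi(100_000))              # typically close to 3.14
    print(word_count("to be or not to be"))  # 'to' and 'be' appear twice
```

Running the pi job first is a useful habit because it exercises every node without depending on storage access; word count then additionally verifies that the cluster can read your input file (for example, from S3 on EMR).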
