From the course: Cloud Hadoop: Scaling Apache Spark

Scale Spark on the cloud by example

- [Instructor] In this section, I'm going to take you through some work that my team did in collaboration with CSIRO Bioinformatics in Sydney, Australia, on moving to the cloud and scaling a real-world Spark workload. The use case is genomic analysis, or bioinformatics research, and there were several constraints for our customer here. They were research focused, and at the time we started they didn't have a dedicated DevOps or cloud person. And they really wanted to make their solution flexible enough to work across any cloud. So, as a starting point, they had written a library called VariantSpark, which runs on top of Spark and implements custom machine learning. We'll look at it in a little bit more detail in a minute. They had written it in Scala and they had open sourced it on GitHub. When we first started working together, they were using it internally. They were using it on a shared Hadoop Spark cluster and their frustration…
