From the course: Cloud Hadoop: Scaling Apache Spark
Scale Spark on the cloud by example
- [Instructor] In this section, I'm going to take you through some work that my team did in collaboration with CSIRO Bioinformatics in Sydney, Australia, on moving to the cloud and scaling a real-world Spark workload. The use case was genomic analysis for bioinformatics research, and there were several constraints for our customer here. They were research focused; at the time we started, they didn't have a dedicated DevOps or cloud person. And they really wanted their solution to be flexible enough to work across any cloud. So, as a starting point, they had written a library called VariantSpark, which runs on top of Spark and implements custom machine learning. We'll look at it in a little bit more detail in a minute. They had coded it in Scala and open sourced it on GitHub. When we first started working together, they were using it internally on a shared Hadoop Spark cluster, and their frustration…
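The excerpt doesn't show VariantSpark's own API, so as a hedged sketch of the kind of Spark-based machine learning workload in Scala that the section is about to scale, here is a minimal job using stock Spark MLlib's RandomForestClassifier rather than VariantSpark's custom implementation. The input path, the label column, and the object name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler

// A minimal sketch of a Spark ML job in Scala. This is stock Spark MLlib,
// not VariantSpark's custom machine learning; the path and column names
// are hypothetical.
object SparkMlSketch {
  def main(args: Array[String]): Unit = {
    // local[*] is for illustration only; on a real cluster the master
    // is supplied by spark-submit.
    val spark = SparkSession.builder()
      .appName("spark-ml-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical tabular input; a genomics workload would instead
    // load variant data through a domain-specific reader.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/features.csv")

    // Assemble every non-label column into a single feature vector,
    // as Spark MLlib estimators expect.
    val assembled = new VectorAssembler()
      .setInputCols(df.columns.filter(_ != "label"))
      .setOutputCol("features")
      .transform(df)

    val model = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(100)
      .fit(assembled)

    // Feature importances are the kind of output a GWAS-style
    // analysis cares about.
    println(model.featureImportances)

    spark.stop()
  }
}
```

Submitting a job like this to a cluster via spark-submit, instead of running it with local[*], is exactly the kind of scaling decision the rest of this section explores.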
Contents
- Scale Spark on the cloud by example (5m 11s)
- Build a quick start with Databricks AWS (6m 50s)
- Scale Spark cloud compute with VMs (6m 16s)
- Optimize cloud Spark virtual machines (6m 5s)
- Use AWS EKS containers and data lake (7m 8s)
- Optimize Spark cloud data tiers on Kubernetes (4m 17s)
- Build reproducible cloud infrastructure (8m 37s)
- Scale on GCP Dataproc or on Terra.bio (8m 34s)
- Serverless Spark with Dataproc Notebook (5m 25s)