From the course: Introduction to Spark SQL and DataFrames
Apache Spark SQL and data analysis
- [Dan] Apache Spark and SQL are both widely used for data analysis and data science. Hello and welcome to this course, Introduction to Spark SQL and DataFrames. In this course, we're going to introduce DataFrames, a foundational data structure in Apache Spark, as well as in the widely used pandas library in Python. You'll learn about basic operations like filtering and aggregating, using both the DataFrame API and SQL. You'll also learn more advanced techniques like joining data, eliminating duplicates, and working with null values. We'll also develop techniques for exploratory data analysis, including analyzing time series data, using clustering, and applying linear regression. So join us now to learn about Apache Spark, SQL, and how to do data analysis with the two of them together.