From the course: Data Engineering Foundations
MapReduce and Hadoop
- [Instructor] It's time to talk about specific parallel computing frameworks. We'll focus on frameworks that are currently hot in the data engineering world. When it comes to big data systems, Hadoop is one of the most popular and widely used frameworks, and MapReduce was one of the most popular processing techniques. So, what is Hadoop? It is an ecosystem of open-source tools that has changed the way enterprises store, process, and analyze data. It's a collection of open-source projects maintained by the Apache Software Foundation. Some of them are a bit outdated, but they are still worth talking about. Hadoop uses the MapReduce algorithm, and it plays a central role in developing ETL pipelines, where ETL stands for Extract, Transform, and Load. There are two Hadoop projects we want to focus on in this particular video: MapReduce and HDFS. So let's first talk about HDFS. It is a distributed file system. It is…
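The transcript says Hadoop uses the MapReduce algorithm but does not show what that looks like. As a rough illustration only, here is a minimal single-process Python sketch of the classic word-count example, split into map, shuffle, and reduce steps. This is not Hadoop's Java API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the sketch assumes each input string stands in for one input split that a real cluster would process on a separate node.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Single-process simulation of the MapReduce model (hypothetical names, not Hadoop's API).


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every intermediate value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped in parallel; here we simply loop.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "MapReduce processes data"]))
    # → {'hadoop': 1, 'stores': 1, 'data': 2, 'mapreduce': 1, 'processes': 1}
```

The point of the split is that map and reduce work on independent pieces (one split, or one key), which is what lets a framework like Hadoop spread them across many machines; only the shuffle step needs to move data between them.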