From the course: Complete Guide to Data Lakes and Lakehouses

SQL on Hadoop: Hive and Impala

- [Instructor] Hive and Impala are two technologies that meet a specific need: they enable high-performance SQL querying directly on data stored in Hadoop clusters. Even though these technologies may not be state of the art and are actively being replaced by newer cloud solutions, which we will discuss later, it is important to mention them because they may still be in use in legacy Hadoop data lakes. Developed by Facebook and later open-sourced, Apache Hive is designed to provide a SQL-like interface for querying data stored in the Hadoop Distributed File System, HDFS. Its schema on read and table-like abstraction make it ideal for data warehousing applications. These are the features that make Hive special. It is particularly well suited for long-running batch processing jobs that require complex SQL queries over large datasets. Hive Query Language, HiveQL, translates SQL-like queries into MapReduce, Tez, or Spark jobs, allowing you to execute SQL commands to manipulate and…
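To make the two ideas above more concrete, here is a minimal Python sketch of schema on read and of how a SQL-like aggregation can be broken into map and reduce phases. It is a conceptual illustration only, not Hive's actual query planner or API. The sample data, the `SCHEMA` definition, the `orders` table name, and every function name are hypothetical and chosen for this example.

```python
from collections import defaultdict

# Hypothetical raw file contents, as they might sit in HDFS: plain delimited text.
RAW_LINES = [
    "1,alice,books,12.50",
    "2,bob,games,30.00",
    "3,alice,games,45.00",
    "4,carol,books,8.25",
]

# Hypothetical table schema, applied only when the data is read (schema on read).
SCHEMA = [("order_id", int), ("customer", str), ("category", str), ("amount", float)]


def read_with_schema(line):
    """Parse one raw line into a typed row using the schema."""
    fields = line.split(",")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def map_phase(rows):
    """Emit (key, value) pairs, as a mapper might for GROUP BY category."""
    for row in rows:
        yield row["category"], row["amount"]


def shuffle(pairs):
    """Group values by key, standing in for the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each group, as a reducer might for SUM(amount)."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_category(lines):
    # Conceptually: SELECT category, SUM(amount) FROM orders GROUP BY category
    rows = (read_with_schema(line) for line in lines)
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(total_by_category(RAW_LINES))
```

The stored lines carry no types of their own; the schema is imposed only at query time, which is the essence of schema on read. The map, shuffle, and reduce functions mirror, in a very simplified way, the stages a batch engine could run for a grouped aggregation.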
