From the course: Complete Guide to Data Lakes and Lakehouses
SQL on Hadoop: Hive and Impala
- [Instructor] Hive and Impala are two technologies that meet a specific need: enabling high-performance SQL querying directly on data stored in Hadoop clusters. Even though these technologies are no longer state of the art and are actively being replaced by newer cloud solutions, which we will discuss later, they are worth mentioning because they may still be in use in legacy Hadoop data lakes. Developed by Facebook and later open-sourced, Apache Hive is designed to provide a SQL-like interface for querying data stored in the Hadoop Distributed File System (HDFS). Its schema-on-read approach and table-like abstraction make it well suited to data warehousing applications. These are the features that make Hive special. It is particularly well suited for long-running batch processing jobs that require complex SQL queries over large datasets. Hive Query Language (HiveQL) translates SQL-like queries into MapReduce, Tez, or Spark jobs, allowing you to execute SQL commands to manipulate and…
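To make the schema-on-read idea concrete, here is a minimal Python sketch. It is not Hive and does not use any Hive API; the sample data, the `SCHEMA` list, and the helper names `read_with_schema` and `total_by_day` are all hypothetical, chosen only for illustration. The point it tries to show is that the raw file is stored untouched and a schema is applied only when the data is read, with values that do not fit the declared type surfacing as missing rather than blocking the load.

```python
import csv
import io

# Raw file contents as they might sit in storage. Nothing validated them on write.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = """1,alice,2024-01-05,19.99
2,bob,2024-01-06,5.00
3,carol,2024-01-06,not_a_number
"""

# Hypothetical schema, declared separately from the data, like a table definition.
SCHEMA = [("id", int), ("user", str), ("day", str), ("amount", float)]


def read_with_schema(raw, schema):
    """Apply the schema while reading; values that fail to parse become None."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        row = {}
        for (name, cast), value in zip(schema, record):
            try:
                row[name] = cast(value)
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


def total_by_day(rows):
    """Rough analogue of: SELECT day, SUM(amount) FROM events GROUP BY day."""
    totals = {}
    for row in rows:
        if row["amount"] is not None:
            totals[row["day"]] = totals.get(row["day"], 0.0) + row["amount"]
    return totals


rows = read_with_schema(RAW_EVENTS, SCHEMA)
print(total_by_day(rows))
```

In a real deployment the aggregation would be written as a SQL-like query and executed as a distributed job across the cluster; the sketch runs in a single process and only mirrors the read-time schema step.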
Contents
- Introduction to data consumption (4m 59s)
- Unified data analysis: Spark (4m 17s)
- SQL on Hadoop: Hive and Impala (3m 19s)
- Interactive query engines: Presto and Trino (3m 18s)
- Data indexing (4m 12s)
- Optimizing query performance (6m 12s)
- Data consumption security considerations (3m 47s)