From the course: Data Engineering Foundations

Scheduling ETL pipeline using Airflow

- [Instructor] It's time to put everything together and schedule the jobs that we have defined so far. As discussed, we'll have to use a scheduling tool for this, and one of the most commonly used scheduling tools is Apache Airflow. We'll be using the code written so far, and then we'll define a directed acyclic graph using Apache Airflow. So first of all, let's set up Airflow on our machine. I am first creating a directory called airflow, where all the configuration files and the database will reside. So mkdir airflow, that's the command. Now, the next step is to set the AIRFLOW_HOME variable. So export AIRFLOW_HOME, set to the path of the airflow directory inside Chapter_4. Airflow uses this variable to decide where its configuration file and database live. This is the home that we have set. Now, the next step is to actually install Apache Airflow. So sudo pip install apache-airflow. Hit Enter. Now, this is going to take a few seconds, close to…
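To make the idea of a directed acyclic graph concrete, here is a minimal sketch of how task dependencies determine run order. It uses only Python's standard-library `graphlib` module, not Airflow's own API, and the task names (`extract`, `transform`, `load`) and the `execution_order` helper are assumptions chosen for illustration; the course's actual job names are not shown in this excerpt.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks (assumed names, not taken from the course code).
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    TopologicalSorter raises graphlib.CycleError if the dependencies
    contain a cycle, which is why the graph has to be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['extract', 'transform', 'load']
```

A scheduler such as Airflow works from the same kind of dependency information, and adds a schedule, retries, and monitoring on top. Because the graph has no cycles, every task has a well-defined point at which all of its upstream tasks are complete.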
