Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. Airflow represents each workflow as a Directed Acyclic Graph (DAG) of tasks. For more information, see the Apache Airflow Documentation page.
This article shows you how to use Apache Airflow to orchestrate, schedule, and execute Talend Data Integration (DI) Jobs.
Create two folders named jobs and scripts under the AIRFLOW_HOME folder.
Extract the setup_files.zip, then copy the shell scripts (download_job.sh and delete_job.sh) to the scripts folder.
Copy the talend_job_dag_template.py file from the setup_files.zip to your local machine and update the following:
Also, update the default_args dictionary based on your requirements.
For more information, see the Apache Airflow documentation: Default Arguments.
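As a sketch, a default_args dictionary might look like the following; the owner name, start date, and retry settings here are illustrative assumptions, not values taken from the template:

```python
from datetime import datetime, timedelta

# Illustrative default_args; adjust every value to your environment.
default_args = {
    "owner": "airflow",                   # assumption: owner name
    "depends_on_past": False,             # do not wait on prior runs
    "start_date": datetime(2019, 1, 1),   # assumption: first schedulable date
    "retries": 1,                         # retry a failed task once
    "retry_delay": timedelta(minutes=5),  # wait 5 minutes between retries
}
```

These defaults apply to every task in the DAG unless an individual task overrides them.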
The DAG template provided is configured to be triggered externally. If you plan to run the task on a schedule instead, update the schedule_interval parameter of the DAG with a value that matches your scheduling requirements (for example, a cron expression).
For more information on values, see the Apache Airflow documentation: DAG Runs.
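A minimal DAG sketch, assuming Airflow 1.x import paths; the dag_id, task_id, and script path below are illustrative assumptions, not taken from the template. schedule_interval=None keeps the DAG externally triggered; replace it with a cron expression such as "0 6 * * *" to schedule it:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

dag = DAG(
    dag_id="talend_job_dag",  # assumption: illustrative DAG name
    default_args={"owner": "airflow", "start_date": datetime(2019, 1, 1)},
    schedule_interval=None,   # externally triggered; use e.g. "0 6 * * *" to schedule
)

run_job = BashOperator(
    task_id="run_talend_job",
    # The trailing space stops Airflow from treating the .sh path as a Jinja template file.
    bash_command="bash $AIRFLOW_HOME/scripts/download_job.sh ",
    dag=dag,
)
```

This is a configuration fragment that only runs inside an Airflow installation, so treat it as a starting point rather than a drop-in file.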
After the Airflow scheduler picks up the DAG file, a compiled file with the same name and with a .pyc extension is created.
Refresh the Airflow UI screen to see the DAG.
Note: If the DAG is not visible under the DAGs tab in the UI, restart the Airflow webserver and the Airflow scheduler.
In this article, you learned how to author, schedule, and monitor workflows from the Airflow UI, and how to download and trigger Talend Jobs for execution.