Apache Airflow

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Wkeithvan (talk | contribs) at 05:59, 1 October 2019 (Add additional information and citations). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Apache Airflow
Original author(s): Maxime Beauchemin
Developer(s): Apache Airflow
Initial release: June 3, 2015
Stable release: 1.10.5 / August 30, 2019
Repository:
Written in: Python
Operating system: Microsoft Windows, macOS, Linux
Available in: Python
Type: workflow management platform
License: Apache License 2.0
Website: airflow.apache.org

Apache Airflow is an open-source workflow management platform. It started at Airbnb in October 2014[1] as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule its workflows and monitor them via the built-in Airflow user interface.[2][3] From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.

Building on the popularity of Python as the de facto programming language for data work, Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of "configuration as code". While other "configuration as code" workflow platforms exist that use markup languages like XML, using Python allows developers to import libraries and classes to help them create their workflows.

Overview

Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and their dependencies are defined in Python, and Airflow then manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers (e.g. a file appearing in Hive[4]). Earlier DAG-based schedulers like Oozie and Azkaban tended to rely on multiple configuration files and file system trees to create a DAG, whereas in Airflow, DAGs can often be written in one Python file.[5]
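
The idea of tasks and dependencies forming a DAG can be illustrated without Airflow itself. The following is a minimal sketch using only Python's standard-library graphlib module (Python 3.9 or later). It is not Airflow's API, and the task names and helper functions are hypothetical, chosen purely for illustration. It shows how a set of dependencies determines a valid execution order, and why the graph must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "load_warehouse": {"join_tables"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    # The two extract tasks may appear in either order; both precede the join.
    print(execution_order(dependencies))
    # A task that (indirectly) depends on itself cannot be scheduled.
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Because the whole workflow is ordinary Python data and code, it can live in a single file and be built with loops, imports, and helper functions, which is the property the paragraph above attributes to Airflow's DAG definitions.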

  1. ^ "Apache Airflow". Apache Airflow. Archived from the original on August 12, 2019. Retrieved September 30, 2019.
  2. ^ Beauchemin, Maxime (June 2, 2015). "Airflow: a workflow management platform". Medium. Archived from the original on August 13, 2019. Retrieved September 30, 2019.
  3. ^ "Airflow". Archived from the original on July 6, 2019. Retrieved September 30, 2019.
  4. ^ Trencseni, Marton (January 16, 2016). "Airflow review". BytePawn. Archived from the original on February 28, 2019. Retrieved October 1, 2019.
  5. ^ "AirflowProposal". Apache Software Foundation. March 28, 2019. Retrieved October 1, 2019.