Data Pipeline ETL Frameworks

  • Apache Airflow - Data pipeline framework built in Python, including a scheduler, DAG definitions, and a UI for visualisation (a minimal DAG sketch follows this list).
  • Apache NiFi - Made for dataflow: it supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
  • Argo Workflows - Open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes, implemented as a Kubernetes CRD (Custom Resource Definition).
  • Azkaban - Batch workflow job scheduler created at LinkedIn to run Hadoop jobs. It resolves ordering through job dependencies and provides an easy-to-use web UI for maintaining and tracking your workflows.
  • Chronos - More of a job scheduler for Mesos than an ETL pipeline framework. [OUTDATED]
  • Genie - Job orchestration engine for interfacing with and triggering the execution of jobs on Hadoop-based systems.
  • Gokart - A wrapper around the data pipeline library Luigi.
  • Luigi - Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc. (a task sketch follows this list).
  • Metaflow - A framework for data scientists to easily build and manage real-life data science projects (a flow sketch follows this list).
  • Neuraxle - A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML).
  • Oozie - Workflow scheduler for Hadoop jobs
  • PipelineX - Based on Kedro and MLflow; a full comparison is given at https://github.com/Minyus/Python_Packages_for_Pipeline_Workflow
  • Prefect Core - Workflow management system that makes it easy to take your data pipelines and add semantics like retries, logging, dynamic mapping, caching, and failure notifications (a flow sketch follows this list).
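To illustrate Airflow's programming model, here is a minimal sketch of a two-step DAG, assuming Airflow 2.x; the `dag_id`, task names, and payload are illustrative, not part of any real pipeline.

```python
# Minimal Airflow 2.x DAG sketch: an "extract" task feeding a "load" task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Illustrative placeholder; a real task would read from a source system.
    return {"rows": 3}


def load(ti):
    # Airflow injects the task instance (ti); pull the upstream result via XCom.
    data = ti.xcom_pull(task_ids="extract")
    print(f"loading {data['rows']} rows")


with DAG(
    dag_id="example_etl",          # hypothetical name for this sketch
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",    # the scheduler triggers one run per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task      # the >> operator declares the DAG edge
```

Dropping a file like this into Airflow's dags/ folder is enough for the scheduler to pick it up and for the UI to visualise the graph.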
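Luigi's model is target-based rather than scheduler-based: a task counts as complete when its output target exists. A minimal sketch, assuming the luigi package; the task classes and file paths are hypothetical.

```python
# Minimal Luigi sketch: Load depends on Extract via requires().
import luigi


class Extract(luigi.Task):
    def output(self):
        # Luigi treats the existence of this target as "task is done".
        return luigi.LocalTarget("extracted.txt")  # hypothetical path

    def run(self):
        with self.output().open("w") as f:
            f.write("raw data\n")


class Load(luigi.Task):
    def requires(self):
        # Declaring Extract here is how Luigi resolves the dependency graph.
        return Extract()

    def output(self):
        return luigi.LocalTarget("loaded.txt")  # hypothetical path

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().upper())


if __name__ == "__main__":
    # Run with: python pipeline.py Load --local-scheduler
    luigi.run()
```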
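Metaflow expresses a pipeline as a class whose steps are chained with self.next(), and state assigned to self is persisted between steps. A minimal sketch, assuming the metaflow package; the flow and step names are illustrative.

```python
# Minimal Metaflow sketch: a linear start -> load -> end flow.
from metaflow import FlowSpec, step


class ExampleETLFlow(FlowSpec):  # hypothetical flow name
    @step
    def start(self):
        # Attributes on self are checkpointed and available in later steps.
        self.records = [1, 2, 3]
        self.next(self.load)

    @step
    def load(self):
        print(f"loading {len(self.records)} records")
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    # Run with: python etl_flow.py run
    ExampleETLFlow()
```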
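Prefect Core (the 1.x API) builds the DAG by calling @task-decorated functions inside a Flow context, with retry semantics declared on the task itself; note that Prefect 2.x replaced this with the @flow decorator. A minimal sketch with illustrative names:

```python
# Minimal Prefect Core (1.x) sketch: retries declared on the extract task.
from datetime import timedelta

from prefect import task, Flow


@task(max_retries=3, retry_delay=timedelta(seconds=10))
def extract():
    # Illustrative placeholder; failures here would be retried 3 times.
    return [1, 2, 3]


@task
def load(records):
    print(f"loading {len(records)} records")


# Calling tasks inside the Flow context records edges; nothing runs yet.
with Flow("example-etl") as flow:  # hypothetical flow name
    load(extract())

if __name__ == "__main__":
    flow.run()
```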