👶 Easy Setup: Using cookiecutter to fill in the blanks.
🔥 Disposable Infrastructure: With Helm and a few premade make commands, we can destroy and re-deploy the entire infrastructure easily.
🚀 Cost-Efficient: We use Kubernetes as the task engine. The Airflow scheduler runs each task in a new pod and deletes the pod upon completion, so the cluster scales with the workload and uses only the resources it actually needs.
🔩 Decoupled Executor: Another great advantage of using Kubernetes as the task runner is that it decouples orchestration from execution. You can read more about it in "We're All Using Airflow Wrong and How to Fix It".
🏃 Dynamically Updated Workflows: We use git-sync containers, which let us update the workflows using Git alone. There is no need to redeploy Airflow on each workflow change.
Generate your project from the template; cookiecutter will prompt you for the values below:
$ cookiecutter https://github.com/talperetz/scalable-airflow-template
- airflow_executor: Celery or Kubernetes. Kubernetes can be used for task execution with either executor. To learn more, check out "Scale Your Data Pipelines with Airflow and Kubernetes".
- local_airflow_image_name: local name of the Airflow image. Required if you want to build your own Airflow image.
- airflow_image_repository: ECR repository link. Required if you want to build your own Airflow image.
- git_repo_to_sync_dags: link to your scalable_airflow repository on GitHub, containing your new workflows.
- git_username_in_base_64: your GitHub username, base64-encoded. You can convert it in the shell with (the -n flag keeps echo from adding a trailing newline to the value):
$ echo -n "github_username" | base64
- git_password_in_base_64: your GitHub password, base64-encoded:
$ echo -n "github_password" | base64
- fernet_key: the key Airflow uses to encrypt connection passwords. Fill it with the output of this command:
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
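If you prefer to prepare these cookiecutter answers in one place, the same values can be produced with Python's standard library. This is a minimal sketch, not part of the template: the helper names to_base64 and generate_fernet_key are hypothetical, and the key generator assumes the Fernet key format (32 random bytes, URL-safe base64-encoded) that the cryptography package's Fernet.generate_key() produces.

```python
import base64
import os


def to_base64(value: str) -> str:
    """Base64-encode a string, like `echo -n "value" | base64`.

    No trailing newline is encoded, which is what the -n flag
    guarantees in the shell version.
    """
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def generate_fernet_key() -> str:
    """Return a Fernet-format key: 32 random bytes, URL-safe base64-encoded.

    Assumption: this mirrors the format of cryptography's
    Fernet.generate_key(), using only the standard library.
    """
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials; replace them with your own before use.
    print("git_username_in_base_64:", to_base64("github_username"))
    print("git_password_in_base_64:", to_base64("github_password"))
    print("fernet_key:", generate_fernet_key())
```

Encoding the raw string without a trailing newline is the point of the -n flag above; running `echo "github_username" | base64` without it would produce a different value that decodes to the username plus a newline.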
Install kubectl and Helm (on macOS, via Homebrew):
$ brew install kubectl
$ brew install helm
- Make sure your kubectl context is configured for your EKS cluster.
For a custom Airflow image you'll also need:
- a Kubernetes cluster set up with an autoscaler
- an ECR repository for the Docker image
It is also recommended to set up the Kubernetes Dashboard.
Deploy the stack:
$ make deploy
At this point you should see the stack deployed to Kubernetes.
To see Airflow's UI:
$ make ui pod=[webserver-pod-name]
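make ui needs the name of the webserver pod. A minimal sketch for picking it out of the text printed by kubectl get pods, assuming the default table layout (header row first, pod name in the first column) and assuming the webserver pod's name contains "web", which depends on the chart's naming and your release name. find_webserver_pod is a hypothetical helper, not part of the template.

```python
def find_webserver_pod(kubectl_output: str, marker: str = "web") -> str:
    """Return the first pod name in `kubectl get pods` output containing `marker`.

    Assumes the default table layout: the first line is the
    NAME/READY/STATUS/... header and the pod name is the first column.
    """
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if fields and marker in fields[0]:
            return fields[0]
    raise LookupError(f"no pod name containing {marker!r} found")
```

Feed it the captured output of kubectl get pods (for example via subprocess.run with capture_output=True and text=True) and pass the returned name as the pod= value to make ui. If your pods follow a different naming scheme, adjust the marker argument.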
After changing config/docker/Dockerfile and scripts/entrypoint.sh, build your custom Airflow image:
$ make build
Push it to ECR:
$ make push
Deploy to Kubernetes:
$ make deploy
To see Airflow's UI:
$ make ui pod=[webserver-pod-name]
This template uses:
- Airflow Helm chart: the Airflow stable Helm chart
- Docker image: https://github.com/puckel/docker-airflow
For more details and fine-tuning of the setup, please refer to the links above.