Handling GitHub Downtime Impact on Airflow DAGs with git-sync #25

Open
sebinxavi opened this issue Sep 26, 2024 · 0 comments

Hello,

In our Airflow deployment, DAGs fail and the pod enters an error state whenever GitHub is down or unreachable. This significantly disrupts our workflows, since we rely on git-sync to fetch the latest DAGs from our GitHub repository. Here are the relevant settings from our Helm chart values:

dags:
  gitSync:
    branch: main
    depth: 1
    enabled: true
    image:
      gid: 65533
      pullPolicy: IfNotPresent
      repository: <>/git-sync/git-sync
      tag: v3.2.2
      uid: 65533
    maxFailures: 0
    repo: git@github.com:<>/<>.git
    repoSubPath: ""
    resources: {}
    revision: HEAD
    sshKnownHosts: ""
    sshSecret: airflow-git-ssh-secrets
    sshSecretKey: id_rsa
    syncTimeout: 120
    syncWait: 60
  path: /opt/airflow/dags
  persistence:
    accessMode: ReadOnlyMany
    enabled: false
    existingClaim: ""
    size: 1Gi
    storageClass: ""
    subPath: ""

Issue: When GitHub is unavailable, the git-sync sidecar fails to pull the latest DAGs, leading to the following problems:

  • Airflow pods go into an error state.
  • Existing DAGs stop executing even if they were previously synced and valid.

Question: Is there a recommended way to mitigate the impact of GitHub downtime on the Airflow pods? We would like to ensure the following:

  • The Airflow DAGs continue to run based on the last successfully synced state, even if new sync attempts fail.
  • Pod errors caused by temporary GitHub outages are reduced or eliminated.

Current Workaround: We have set maxFailures: 0, expecting this to let the git-sync sidecar keep retrying indefinitely. However, the Airflow pod itself still fails or goes into an error state.
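
One thing that may be worth checking here, offered as a hedged sketch rather than a confirmed fix: in git-sync v3, the `--max-sync-failures` flag is documented as the number of consecutive failures allowed before aborting, with `-1` meaning retry forever after the initial sync. Assuming this chart passes `maxFailures` through to that flag (the mapping is an assumption, not something shown above), a value of `0` would make the sidecar exit on the first failed sync instead of retrying. A values fragment along these lines might be closer to the intended behavior:

```yaml
dags:
  gitSync:
    enabled: true
    # Assumption: maxFailures is passed through to git-sync v3's
    # --max-sync-failures, where -1 means "never abort after the initial sync".
    # Verify against the docs for the git-sync tag and chart version in use.
    maxFailures: -1
    # Unchanged from the settings listed above.
    syncWait: 60
    syncTimeout: 120
```

Even with this setting, the first sync of a newly started pod would still have to succeed, so pods that restart during an outage would not be covered by it.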

Request for Advice: Could you please advise on the best practices for:

  • Improving the resilience of the git-sync process against GitHub outages.
  • Keeping Airflow DAGs functional with the last synced state, even if new syncs fail.

Thank you for your assistance!
