We are experiencing issues with our Airflow deployment where DAGs fail, and the pod enters an error state when GitHub is down or unreachable. This disruption significantly impacts our workflows since we rely on git-sync to fetch the latest DAGs from our GitHub repository. Here are the relevant settings from our Helm chart:
Issue: When GitHub is unavailable, the git-sync sidecar fails to pull the latest DAGs, leading to the following problems:
Airflow pods go into an error state.
Existing DAGs stop executing even if they were previously synced and valid.
Question: Is there a recommended way to mitigate the impact of GitHub downtime on the Airflow pods? We would like to ensure the following:
The Airflow DAGs continue to run based on the last successfully synced state, even if new sync attempts fail.
Pod errors caused by temporary GitHub outages are reduced or eliminated.
Current Workaround: We have set maxFailures: 0, which allows the git-sync sidecar to keep retrying indefinitely. However, this does not prevent the Airflow pod itself from failing or going into an error state.
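One thing worth double-checking in the workaround above: as far as I can tell from the git-sync documentation, a max-failures value of 0 means git-sync aborts on the first failed sync, and a negative value is what makes it retry forever. If that holds for the git-sync version in use, the setting would look roughly like the sketch below. The `dags.gitSync` keys are the ones exposed by the Airflow Helm chart; the repo URL, branch, and subPath are placeholders, not our real values.

```yaml
# values.yaml (sketch only; repo, branch and subPath are hypothetical placeholders)
dags:
  gitSync:
    enabled: true
    repo: https://github.com/example-org/example-dags.git  # hypothetical
    branch: main                                            # hypothetical
    subPath: dags                                           # hypothetical
    # Assumption based on the git-sync docs: 0 aborts on the first
    # failed sync, while a negative value keeps retrying.
    maxFailures: -1
```

This only changes how the sidecar reacts to failed syncs; whether it also keeps the Airflow pod itself out of the error state is still the open question here.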
Request for Advice: Could you please advise on the best practices for:
Improving the resilience of the git-sync process against GitHub outages.
Configuring Airflow so that DAGs stay functional with the last synced state, even if new syncs fail.
Thank you for your assistance!