Add auto retries for Captum OSS GitHub Actions
Summary:
Captum's GitHub Actions test workflows frequently fail sporadically, often because of package downloads, HTTP errors, or conda environment setup.

This change adds auto-retries so that failed workflows are rerun automatically instead of having to be retried by hand.

Differential Revision: D64693773
Vivek Miglani authored and facebook-github-bot committed Oct 21, 2024
1 parent ed5daa3 commit 8642e2d
Showing 5 changed files with 99 additions and 24 deletions.
19 changes: 19 additions & 0 deletions .github/workflows/retry.yml
@@ -0,0 +1,19 @@
name: Retry Test
on:
  workflow_dispatch:
    inputs:
      run_id:
        required: true
jobs:
  rerun-on-failure:
    permissions: write-all
    runs-on: ubuntu-latest
    steps:
      - name: rerun ${{ inputs.run_id }}
        env:
          GH_REPO: ${{ github.repository }}
          GH_TOKEN: ${{ github.token }}
          GH_DEBUG: api
        run: |
          gh run watch ${{ inputs.run_id }} > /dev/null 2>&1
          gh run rerun ${{ inputs.run_id }} --failed
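The retry workflow above takes a single `run_id` input, waits for that run to finish, and then reruns only its failed jobs. As a minimal sketch of how it could be dispatched by hand, the helper below prints the `gh workflow run` command for a given run ID without executing it. The function name `build_retry_command` and the run ID `1234567890` are hypothetical and not part of the commit; the sketch assumes the workflow file is `retry.yml`, as named in this diff.

```shell
#!/bin/sh
# Sketch: print (without executing) the command that dispatches retry.yml.
# build_retry_command is a hypothetical helper, not part of the commit.
build_retry_command() {
  run_id="$1"
  # GitHub run IDs are numeric; reject empty or non-numeric input early.
  case "$run_id" in
    ''|*[!0-9]*)
      echo "run_id must be a non-empty number, got: '$run_id'" >&2
      return 1
      ;;
  esac
  printf 'gh workflow run retry.yml -F run_id=%s\n' "$run_id"
}

# Example with a placeholder run ID.
build_retry_command 1234567890
```

Printing the command instead of running it keeps the sketch free of side effects; actually dispatching the workflow would additionally require an authenticated `gh` CLI with permission to trigger workflows in the repository.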
62 changes: 38 additions & 24 deletions .github/workflows/test-conda-cpu.yml
@@ -1,34 +1,48 @@
name: Unit-tests for Conda install

on:
  pull_request:
  push:
    branches:
      - master

  workflow_dispatch:

env:
  CHANNEL: "nightly"

jobs:
  tests:
    strategy:
      matrix:
        python_version: ["3.8", "3.9", "3.10", "3.11"]
      fail-fast: false
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    with:
      runner: linux.12xlarge
      repository: pytorch/captum
      script: |
        # Set up Environment Variables
        export PYTHON_VERSION="${{ matrix.python_version }}"

        # Create Conda Env
        conda create -yp ci_env python="${PYTHON_VERSION}"
        conda activate /pytorch/captum/ci_env
        ./scripts/install_via_conda.sh -n

        # Run Tests
        python3 -m pytest -ra --cov=. --cov-report term-missing
  auto-retry:
    name: Auto retry on failure
    if: failure() && fromJSON(github.run_attempt) < 2
    runs-on: ubuntu-latest
    steps:
      - name: Start rerun workflow
        env:
          GH_REPO: ${{ github.repository }}
          GH_TOKEN: ${{ github.token }}
          GH_DEBUG: api
        run: |
          gh workflow run retry.yml \
            -F run_id=${{ github.run_id }}
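The `if:` guard on the `auto-retry` job, `failure() && fromJSON(github.run_attempt) < 2`, is what limits the workflow to a single automatic rerun: the dispatch happens only on the first attempt, so a second failure is left for a human to look at. The boolean logic can be sketched in shell as follows. The function `should_retry` and its two arguments are hypothetical stand-ins for the `failure()` status check and the `github.run_attempt` value; they are not part of the commit.

```shell
#!/bin/sh
# should_retry CONCLUSION RUN_ATTEMPT
# Hypothetical helper mirroring: failure() && fromJSON(github.run_attempt) < 2
should_retry() {
  conclusion="$1"
  run_attempt="$2"
  [ "$conclusion" = "failure" ] && [ "$run_attempt" -lt 2 ]
}

# A failed first attempt triggers the retry; a failed second attempt does not.
for attempt in 1 2; do
  if should_retry failure "$attempt"; then
    echo "attempt $attempt: dispatch retry"
  else
    echo "attempt $attempt: stop"
  fi
done
```

This models only the boolean condition. In GitHub Actions itself, a job-level `failure()` is evaluated against the jobs that the job depends on through `needs`, so the sketch assumes the conclusion of the test job is what feeds the check.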
14 changes: 14 additions & 0 deletions .github/workflows/test-pip-cpu-with-mypy.yml
@@ -25,3 +25,17 @@ jobs:
        ./scripts/run_mypy.sh
        # Run Tests
        python3 -m pytest -ra --cov=. --cov-report term-missing
  auto-retry:
    name: Auto retry on failure
    if: failure() && fromJSON(github.run_attempt) < 2
    runs-on: ubuntu-latest
    steps:
      - name: Start rerun workflow
        env:
          GH_REPO: ${{ github.repository }}
          GH_TOKEN: ${{ github.token }}
          GH_DEBUG: api
        run: |
          gh workflow run retry.yml \
            -F run_id=${{ github.run_id }}
14 changes: 14 additions & 0 deletions .github/workflows/test-pip-cpu.yml
@@ -35,3 +35,17 @@ jobs:
        ./scripts/install_via_pip.sh ${{ matrix.pytorch_args }} ${{ matrix.transformers_args }}
        # Run Tests
        python3 -m pytest -ra --cov=. --cov-report term-missing
  auto-retry:
    name: Auto retry on failure
    if: failure() && fromJSON(github.run_attempt) < 2
    runs-on: ubuntu-latest
    steps:
      - name: Start rerun workflow
        env:
          GH_REPO: ${{ github.repository }}
          GH_TOKEN: ${{ github.token }}
          GH_DEBUG: api
        run: |
          gh workflow run retry.yml \
            -F run_id=${{ github.run_id }}
14 changes: 14 additions & 0 deletions .github/workflows/test-pip-gpu.yml
@@ -30,3 +30,17 @@ jobs:
        # Run Tests
        python3 -m pytest -ra --cov=. --cov-report term-missing
  auto-retry:
    name: Auto retry on failure
    if: failure() && fromJSON(github.run_attempt) < 2
    runs-on: ubuntu-latest
    steps:
      - name: Start rerun workflow
        env:
          GH_REPO: ${{ github.repository }}
          GH_TOKEN: ${{ github.token }}
          GH_DEBUG: api
        run: |
          gh workflow run retry.yml \
            -F run_id=${{ github.run_id }}
