Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(ci): transfer all GPU CI to hyperstack #1460

Merged
merged 1 commit into from
Aug 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Compile and test tfhe-cuda-backend on an H100 VM on hyperstack
name: TFHE Cuda Backend - Base tests on H100
name: TFHE Cuda Backend - Fast tests on H100

env:
CARGO_TERM_COLOR: always
Expand Down Expand Up @@ -49,7 +49,7 @@ jobs:
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/hyperstack**'
- '.github/workflows/gpu_fast_h100_tests.yml'
- scripts/**
- ci/**

Expand Down Expand Up @@ -109,6 +109,8 @@ jobs:

- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'

- name: Set up home
run: |
Expand Down Expand Up @@ -170,7 +172,7 @@ jobs:
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Base H100 tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"
SLACK_MESSAGE: "Fast H100 tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"

teardown-instance:
name: Teardown instance (cuda-h100-tests)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ jobs:
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_fast_tests.yml'
- Makefile
- scripts/**
- ci/**

setup-instance:
name: Setup instance (cuda-tests)
Expand All @@ -65,7 +69,7 @@ jobs:
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
backend: hyperstack
profile: gpu-test

cuda-tests-linux:
Expand All @@ -84,11 +88,23 @@ jobs:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}

CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install

- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
Expand Down Expand Up @@ -122,6 +138,10 @@ jobs:
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"

- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi

- name: Run core crypto and internal CUDA backend tests
run: |
make test_core_crypto_gpu
Expand All @@ -139,13 +159,18 @@ jobs:
run: |
make test_high_level_api_gpu

- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Base GPU tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"

teardown-instance:
name: Teardown instance (cuda-tests)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ jobs:
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/aws_tfhe_multi_gpu**'
- '.github/workflows/**_multi_gpu_tests.yml'
agnesLeroy marked this conversation as resolved.
Show resolved Hide resolved
- scripts/**
- ci/**

Expand All @@ -71,7 +71,7 @@ jobs:
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
backend: hyperstack
profile: multi-gpu-test

cuda-tests-linux:
Expand All @@ -90,13 +90,27 @@ jobs:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}

CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install

- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'

- name: Set up home
run: |
Expand Down Expand Up @@ -126,30 +140,39 @@ jobs:
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"

- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi

# No need to test core_crypto and classic PBS in integer since it's already tested on single GPU.
- name: Run multi-bit CUDA integer tests
run: |
make test_integer_multi_bit_gpu_ci
BIG_TESTS_INSTANCE=TRUE make test_integer_multi_bit_gpu_ci
agnesLeroy marked this conversation as resolved.
Show resolved Hide resolved

- name: Run user docs tests
run: |
make test_user_doc_gpu
BIG_TESTS_INSTANCE=TRUE make test_user_doc_gpu

- name: Test C API
run: |
make test_c_api_gpu
BIG_TESTS_INSTANCE=TRUE make test_c_api_gpu

- name: Run High Level API Tests
run: |
make test_high_level_api_gpu
BIG_TESTS_INSTANCE=TRUE make test_high_level_api_gpu

- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS multi-GPU tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Multi-GPU tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"

teardown-instance:
name: Teardown instance (cuda-tests-multi-gpu)
Expand Down
File renamed without changes.
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ jobs:
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/hyperstack**'
- '.github/workflows/gpu_signed_integer_h100_tests.yml'
- scripts/**
- ci/**

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,10 @@ jobs:
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_signed_integer_tests.yml'
- Makefile
- scripts/**
- ci/**

setup-instance:
name: Setup instance (cuda-signed-integer-tests)
Expand All @@ -75,7 +79,7 @@ jobs:
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
backend: hyperstack
profile: gpu-test

cuda-signed-integer-tests:
Expand All @@ -94,13 +98,27 @@ jobs:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}

CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install

- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'

- name: Set up home
run: |
Expand Down Expand Up @@ -138,17 +156,26 @@ jobs:
echo "NIGHTLY_TESTS=TRUE";
} >> "${GITHUB_ENV}"

- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi

- name: Run signed integer multi-bit tests
run: |
make test_signed_integer_multi_bit_gpu_ci

- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-signed-integer-tests ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-signed-integer-tests.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS signed integer tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_COLOR: ${{ needs.cuda-signed-integer-tests.result }}
SLACK_MESSAGE: "Base GPU tests finished with status: ${{ needs.cuda-signed-integer-tests.result }}. (${{ env.ACTION_RUN_URL }})"

teardown-instance:
name: Teardown instance (cuda-tests)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ jobs:
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/hyperstack**'
- '.github/workflows/gpu_unsigned_integer_tests.yml'
- scripts/**
- ci/**

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,10 @@ jobs:
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_unsigned_integer_tests.yml'
- Makefile
- scripts/**
- ci/**

setup-instance:
name: Setup instance (cuda-unsigned-integer-tests)
Expand All @@ -74,7 +78,7 @@ jobs:
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
backend: hyperstack
profile: gpu-test

cuda-unsigned-integer-tests:
Expand All @@ -93,11 +97,23 @@ jobs:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}

CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install

- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332

Expand Down Expand Up @@ -137,17 +153,26 @@ jobs:
echo "NIGHTLY_TESTS=TRUE";
} >> "${GITHUB_ENV}"

- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi

- name: Run unsigned integer multi-bit tests
run: |
make test_unsigned_integer_multi_bit_gpu_ci

- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-unsigned-integer-tests ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-unsigned-integer-tests.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS unsigned integer tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_COLOR: ${{ needs.cuda-unsigned-integer-tests.result }}
SLACK_MESSAGE: "Unsigned integer GPU tests finished with status: ${{ needs.cuda-unsigned-integer-tests.result }}. (${{ env.ACTION_RUN_URL }})"

teardown-instance:
name: Teardown instance (cuda-tests)
Expand Down
Loading
Loading