DW: Add KFTO Training tests
ChughShilpa authored and sutaakar committed Nov 22, 2024
1 parent 3fb1ef0 commit 1dbd4ef
Showing 2 changed files with 68 additions and 0 deletions.
@@ -13,6 +13,8 @@ ${DISTRIBUTED_WORKLOADS_RELEASE_ASSETS} https://github.com/opendatahub-io/distr
${RAY_IMAGE_3.11}    quay.io/modh/ray@sha256:db667df1bc437a7b0965e8031e905d3ab04b86390d764d120e05ea5a5c18d1b4
${RAY_IMAGE_3.9}    quay.io/modh/ray@sha256:0d715f92570a2997381b7cafc0e224cfa25323f18b9545acfd23bc2b71576d06
${FMS_HF_TUNING_IMAGE}    quay.io/modh/fms-hf-tuning@sha256:73bcd66500b8637a9db1339f64c3217212ef74700a22a790f78e9a1f26b8b71a
${CUDA_TRAINING_IMAGE}    quay.io/modh/training@sha256:b98e373a972ff6f896a9dc054d56920e915675339c02ea7fa123e0f4bbef4d74
${ROCM_TRAINING_IMAGE}    quay.io/modh/training@sha256:2efb6efba4ec08e63847d701e3062a5f6ddf51c91af5fbcef6378b9e6520a3bb
${NOTEBOOK_USER_NAME}    ${TEST_USER_3.USERNAME}
${NOTEBOOK_USER_PASSWORD}    ${TEST_USER_3.PASSWORD}
${KFTO_CORE_BINARY_NAME}    kfto
@@ -151,6 +153,24 @@ Run Training Operator ODH Test
        FAIL    ${TEST_NAME} failed
    END

Run Training Operator KFTO Test
    [Documentation]    Run Training Operator KFTO Test
    [Arguments]    ${TEST_NAME}    ${TRAINING_IMAGE}
    Log To Console    "Running test: ${TEST_NAME}"
    ${result} =    Run Process    ./${KFTO_CORE_BINARY_NAME} -test.run ${TEST_NAME}
    ...    shell=true
    ...    stderr=STDOUT
    ...    env:CODEFLARE_TEST_TIMEOUT_SHORT=5m
    ...    env:CODEFLARE_TEST_TIMEOUT_MEDIUM=10m
    ...    env:CODEFLARE_TEST_TIMEOUT_LONG=20m
    ...    env:CODEFLARE_TEST_OUTPUT_DIR=%{WORKSPACE}/codeflare-${KFTO_CORE_BINARY_NAME}-logs
    ...    env:CODEFLARE_TEST_TRAINING_IMAGE=${TRAINING_IMAGE}
    Log To Console    ${result.stdout}
    Check missing Go test    ${result.stdout}
    IF    ${result.rc} != 0
        FAIL    ${TEST_NAME} failed
    END

Prepare DistributedWorkloads Integration Test Suite
    [Documentation]    Prepare DistributedWorkloads Integration Test Suite
    Log To Console    "Downloading compiled test binary ${ODH_BINARY_NAME}"
@@ -0,0 +1,48 @@
*** Settings ***
Documentation     Training operator E2E tests - https://github.com/opendatahub-io/distributed-workloads/tree/main/tests/kfto/core
Suite Setup       Prepare Training Operator E2E Core Test Suite
Suite Teardown    Teardown Training Operator E2E Core Test Suite
Library           OperatingSystem
Library           Process
Resource          ../../../../tasks/Resources/RHODS_OLM/install/oc_install.robot
Resource          ../../../../tests/Resources/Page/DistributedWorkloads/DistributedWorkloads.resource


*** Test Cases ***
Run Training operator KFTO test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO tests for Training operator using PyTorch job with NVIDIA CUDA image
    [Tags]    Resources-GPU    NVIDIA-GPUs
    ...    RHOAIENG-16035
    ...    Tier1
    ...    DistributedWorkloads
    ...    Training
    ...    TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobWithCuda    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO test with AMD ROCm image
    [Documentation]    Run Go KFTO tests for Training operator using PyTorch job with AMD ROCm image
    [Tags]    Resources-GPU    AMD-GPUs    ROCm
    ...    RHOAIENG-16035
    ...    Tier1
    ...    DistributedWorkloads
    ...    Training
    ...    TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobWithROCm    ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO error handling test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO error handling tests for Training operator using PyTorch job with NVIDIA CUDA image
    [Tags]    RHOAIENG-14542
    ...    Tier1
    ...    DistributedWorkloads
    ...    Training
    ...    TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobFailureWithCuda    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO error handling test with AMD ROCm image
    [Documentation]    Run Go KFTO error handling tests for Training operator using PyTorch job with AMD ROCm image
    [Tags]    RHOAIENG-14542
    ...    Tier1
    ...    DistributedWorkloads
    ...    Training
    ...    TrainingOperator
    Run Training Operator KFTO Test    TestPyTorchJobFailureWithROCm    ${ROCM_TRAINING_IMAGE}
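
For readers less familiar with Robot Framework's Process library, the behaviour of the `Run Training Operator KFTO Test` keyword added in this commit can be approximated in plain Python. This is only a rough sketch and not code from the repository: the function names `build_kfto_env` and `run_kfto_test` and the `workspace` parameter are hypothetical, the environment values are copied from the keyword, and the `Check missing Go test` step is left out because its implementation is not shown in this diff.

```python
import os
import subprocess


def build_kfto_env(training_image, workspace, binary_name="kfto"):
    """Return the current environment plus the variables the keyword sets."""
    env = dict(os.environ)  # env:NAME=value entries are added on top of the current environment
    env.update({
        "CODEFLARE_TEST_TIMEOUT_SHORT": "5m",
        "CODEFLARE_TEST_TIMEOUT_MEDIUM": "10m",
        "CODEFLARE_TEST_TIMEOUT_LONG": "20m",
        "CODEFLARE_TEST_OUTPUT_DIR": f"{workspace}/codeflare-{binary_name}-logs",
        "CODEFLARE_TEST_TRAINING_IMAGE": training_image,
    })
    return env


def run_kfto_test(test_name, training_image, workspace, binary_name="kfto"):
    """Hypothetical stand-in for the Robot keyword: run one Go test, raise on failure."""
    result = subprocess.run(
        f"./{binary_name} -test.run {test_name}",  # single command string, as with shell=true
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # mirrors stderr=STDOUT in the keyword
        text=True,
        env=build_kfto_env(training_image, workspace, binary_name),
    )
    print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(f"{test_name} failed")
    return result
```

Under these assumptions, a call such as `run_kfto_test("TestPyTorchJobWithCuda", image, workspace)` would correspond to the first test case above, with `image` holding the value of `${CUDA_TRAINING_IMAGE}` and `workspace` the value of the `WORKSPACE` environment variable.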
