pd: skip certain UTs and pin paddle version in test_cuda.yml (#4439)
In two unit-test files under `pd/` (`pd/test_dp_show.py` and
`pd/test_multitask.py`), `paddle.jit.save` is called, which occasionally
triggers CUDA error 709. Until this issue is resolved, the affected test
classes in these two files are temporarily marked as skipped.

![image](https://github.com/user-attachments/assets/45af373f-27cf-4c31-915d-d47296426b6b)

![image](https://github.com/user-attachments/assets/e4413b7d-d530-4d9e-a2d2-f3695e12f9e3)


![image](https://github.com/user-attachments/assets/62f4a378-52c1-4e4d-ab23-a9b41d982c97)


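Meanwhile, the `paddlepaddle-gpu` version in `test_cuda.yml` has been
pinned: instead of installing the latest nightly from the package index,
CI now installs the `3.0.0.dev20241126` nightly wheel directly.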

@njzjz 


## Summary by CodeRabbit

- **Bug Fixes**
	- Skipped the test classes in `pd/test_dp_show.py` and `pd/test_multitask.py` that intermittently hit CUDA error 709 in `paddle.jit.save`.

- **Tests**
	- Pinned `paddlepaddle-gpu` in `test_cuda.yml` to the `3.0.0.dev20241126` nightly wheel.
	- Left the existing `tearDown` cleanup methods, which remove generated files, unchanged.

HydrogenSulfate authored Nov 28, 2024
1 parent 037cf3f commit f7e4cdf
Showing 3 changed files with 5 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .github/workflows/test_cuda.yml
@@ -51,7 +51,7 @@ jobs:
- run: |
export PYTORCH_ROOT=$(python -c 'import torch;print(torch.__path__[0])')
export TENSORFLOW_ROOT=$(python -c 'import importlib,pathlib;print(pathlib.Path(importlib.util.find_spec("tensorflow").origin).parent)')
-source/install/uv_with_retry.sh pip install --system --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
+source/install/uv_with_retry.sh pip install --system --pre https://paddle-whl.bj.bcebos.com/nightly/cu123/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20241126-cp311-cp311-linux_x86_64.whl
source/install/uv_with_retry.sh pip install --system -v -e .[gpu,test,lmp,cu12,torch,jax] mpi4py
env:
DP_VARIANT: cuda
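The pinned URL points at a single wheel, and a wheel filename encodes its compatibility tags (per PEP 427: distribution, version, optional build tag, Python tag, ABI tag, platform tag). The sketch below splits the pinned filename with a hypothetical helper, `split_wheel_filename`, written only for illustration (the `packaging` library's `packaging.utils.parse_wheel_filename` handles the general case, including name normalization). It suggests that, unlike the previous index-based install, the pin only matches CPython 3.11 on x86-64 Linux, so the CUDA job presumably has to keep running Python 3.11 for this install step to succeed.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    """Fields of a wheel filename (hypothetical container, for illustration)."""
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split '{dist}-{ver}[-{build}]-{py}-{abi}-{plat}.whl' into its tags.

    Hypothetical helper: it does no name normalization or validation beyond
    counting the dash-separated fields.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between version and python tag; drop it.
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return WheelTags(*parts)


if __name__ == "__main__":
    url = (
        "https://paddle-whl.bj.bcebos.com/nightly/cu123/paddlepaddle-gpu/"
        "paddlepaddle_gpu-3.0.0.dev20241126-cp311-cp311-linux_x86_64.whl"
    )
    tags = split_wheel_filename(url.rsplit("/", 1)[-1])
    print(tags.python_tag, tags.platform_tag)  # cp311 linux_x86_64
```

The version field `3.0.0.dev20241126` is a PEP 440 development release; the numeric suffix appears to encode the nightly build date.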
2 changes: 2 additions & 0 deletions source/tests/pd/test_dp_show.py
@@ -29,6 +29,7 @@
)


+@unittest.skip("Skip until solving cuda error 709 in jit.save")
class TestSingleTaskModel(unittest.TestCase):
    def setUp(self):
        input_json = str(Path(__file__).parent / "water/se_atten.json")
@@ -101,6 +102,7 @@ def tearDown(self):
shutil.rmtree(f)


+@unittest.skip("Skip until solving cuda error 709 in jit.save")
class TestMultiTaskModel(unittest.TestCase):
    def setUp(self):
        input_json = str(Path(__file__).parent / "water/multitask.json")
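Because the decorator sits on the class, `unittest` skips each test before any fixture code runs: neither `setUp` nor `tearDown` is called, so nothing the class would normally prepare (such as loading the JSON input) or clean up (the generated files removed in `tearDown`) is touched. The minimal sketch below checks this with a hypothetical stand-in class, `TestSingleTaskModelSketch`, rather than the real deepmd-kit test, so it runs without Paddle installed.

```python
import unittest

SKIP_REASON = "Skip until solving cuda error 709 in jit.save"
calls = []  # records which hooks and tests actually ran


@unittest.skip(SKIP_REASON)
class TestSingleTaskModelSketch(unittest.TestCase):
    """Hypothetical stand-in for a skipped test class."""

    def setUp(self):
        calls.append("setUp")  # stand-in for loading inputs in the real setUp

    def test_show(self):
        calls.append("test_show")

    def tearDown(self):
        calls.append("tearDown")  # stand-in for removing generated files


def run_skipped_class() -> unittest.TestResult:
    """Run the class through a plain TestResult and return the result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestSingleTaskModelSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    res = run_skipped_class()
    print(len(res.skipped), calls)  # 1 []
```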
2 changes: 2 additions & 0 deletions source/tests/pd/test_multitask.py
@@ -40,6 +40,7 @@ def setUpModule():
multitask_template = json.load(f)


+@unittest.skip("Skip until solving cuda error 709 in jit.save")
class MultiTaskTrainTest:
    def test_multitask_train(self):
        # test multitask training
@@ -181,6 +182,7 @@ def tearDown(self):
shutil.rmtree(f)


+@unittest.skip("Skip until solving cuda error 709 in jit.save")
class TestMultiTaskSeA(unittest.TestCase, MultiTaskTrainTest):
    def setUp(self):
        multitask_se_e2_a = deepcopy(multitask_template)
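The decorator on `MultiTaskTrainTest` is worth a note: that class is a plain mixin, not a `unittest.TestCase`. Applied to a class, `unittest.skip` only sets the `__unittest_skip__` and `__unittest_skip_why__` attributes, and `unittest` reads them with `getattr` on the running test's class, so they are inherited through the MRO. As far as we can tell from CPython's `unittest`, the mixin's decorator alone would therefore skip every `TestCase` that mixes it in; the explicit decorator on `TestMultiTaskSeA` makes the intent visible but is not strictly required. The minimal sketch below uses hypothetical stand-ins (`TrainMixin`, `TestVariantA`, `TestVariantB`), and neither concrete case carries its own decorator.

```python
import unittest

SKIP_REASON = "Skip until solving cuda error 709 in jit.save"


@unittest.skip(SKIP_REASON)
class TrainMixin:
    """Hypothetical stand-in for MultiTaskTrainTest: a mixin, not a TestCase."""

    def test_multitask_train(self):
        # Only usable once mixed into a TestCase; never runs here.
        self.assertTrue(True)


# Hypothetical concrete cases; the skip flag is inherited from TrainMixin.
class TestVariantA(unittest.TestCase, TrainMixin):
    pass


class TestVariantB(unittest.TestCase, TrainMixin):
    pass


def run_variants() -> unittest.TestResult:
    """Load both concrete cases into one suite and run it."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in (TestVariantA, TestVariantB):
        suite.addTests(loader.loadTestsFromTestCase(cls))
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    res = run_variants()
    print(res.testsRun, len(res.skipped))  # 2 2
```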
