fix GPU test OOM problem #3207

Merged: 3 commits, Jan 31, 2024
10 changes: 5 additions & 5 deletions .github/workflows/test_cuda.yml
@@ -34,17 +34,17 @@ jobs:
         && sudo apt-get -y install cuda-12-2 libcudnn8=8.9.5.*-1+cuda12.2
       if: false # skip as we use nvidia image
     - name: Set PyPI mirror for Aliyun cloud machine
-      run: python -m pip config --user set global.index-url https://mirrors.aliyun.com/pypi/simple/
+      run: python -m pip config --user set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
     - run: python -m pip install -U "pip>=21.3.1,!=23.0.0"
-    - run: python -m pip install "tensorflow>=2.15.0rc0"
+    - run: python -m pip install "tensorflow>=2.15.0rc0" "torch>=2.2.0"
     - run: python -m pip install -v -e .[gpu,test,lmp,cu12,torch] "ase @ https://gitlab.com/ase/ase/-/archive/8c5aa5fd6448c5cfb517a014dccf2b214a9dfa8f/ase-8c5aa5fd6448c5cfb517a014dccf2b214a9dfa8f.tar.gz"
       env:
         DP_BUILD_TESTING: 1
         DP_VARIANT: cuda
         CUDA_PATH: /usr/local/cuda-12.2
         NUM_WORKERS: 0
     - run: dp --version
-    - run: python -m pytest -s --cov=deepmd source/tests --durations=0
+    - run: python -m pytest --cov=deepmd source/tests --durations=0
     - run: source/install/test_cc_local.sh
       env:
         OMP_NUM_THREADS: 1
@@ -58,8 +58,8 @@ jobs:
     - run: |
         export LD_LIBRARY_PATH=$GITHUB_WORKSPACE/dp_test/lib:$CUDA_PATH/lib64:$LD_LIBRARY_PATH
         export PATH=$GITHUB_WORKSPACE/dp_test/bin:$PATH
-        python -m pytest -s --cov=deepmd source/lmp/tests
-        python -m pytest -s --cov=deepmd source/ipi/tests
+        python -m pytest --cov=deepmd source/lmp/tests
+        python -m pytest --cov=deepmd source/ipi/tests
       env:
         OMP_NUM_THREADS: 1
         TF_INTRA_OP_PARALLELISM_THREADS: 1
3 changes: 3 additions & 0 deletions deepmd/tf/env.py
@@ -483,6 +483,9 @@ def _get_package_constants(

 op_module = get_module("deepmd_op")
 op_grads_module = get_module("op_grads")
+# prevent OOM when using with other backends
+# tf.config doesn't work for unclear reason
+set_env_if_empty("TF_FORCE_GPU_ALLOW_GROWTH", "true", verbose=False)

 # FLOAT_PREC
 GLOBAL_TF_FLOAT_PRECISION = tf.dtypes.as_dtype(GLOBAL_NP_FLOAT_PRECISION)
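
Note on the hunk above: by default TensorFlow reserves nearly all GPU memory when it first initializes a device, which can starve another backend (here PyTorch) running in the same test process. Setting `TF_FORCE_GPU_ALLOW_GROWTH=true` asks TensorFlow to allocate on demand instead. The snippet below is a minimal sketch of the "set only if the user has not already chosen a value" pattern. `set_default_env` is a hypothetical stand-in written for illustration, not the project's `set_env_if_empty`, and it assumes the variable is set before TensorFlow touches the GPU.

```python
import os


def set_default_env(key: str, value: str) -> None:
    """Hypothetical stand-in: set `key` only when it is not already defined.

    An explicit user setting (for example TF_FORCE_GPU_ALLOW_GROWTH=false)
    is left untouched.
    """
    if os.environ.get(key) is None:
        os.environ[key] = value


# Assumption: this runs before TensorFlow initializes any GPU device,
# since the variable is read at device initialization.
set_default_env("TF_FORCE_GPU_ALLOW_GROWTH", "true")
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

Leaving an existing value alone keeps the default overridable from the CI environment or a user's shell.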
9 changes: 9 additions & 0 deletions source/tests/pt/conftest.py
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: LGPL-3.0-or-later
+import pytest
+import torch
+
+
+@pytest.fixture(scope="package", autouse=True)
+def clear_cuda_memory(request):
+    yield
+    torch.cuda.empty_cache()
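
Note on the fixture above: statements after `yield` run at teardown, so with `scope="package"` and `autouse=True` the cache is emptied once after the package's tests finish. `torch.cuda.empty_cache()` only returns unused cached blocks to the driver; memory held by live tensors stays allocated. The sketch below shows one way the teardown step might be made safe on machines without CUDA. `release_cached_gpu_memory` is a hypothetical helper, not part of this PR, and the optional import exists only so the example also runs where PyTorch is not installed.

```python
import gc


def release_cached_gpu_memory() -> bool:
    """Free PyTorch's unused cached CUDA blocks; return True if a cache was cleared."""
    try:
        import torch  # optional import so the sketch also runs without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    gc.collect()  # drop unreachable tensors first so their blocks become free
    torch.cuda.empty_cache()  # hand unused cached blocks back to the CUDA driver
    return True


cleared = release_cached_gpu_memory()
print("CUDA cache cleared" if cleared else "no CUDA device available; nothing to clear")
```

A helper like this could be called after the `yield` in a fixture such as `clear_cuda_memory`, under the same scope and autouse settings.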