
AMGX coupling build problem #484

Closed
yanchong13 opened this issue Jun 23, 2024 · 3 comments
Comments

@yanchong13

I am compiling OpenFOAM + DeepFlame on a WSL Ubuntu 20.04 subsystem and trying to couple AMGX for acceleration. The steps are as follows:

#1 in windows11
install wsl ubuntu-20.04
install nvidia driver

#2 in wsl

#cuda
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

#cudnn
tar xvJf cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz
sudo cp cudnn-linux-x86_64-8.9.7.29_cuda11-archive/lib/* /usr/local/cuda-11.8/lib64
sudo cp cudnn-linux-x86_64-8.9.7.29_cuda11-archive/include/* /usr/local/cuda-11.8/include
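A quick way to confirm the headers landed in the right place (a hedged check; `cudnn_version.h` is the standard version header for cuDNN 8.x tar archives):

```shell
# Print the installed cuDNN major version, if the header is in place.
grep -m1 '#define CUDNN_MAJOR' /usr/local/cuda-11.8/include/cudnn_version.h 2>/dev/null \
  || echo "cudnn_version.h not found"
```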

#environment

vim ~/.bashrc

export PATH=/usr/bin:$PATH
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
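After re-sourcing `~/.bashrc`, a small sanity check (a sketch; it only verifies that the exports above took effect):

```shell
# nvcc should resolve through PATH, and the CUDA library directory should exist.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | tail -n 1
else
  echo "nvcc not on PATH"
fi
test -d /usr/local/cuda-11.8/lib64 && echo "lib64 present" || echo "lib64 missing"
```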

#cmake
sudo apt-get install libssl-dev
wget https://cmake.org/files/v3.29/cmake-3.29.6.tar.gz
tar -xvzf cmake-3.29.6.tar.gz
cd cmake-3.29.6
chmod +x ./configure
./configure
make
sudo make install
sudo update-alternatives --install /usr/bin/cmake cmake /usr/local/bin/cmake 1 --force
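To confirm the freshly built CMake is the one the shell now picks up (a hedged check; it just reports whatever `cmake` resolves to):

```shell
# Expect the version line to report 3.29.6 if the alternative points at /usr/local/bin.
if command -v cmake >/dev/null 2>&1; then
  cmake --version | head -n 1
else
  echo "cmake not on PATH"
fi
```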

#AMGX
git clone --recursive https://github.com/NVIDIA/AMGX.git
cd AMGX
mkdir build
cd build
cmake ../
make -j16 all
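Still inside `AMGX/build`, the build should have produced the shared library (library name as generated by the AMGX CMake build; treat this as a quick sketch of a check):

```shell
# The later LD_LIBRARY_PATH entry ($AMGX_DIR/build) relies on this file existing.
ls libamgxsh.so 2>/dev/null || echo "libamgxsh.so not found - check the build output"
```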

#miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
conda config --set auto_activate_base false

#OpenFOAM-7
sudo apt-get install build-essential git ca-certificates
sudo apt-get install flex libfl-dev bison zlib1g-dev libboost-system-dev libboost-thread-dev libopenmpi-dev openmpi-bin gnuplot libreadline-dev libncurses-dev libxt-dev
wget -O - http://dl.openfoam.org/source/7 | tar xz
wget -O - http://dl.openfoam.org/third-party/7 | tar xz
mv OpenFOAM-7-version-7 OpenFOAM-7
mv ThirdParty-7-version-7 ThirdParty-7
source OpenFOAM-7/etc/bashrc
./OpenFOAM-7/Allwmake -j

conda create -n deepflame python=3.8
conda activate deepflame
conda install -c cantera libcantera-devel=2.6 cantera
#pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pybind11 pkg-config
conda install fmt
conda install eigen
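With the `deepflame` env active, it is worth checking that the conda-installed PyTorch can actually see the GPU (a hedged check; under WSL this prints `False` if driver passthrough is not working):

```shell
# Prints the torch version and whether CUDA is usable from Python.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())" 2>/dev/null \
  || echo "torch not importable"
```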

#combine

. configure.sh --amgx_dir /home/yanchong/AMGX/
. configure.sh --use_pytorch

#get bashrc

export DF_ROOT=/home/yanchong/OpenFOAM/deepflame-dev
export DF_SRC=/home/yanchong/OpenFOAM/deepflame-dev/src
export SRC_ORIG=/home/yanchong/OpenFOAM/deepflame-dev/src_orig
export LIBTORCH_ROOT=
export PYTHON_INC_DIR="-I/root/miniconda3/envs/deepflame/include/python3.8 -I/root/miniconda3/envs/deepflame/lib/python3.8/site-packages/pybind11/include"
export PYTHON_LIB_DIR="-L/root/miniconda3/envs/deepflame/lib -lpython3.8"
export CANTERA_ROOT=/root/miniconda3/envs/deepflame
export CANTERA_DATA=$CANTERA_ROOT/share/cantera/data
export LD_LIBRARY_PATH=$LIBTORCH_ROOT/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CANTERA_ROOT/lib:$LD_LIBRARY_PATH
export AMGX_DIR=/home/yanchong/AMGX/
export ODE_GPU_SOLVER=

export DF_APPBIN=/home/yanchong/OpenFOAM/deepflame-dev/platforms/$WM_OPTIONS/bin
export DF_LIBBIN=/home/yanchong/OpenFOAM/deepflame-dev/platforms/$WM_OPTIONS/lib
export PATH=$DF_APPBIN:$PATH
export LD_LIBRARY_PATH=$DF_LIBBIN:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$DF_ROOT/src_gpu/build:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$AMGX_DIR/build:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$DF_ROOT/src/dfChemistryModel/DNNInferencer/build:$LD_LIBRARY_PATH

source bashrc
. install.sh
#get errors
g++: error: /usr/local/cuda/lib64/libnccl.so: No such file or directory
g++: error: /home/yanchong/OpenFOAM/deepflame-dev/src_gpu/build/libdfMatrix.so: No such file or directory

Without --amgx_dir, the build succeeds.
Do I still need to install the nccl package?

@yanchong13
Author

After installation of nccl, error "g++: error: /usr/local/cuda/lib64/libnccl.so: No such file or directory" disappeared.
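For reference, on Ubuntu NCCL can be installed from NVIDIA's CUDA apt repository (package names follow NVIDIA's standard naming; this is a sketch, and the symlink step is only an assumption for builds that hard-code the CUDA tree path, as the error message above does):

```shell
# Install the NCCL runtime and headers from the CUDA repo.
sudo apt-get install -y libnccl2 libnccl-dev
# apt typically places the library under /usr/lib/x86_64-linux-gnu; if the
# build still looks for it under the CUDA tree, link it there.
sudo ln -sf /usr/lib/x86_64-linux-gnu/libnccl.so /usr/local/cuda/lib64/libnccl.so
```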
But in Dir src_gpu:
cd $DF_ROOT/src_gpu
mkdir build
cd build
cmake ..

error still occurred:
CMake Warning (dev) at CMakeLists.txt:11 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

CMake Error at CMakeLists.txt:14 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.

Could not find a package configuration file provided by "Torch" with any of
the following names:

TorchConfig.cmake
torch-config.cmake

Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
"Torch_DIR" to a directory containing one of the above files. If "Torch"
provides a separate development package or SDK, be sure it has been
installed.

@yanchong13
Author

Is my CMake 3.29.6 version incompatible, or is there some other problem?

@yanchong13
Author

Solved by adding "export Torch_DIR=/home/yanchong/anaconda3/envs/deepflame/lib/python3.8/site-packages/torch" to ~/.bashrc
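Rather than hard-coding the site-packages path, the same directory can be derived from the active environment (a sketch; assumes the deepflame env is activated and torch is importable):

```shell
# Locate the installed torch package and emit the matching export line.
TORCH_DIR="$(python -c 'import torch, os; print(os.path.dirname(torch.__file__))' 2>/dev/null)"
if [ -n "$TORCH_DIR" ]; then
  echo "export Torch_DIR=$TORCH_DIR"   # append this line to ~/.bashrc
else
  echo "torch not importable in this environment"
fi
```

Newer PyTorch builds also expose `torch.utils.cmake_prefix_path`, which can be passed to CMake as `CMAKE_PREFIX_PATH` instead of setting `Torch_DIR` directly.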
