move horovod installation to installation part
since the easy installation now also contains Horovod, we may want to discuss it separately
njzjz committed Sep 14, 2021
1 parent b032878 commit 7b2df5d
Showing 3 changed files with 19 additions and 16 deletions.
6 changes: 3 additions & 3 deletions doc/install/easy-install.md
@@ -2,7 +2,7 @@

There are various easy methods to install DeePMD-kit. Choose the one you prefer. If you want to build it yourself, jump to the next two sections.

After your easy installation, DeePMD-kit (`dp`) and LAMMPS (`lmp`) will be available to execute. You can try `dp -h` and `lmp -h` to see the help. `mpirun` is also available in case you want to run LAMMPS in parallel.
After your easy installation, DeePMD-kit (`dp`) and LAMMPS (`lmp`) will be available to execute. You can try `dp -h` and `lmp -h` to see the help. `mpirun` is also available in case you want to train models or run LAMMPS in parallel.
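
For instance, a parallel LAMMPS run might look like the sketch below (illustrative only; `in.lammps` is a placeholder input script, not part of this commit):
```bash
# Run LAMMPS on 4 MPI ranks; replace in.lammps with your input script
mpirun -np 4 lmp -in in.lammps
```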

- [Install off-line packages](#install-off-line-packages)
- [Install with conda](#install-with-conda)
@@ -27,13 +27,13 @@ conda create -n deepmd deepmd-kit=*=*cpu libdeepmd=*=*cpu lammps-dp -c https://c

Or one may want to create a GPU environment containing [CUDA Toolkit](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver):
```bash
conda create -n deepmd deepmd-kit=*=*gpu libdeepmd=*=*gpu lammps-dp cudatoolkit=11.3 -c https://conda.deepmodeling.org
conda create -n deepmd deepmd-kit=*=*gpu libdeepmd=*=*gpu lammps-dp cudatoolkit=11.3 horovod -c https://conda.deepmodeling.org
```
One can change the CUDA Toolkit version to `10.1` or `11.3`.
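
For example, the same GPU environment with the older toolkit pinned (the command from above with only the `cudatoolkit` version swapped):
```bash
conda create -n deepmd deepmd-kit=*=*gpu libdeepmd=*=*gpu lammps-dp cudatoolkit=10.1 horovod -c https://conda.deepmodeling.org
```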

One may specify the DeePMD-kit version such as `2.0.0` using
```bash
conda create -n deepmd deepmd-kit=2.0.0=*cpu libdeepmd=2.0.0=*cpu lammps-dp=2.0.0 -c https://conda.deepmodeling.org
conda create -n deepmd deepmd-kit=2.0.0=*cpu libdeepmd=2.0.0=*cpu lammps-dp=2.0.0 horovod -c https://conda.deepmodeling.org
```

One may activate the environment using `conda activate deepmd` (the command itself is collapsed in this diff; the environment name `deepmd` follows from the create commands above).
16 changes: 16 additions & 0 deletions doc/install/install-from-source.md
@@ -92,6 +92,22 @@ Valid subcommands:
test test the model
```

### Install Horovod and mpi4py

[Horovod](https://github.com/horovod/horovod) and [mpi4py](https://github.com/mpi4py/mpi4py) are used for parallel training. For better performance on GPUs, please follow the tuning steps in [Horovod on GPU](https://github.com/horovod/horovod/blob/master/docs/gpus.rst).
```bash
# With GPU, prefer NCCL as communicator.
HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_HOME=/path/to/nccl pip install horovod mpi4py
```

If you work in a CPU environment, please prepare the runtime as below:
```bash
# By default, MPI is used as communicator.
HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod mpi4py
```

If you don't install Horovod, DeePMD-kit will fall back to serial mode.
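
After installing, one may confirm what the build includes using Horovod's standard check (a quick sanity check, not part of the original docs):
```bash
# Lists the frameworks and the controllers/collective operations
# (MPI, Gloo, NCCL) compiled into the installed Horovod
horovodrun --check-build
```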

## Install the C++ interface

If one does not need to use DeePMD-kit with LAMMPS or i-PI, then the Python interface installed in the previous section does everything, and one can safely skip this section.
13 changes: 0 additions & 13 deletions doc/train/parallel-training.md
@@ -10,21 +10,8 @@ Testing `examples/water/se_e2_a` on an 8-GPU host, linear acceleration can be obs
| 4 | 1.7635 | 56.71*4 | 3.29 |
| 8 | 1.7267 | 57.91*8 | 6.72 |

To experience this powerful feature, please install Horovod and [mpi4py](https://github.com/mpi4py/mpi4py) first. For better performance on GPUs, please follow the tuning steps in [Horovod on GPU](https://github.com/horovod/horovod/blob/master/docs/gpus.rst).
```bash
# With GPU, prefer NCCL as communicator.
HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_HOME=/path/to/nccl pip3 install horovod mpi4py
```

If you work in a CPU environment, please prepare the runtime as below:
```bash
# By default, MPI is used as communicator.
HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod mpi4py
```

Horovod works in data-parallel mode, resulting in a larger global batch size. For example, the real batch size is 8 when `batch_size` is set to 2 in the input file and you launch 4 workers. Thus, `learning_rate` is automatically scaled by the number of workers for better convergence. Technical details of this heuristic rule are discussed in [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).
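
To see the worker count that drives this scaling, one can query Horovod directly (an illustrative check, not from the original docs; it assumes Horovod was built with TensorFlow support):
```bash
# Each worker reports hvd.size() = 4; with batch_size=2 per worker, the global
# batch size is 2 * 4 = 8, and learning_rate is scaled by the same factor
horovodrun -np 4 python -c "import horovod.tensorflow as hvd; hvd.init(); print('workers:', hvd.size())"
```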

With the dependencies installed, give it a quick try!
```bash
# Launch 4 processes on the same host
CUDA_VISIBLE_DEVICES=4,5,6,7 horovodrun -np 4 \
