docs: improve multi-backend documentation (deepmodeling#3875)
## Summary by CodeRabbit

- **Documentation**
  - Clarified the process of freezing a model by removing references to specific code sources.
  - Updated command syntax for calculating neighbor statistics to include TensorFlow and PyTorch flags.
  - Modified descriptions to specify model files instead of graph files.
  - Added and adjusted commands to support both TensorFlow and PyTorch for training, freezing, and testing models.
  - Introduced a tabular format for configuring parallelism settings for TensorFlow and PyTorch.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
njzjz and coderabbitai[bot] authored Jun 13, 2024
1 parent ed5a2e4 commit c644314
Showing 9 changed files with 103 additions and 14 deletions.
4 changes: 3 additions & 1 deletion doc/freeze/freeze.md
@@ -1,6 +1,7 @@
# Freeze a model

-The trained neural network is extracted from a checkpoint and dumped into a protobuf(.pb) file. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does
+The trained neural network is extracted from a checkpoint and dumped into a model file. This process is called "freezing" a model.
+To freeze a model, typically one does

::::{tab-set}

@@ -11,6 +12,7 @@
```sh
$ dp freeze -o model.pb
```

in the folder where the model is trained. The output model is called `model.pb`.
+The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc).

:::

18 changes: 17 additions & 1 deletion doc/model/sel.md
@@ -6,10 +6,26 @@ All descriptors require to set `sel`, which means the expected maximum number of

To determine a proper `sel`, one can calculate the neighbor stat of the training data before training:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```sh
-dp neighbor-stat -s data -r 6.0 -t O H
+dp --tf neighbor-stat -s data -r 6.0 -t O H
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```sh
dp --pt neighbor-stat -s data -r 6.0 -t O H
```

:::

::::

where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` are the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have 38 O neighbors and 72 H neighbors in the training data.

The `sel` should be set to a higher value than that of the training data, considering there may be some extreme geometries during MD simulations. As a result, we set `sel` to `[46, 92]` in the water example.
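The gap between the measured `max_nbor_size` and the chosen `sel` can be scripted. A minimal sketch, where the 20% safety margin is an assumption for illustration (the water example above rounds up a little further, to `[46, 92]`):

```python
import math

def suggest_sel(max_nbor_size, margin=0.2):
    """Suggest per-type sel values by padding the observed maximum
    neighbor counts with a safety margin for extreme MD geometries."""
    return [math.ceil(n * (1 + margin)) for n in max_nbor_size]

# Water example from the text: max_nbor_size is [38, 72].
print(suggest_sel([38, 72]))  # -> [46, 87]
```

The exact margin is a judgment call; larger `sel` costs memory and speed, so pad only as much as your simulations need.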
47 changes: 41 additions & 6 deletions doc/model/train-fitting-dos.md
@@ -1,7 +1,7 @@
-# Fit electronic density of states (DOS) {{ tensorflow_icon }}
+# Fit electronic density of states (DOS) {{ tensorflow_icon }} {{ pytorch_icon }} {{ dpmodel_icon }}

:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, DP {{ dpmodel_icon }}
:::

Here we present an API to the DeepDOS model, which can be used to fit the electronic density of states (DOS), a vector quantity.
@@ -82,10 +82,26 @@ To prepare the data, we recommend shifting the DOS data by the Fermi level.

The training command is the same as `ener` mode, i.e.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
-dp train input.json
+dp --pt train input.json
```

:::

::::

The detailed loss can be found in `lcurve.out`:

@@ -117,14 +133,33 @@ The detailed loss can be found in `lcurve.out`:

In this earlier version, we can use `dp test` to infer the electronic density of states for given frames.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash

dp --tf freeze -o frozen_model.pb

dp --tf test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash

-$DP freeze -o frozen_model.pb
+dp --pt freeze -o frozen_model.pth

-$DP test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
+dp --pt test -m frozen_model.pth -s ../data/111/$k -d ${output_prefix} -a -n 100
```

-if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame is output in the working directory
:::

::::

+if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame are output in the working directory

```
${output_prefix}.ados.out.0 ${output_prefix}.ados.out.1 ${output_prefix}.ados.out.2 ${output_prefix}.ados.out.3
```
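These per-frame output files can be post-processed with a few lines of Python. A sketch assuming a dp-test-style layout — `#`-prefixed header lines, then rows holding reference values followed by an equal number of predicted values; this layout is an assumption, so check the header comment of your own files first:

```python
import math

def rmse_from_dp_test(lines):
    """RMSE over a dp-test-style table: '#' header lines are skipped,
    and each data row is assumed to hold n reference values followed
    by n predicted values."""
    sq_sum, count = 0.0, 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = [float(x) for x in line.split()]
        half = len(cols) // 2
        for ref, pred in zip(cols[:half], cols[half:]):
            sq_sum += (ref - pred) ** 2
            count += 1
    return math.sqrt(sq_sum / count)

# e.g. rmse_from_dp_test(open("output.ados.out.0"))
```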
2 changes: 1 addition & 1 deletion doc/test/model-deviation.md
@@ -59,7 +59,7 @@ One can also use a subcommand to calculate the deviation of predicted forces or
```bash
dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out
```

-where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command:
+where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results are dumped. Here is more information on this sub-command:

```bash
usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}]
```
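Conceptually, the force model deviation is the standard deviation of the per-atom force predictions across the ensemble of models, reduced to max/min/mean over atoms. A pure-Python sketch of that formula (an illustration of the idea, not the actual DeePMD-kit implementation):

```python
import math

def force_model_deviation(forces):
    """forces: nested lists of shape (n_models, n_atoms, 3).
    Returns (max, min, mean) of the per-atom deviation: the std over
    models of each force component, combined over x/y/z - mirroring
    the max_devi_f/min_devi_f/avg_devi_f columns of model_devi.out."""
    n_models = len(forces)
    n_atoms = len(forces[0])
    devi = []
    for a in range(n_atoms):
        var = 0.0
        for c in range(3):
            vals = [forces[m][a][c] for m in range(n_models)]
            mean = sum(vals) / n_models
            var += sum((v - mean) ** 2 for v in vals) / n_models
        devi.append(math.sqrt(var))
    return max(devi), min(devi), sum(devi) / n_atoms
```

A large deviation flags configurations where the ensemble disagrees, which is the signal used to select new training data in active-learning workflows.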
2 changes: 1 addition & 1 deletion doc/third-party/gromacs.md
@@ -105,7 +105,7 @@ Then, in your working directories, we have to write `input.json` file:

Here is an explanation for these settings:

-- `graph_file` : The graph file (with suffix .pb) generated by `dp freeze` command
+- `graph_file` : The [model file](../backend.md) generated by `dp freeze` command
- `type_file` : File to specify DP atom types (in space-separated format). Here, `type.raw` looks like

```
```
1 change: 1 addition & 0 deletions doc/third-party/lammps-command.md
@@ -70,6 +70,7 @@ pair_style deepmd models ... keyword value ...
pair_style deepmd graph.pb
pair_style deepmd graph.pb fparam 1.2
pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0
+pair_style deepmd graph_0.pb graph_1.pth out_file md.out out_freq 100
pair_coeff * * O H
pair_style deepmd cp.pb fparam_from_compute TEMP
4 changes: 2 additions & 2 deletions doc/train/training-advanced.md
@@ -170,9 +170,9 @@ One can set other environmental variables:
| DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. |
| DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. |

-## Adjust `sel` of a frozen model
+## Adjust `sel` of a frozen model {{ tensorflow_icon }}

-One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of a existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.
+One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of an existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.

```json
"model": {
```
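The `sel` edit itself can be scripted. A minimal sketch assuming the usual `model.descriptor.sel` location in `input.json` (verify the path in your own input script):

```python
import json

def set_sel(cfg, new_sel):
    """Replace the descriptor's sel list in a DeePMD-kit input dict,
    assuming the standard model.descriptor.sel layout."""
    cfg["model"]["descriptor"]["sel"] = list(new_sel)
    return cfg

# Typical round trip:
#   cfg = json.load(open("input.json"))
#   json.dump(set_sel(cfg, [23, 46]), open("input.json", "w"), indent=2)
cfg = {"model": {"descriptor": {"sel": [46, 92]}}}
print(set_sel(cfg, [23, 46])["model"]["descriptor"]["sel"])  # -> [23, 46]
```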
18 changes: 17 additions & 1 deletion doc/train/training.md
@@ -8,10 +8,26 @@ $ cd $deepmd_source_dir/examples/water/se_e2_a/

After switching to that directory, the training can be invoked by

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
-$ dp train input.json
+$ dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
$ dp --pt train input.json
```

:::

::::

where `input.json` is the name of the input script.

By default, the verbosity level of the DeePMD-kit is `INFO`; one may see a lot of important information about the code and environment on the screen. Among them, two pieces of information regarding data systems are worth special notice.
21 changes: 20 additions & 1 deletion doc/troubleshooting/howtoset_num_nodes.md
@@ -72,13 +72,32 @@ There is no one general parallel configuration that works for all situations, so
Here are some empirical examples.
If you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
-dp train input.json
+dp --pt train input.json
```

:::

::::
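The example above encodes a simple pattern: threads within one socket go to OpenMP and intra-op parallelism, while the sockets themselves go to inter-op parallelism. A sketch of that heuristic as a helper, an empirical starting point rather than an official DeePMD-kit rule:

```python
def thread_env(num_sockets, cores_per_socket):
    """Translate a CPU layout into the three knobs used above:
    OpenMP and intra-op threads span one socket's cores, while
    inter-op parallelism spans the sockets."""
    return {
        "OMP_NUM_THREADS": cores_per_socket,
        "DP_INTRA_OP_PARALLELISM_THREADS": cores_per_socket,
        "DP_INTER_OP_PARALLELISM_THREADS": num_sockets,
    }

# 2 CPUs x 3 cores, as in the example above:
env = thread_env(2, 3)  # OMP=3, intra=3, inter=2
```

As the text notes, there is no universal best configuration; treat these values as a starting point and benchmark on your own hardware.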

For a node with 128 cores, it is recommended to start with the following variables:

```bash
```
