docs: improve multi-backend documentation (deepmodeling#3875)
## Summary by CodeRabbit

- **Documentation**
  - Clarified the process of freezing a model by removing references to specific code sources.
  - Updated command syntax for calculating neighbor statistics to include TensorFlow and PyTorch flags.
  - Modified descriptions to specify model files instead of graph files.
  - Added and adjusted commands to support both TensorFlow and PyTorch for training, freezing, and testing models.
  - Introduced a tabular format for configuring parallelism settings for TensorFlow and PyTorch.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
njzjz and coderabbitai[bot] authored Jun 13, 2024
1 parent ed5a2e4 commit c644314
Showing 9 changed files with 103 additions and 14 deletions.
4 changes: 3 additions & 1 deletion doc/freeze/freeze.md
@@ -1,6 +1,7 @@
# Freeze a model

-The trained neural network is extracted from a checkpoint and dumped into a protobuf(.pb) file. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does
+The trained neural network is extracted from a checkpoint and dumped into a model file. This process is called "freezing" a model.
+To freeze a model, typically one does

::::{tab-set}

@@ -11,6 +12,7 @@
```sh
$ dp freeze -o model.pb
```

in the folder where the model is trained. The output model is called `model.pb`.
+The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc).

:::

18 changes: 17 additions & 1 deletion doc/model/sel.md
@@ -6,10 +6,26 @@ All descriptors require to set `sel`, which means the expected maximum number of

To determine a proper `sel`, one can calculate the neighbor stat of the training data before training:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```sh
-dp neighbor-stat -s data -r 6.0 -t O H
+dp --tf neighbor-stat -s data -r 6.0 -t O H
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```sh
dp --pt neighbor-stat -s data -r 6.0 -t O H
```

:::

::::

where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` are the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have 38 O neighbors and 72 H neighbors in the training data.

The `sel` should be set to a higher value than that of the training data, considering there may be some extreme geometries during MD simulations. As a result, we set `sel` to `[46, 92]` in the water example.
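The gap between the measured `max_nbor_size` and the chosen `sel` can be scripted. A minimal sketch, where the 20% safety margin is an assumption for illustration (the water example above rounds up a little further, to `[46, 92]`):

```python
import math

def suggest_sel(max_nbor_size, margin=0.2):
    """Suggest per-type sel values by padding the observed maximum
    neighbor counts with a safety margin for extreme MD geometries."""
    return [math.ceil(n * (1 + margin)) for n in max_nbor_size]

# Water example from the text: max_nbor_size is [38, 72].
print(suggest_sel([38, 72]))  # -> [46, 87]
```

The exact margin is a judgment call; larger `sel` costs memory and speed, so pad only as much as your simulations need.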
47 changes: 41 additions & 6 deletions doc/model/train-fitting-dos.md
@@ -1,7 +1,7 @@
-# Fit electronic density of states (DOS) {{ tensorflow_icon }}
+# Fit electronic density of states (DOS) {{ tensorflow_icon }} {{ pytorch_icon }} {{ dpmodel_icon }}

:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, DP {{ dpmodel_icon }}
:::

Here we present an API to the DeepDOS model, which can be used to fit the electronic density of states (DOS), a vector quantity.
@@ -82,10 +82,26 @@ To prepare the data, we recommend shifting the DOS data by the Fermi level.

The training command is the same as `ener` mode, i.e.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
-dp train input.json
+dp --pt train input.json
```

:::

::::

The detailed loss can be found in `lcurve.out`:

@@ -117,14 +133,33 @@ The detailed loss can be found in `lcurve.out`:

In this earlier version, we can use `dp test` to infer the electronic density of states for given frames.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash

dp --tf freeze -o frozen_model.pb

dp --tf test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash

-$DP freeze -o frozen_model.pb
+dp --pt freeze -o frozen_model.pth

-$DP test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
+dp --pt test -m frozen_model.pth -s ../data/111/$k -d ${output_prefix} -a -n 100
```

-if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame is output in the working directory
:::

::::

+if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame are output in the working directory

```
${output_prefix}.ados.out.0 ${output_prefix}.ados.out.1 ${output_prefix}.ados.out.2 ${output_prefix}.ados.out.3
```
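These per-frame output files can be post-processed with a few lines of Python. A sketch assuming a dp-test-style layout — `#`-prefixed header lines, then rows holding reference values followed by an equal number of predicted values; this layout is an assumption, so check the header comment of your own files first:

```python
import math

def rmse_from_dp_test(lines):
    """RMSE over a dp-test-style table: '#' header lines are skipped,
    and each data row is assumed to hold n reference values followed
    by n predicted values."""
    sq_sum, count = 0.0, 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = [float(x) for x in line.split()]
        half = len(cols) // 2
        for ref, pred in zip(cols[:half], cols[half:]):
            sq_sum += (ref - pred) ** 2
            count += 1
    return math.sqrt(sq_sum / count)

# e.g. rmse_from_dp_test(open("output.ados.out.0"))
```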
2 changes: 1 addition & 1 deletion doc/test/model-deviation.md
@@ -59,7 +59,7 @@ One can also use a subcommand to calculate the deviation of predicted forces or
```bash
dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out
```

-where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command:
+where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results are dumped. Here is more information on this sub-command:

```bash
usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}]
```
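Conceptually, the force model deviation is the standard deviation of the per-atom force predictions across the ensemble of models, reduced to max/min/mean over atoms. A pure-Python sketch of that formula (an illustration of the idea, not the actual DeePMD-kit implementation):

```python
import math

def force_model_deviation(forces):
    """forces: nested lists of shape (n_models, n_atoms, 3).
    Returns (max, min, mean) of the per-atom deviation: the std over
    models of each force component, combined over x/y/z - mirroring
    the max_devi_f/min_devi_f/avg_devi_f columns of model_devi.out."""
    n_models = len(forces)
    n_atoms = len(forces[0])
    devi = []
    for a in range(n_atoms):
        var = 0.0
        for c in range(3):
            vals = [forces[m][a][c] for m in range(n_models)]
            mean = sum(vals) / n_models
            var += sum((v - mean) ** 2 for v in vals) / n_models
        devi.append(math.sqrt(var))
    return max(devi), min(devi), sum(devi) / n_atoms
```

A large deviation flags configurations where the ensemble disagrees, which is the signal used to select new training data in active-learning workflows.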
2 changes: 1 addition & 1 deletion doc/third-party/gromacs.md
@@ -105,7 +105,7 @@ Then, in your working directories, we have to write `input.json` file:

Here is an explanation for these settings:

-- `graph_file` : The graph file (with suffix .pb) generated by `dp freeze` command
+- `graph_file` : The [model file](../backend.md) generated by `dp freeze` command
- `type_file` : File to specify DP atom types (in space-separated format). Here, `type.raw` looks like

```
```
1 change: 1 addition & 0 deletions doc/third-party/lammps-command.md
@@ -70,6 +70,7 @@ pair_style deepmd models ... keyword value ...
pair_style deepmd graph.pb
pair_style deepmd graph.pb fparam 1.2
pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0
+pair_style deepmd graph_0.pb graph_1.pth out_file md.out out_freq 100
pair_coeff * * O H
pair_style deepmd cp.pb fparam_from_compute TEMP
4 changes: 2 additions & 2 deletions doc/train/training-advanced.md
@@ -170,9 +170,9 @@ One can set other environmental variables:
| DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. |
| DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. |

-## Adjust `sel` of a frozen model
+## Adjust `sel` of a frozen model {{ tensorflow_icon }}

-One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of a existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.
+One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of an existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.

```json
"model": {
```
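The `sel` edit itself can be scripted. A minimal sketch assuming the usual `model.descriptor.sel` location in `input.json` (verify the path in your own input script):

```python
import json

def set_sel(cfg, new_sel):
    """Replace the descriptor's sel list in a DeePMD-kit input dict,
    assuming the standard model.descriptor.sel layout."""
    cfg["model"]["descriptor"]["sel"] = list(new_sel)
    return cfg

# Typical round trip:
#   cfg = json.load(open("input.json"))
#   json.dump(set_sel(cfg, [23, 46]), open("input.json", "w"), indent=2)
cfg = {"model": {"descriptor": {"sel": [46, 92]}}}
print(set_sel(cfg, [23, 46])["model"]["descriptor"]["sel"])  # -> [23, 46]
```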
18 changes: 17 additions & 1 deletion doc/train/training.md
@@ -8,10 +8,26 @@ $ cd $deepmd_source_dir/examples/water/se_e2_a/

After switching to that directory, the training can be invoked by

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
-$ dp train input.json
+$ dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
$ dp --pt train input.json
```

:::

::::

where `input.json` is the name of the input script.

By default, the verbosity level of the DeePMD-kit is `INFO`; one may see a lot of important information about the code and environment on the screen. Among them, two pieces of information regarding data systems are worth special notice.
21 changes: 20 additions & 1 deletion doc/troubleshooting/howtoset_num_nodes.md
@@ -72,13 +72,32 @@ There is no one general parallel configuration that works for all situations, so
Here are some empirical examples.
If you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
-dp train input.json
+dp --pt train input.json
```

:::

::::
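The example above encodes a simple pattern: threads within one socket go to OpenMP and intra-op parallelism, while the sockets themselves go to inter-op parallelism. A sketch of that heuristic as a helper, an empirical starting point rather than an official DeePMD-kit rule:

```python
def thread_env(num_sockets, cores_per_socket):
    """Translate a CPU layout into the three knobs used above:
    OpenMP and intra-op threads span one socket's cores, while
    inter-op parallelism spans the sockets."""
    return {
        "OMP_NUM_THREADS": cores_per_socket,
        "DP_INTRA_OP_PARALLELISM_THREADS": cores_per_socket,
        "DP_INTER_OP_PARALLELISM_THREADS": num_sockets,
    }

# 2 CPUs x 3 cores, as in the example above:
env = thread_env(2, 3)  # OMP=3, intra=3, inter=2
```

As the text notes, there is no universal best configuration; treat these values as a starting point and benchmark on your own hardware.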

For a node with 128 cores, it is recommended to start with the following variables:

```bash
```
