Fix: Modify docs of DPA models #4510

Merged · 1 commit · Dec 26, 2024
`doc/model/dpa2.md` (1 addition, 1 deletion)

````diff
@@ -4,7 +4,7 @@
 **Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
 :::

-The DPA-2 model implementation. See https://doi.org/10.1038/s41524-024-01493-2 for more details.
+The DPA-2 model implementation. See [DPA-2 paper](https://doi.org/10.1038/s41524-024-01493-2) for more details.

 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels.
````
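For context on the hunk above: the referenced training example can be launched with the DeePMD-kit CLI. A minimal sketch, assuming DeePMD-kit v3 with the PyTorch backend installed and `examples/water/dpa2/` as the working directory (not part of this PR's diff):

```bash
# Train the DPA-2 "medium" example with the PyTorch backend.
# Assumes DeePMD-kit v3 and that this is run from
# examples/water/dpa2/ in the deepmd-kit repository.
dp --pt train input_torch_medium.json
```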
`doc/model/train-se-atten.md` (2 additions, 2 deletions)

````diff
@@ -8,7 +8,7 @@

 Here we propose DPA-1, a Deep Potential model with a novel attention mechanism, which is highly effective for representing the conformation and chemical spaces of atomic systems and learning the PES.

-See [this paper](https://arxiv.org/abs/2208.08236) for more information. DPA-1 is implemented as a new descriptor `"se_atten"` for model training, which can be used after simply editing the input.json.
+See [this paper](https://www.nature.com/articles/s41524-024-01278-7) for more information. DPA-1 is implemented as a new descriptor `"se_atten"` for model training, which can be used after simply editing the input.json.

 ## Theory

@@ -71,7 +71,7 @@ Then layer normalization is added in a residual way to finally obtain the self-attention
 Next, we will list the detailed settings in input.json and the data format, especially for large systems with dozens of elements. An example of DPA-1 input can be found in `examples/water/se_atten/input.json`.

 The notation of `se_atten` is short for the smooth edition of Deep Potential with an attention mechanism.
-This descriptor was described in detail in [the DPA-1 paper](https://arxiv.org/abs/2208.08236) and the images above.
+This descriptor was described in detail in [the DPA-1 paper](https://www.nature.com/articles/s41524-024-01278-7) and the images above.

 In this example, we will train a DPA-1 model for a water system. A complete training input script of this example can be found in the directory:
````
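As the first hunk above notes, enabling DPA-1 only requires editing the `descriptor` section of input.json. A sketch of a minimal `model` section using `se_atten` (values are illustrative assumptions, not taken from this PR; the authoritative example is `examples/water/se_atten/input.json`):

```json
{
  "model": {
    "type_map": ["O", "H"],
    "descriptor": {
      "type": "se_atten",
      "rcut": 6.0,
      "rcut_smth": 0.5,
      "sel": 120,
      "neuron": [25, 50, 100],
      "axis_neuron": 16,
      "attn": 128,
      "attn_layer": 2,
      "attn_dotr": true,
      "attn_mask": false,
      "seed": 1
    },
    "fitting_net": {
      "neuron": [240, 240, 240],
      "resnet_dt": true,
      "seed": 1
    }
  }
}
```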
`doc/train/finetuning.md` (2 additions, 2 deletions)

````diff
@@ -9,7 +9,7 @@ to vastly reduce the training cost, while it's not trivial in potential models.
 Compositions and configurations of data samples or even computational parameters in upstream software (such as VASP)
 may be different between the pre-trained and target datasets, leading to energy shifts or other diversities of training data.

-Recently the emerging of methods such as [DPA-1](https://arxiv.org/abs/2208.08236) has brought us to a new stage where we can
+Recently the emerging of methods such as [DPA-1](https://www.nature.com/articles/s41524-024-01278-7) has brought us to a new stage where we can
 perform similar pretraining-finetuning approaches.
 They can hopefully learn the common knowledge in the pre-trained dataset (especially the `force` information)
 and thus reduce the computational cost in downstream training tasks.

@@ -19,7 +19,7 @@ and thus reduce the computational cost in downstream training tasks.
 If you have a pre-trained model `pretrained.pb`
 (here we support models using [`se_atten`](../model/train-se-atten.md) descriptor and [`ener`](../model/train-energy.md) fitting net)
 on a large dataset (for example, [OC2M](https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md) in
-DPA-1 [paper](https://arxiv.org/abs/2208.08236)), a finetuning strategy can be performed by simply running:
+DPA-1 [paper](https://www.nature.com/articles/s41524-024-01278-7)), a finetuning strategy can be performed by simply running:

 ```bash
 $ dp train input.json --finetune pretrained.pb
````
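A typical end-to-end use of the command shown in the second hunk above (sketch only; the freeze step is a standard DeePMD-kit follow-up assumed here, not part of this PR's diff):

```bash
# Finetune from the pre-trained model on the new dataset described
# by input.json (TensorFlow backend, matching the .pb checkpoint
# used in the docs above).
dp train input.json --finetune pretrained.pb

# Freeze the finetuned checkpoint into a model usable for inference
# or MD; run in the same directory as the training above.
dp freeze -o finetuned.pb
```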