Skip to content

Commit

Permalink
update
Browse files Browse the repository at this point in the history
Signed-off-by: Jinzhe Zeng <[email protected]>
  • Loading branch information
njzjz committed Oct 25, 2023
1 parent d004e50 commit c01d9e2
Show file tree
Hide file tree
Showing 20 changed files with 516 additions and 23 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ A full [document](doc/train/train-input-auto.rst) on options in the training inp
- [Deep potential long-range](doc/model/dplr.md)
- [Deep Potential - Range Correction (DPRc)](doc/model/dprc.md)
- [Linear model](doc/model/linear.md)
- [Interpolation with a pairwise potential](doc/model/pairtab.md)
- [Training](doc/train/index.md)
- [Training a model](doc/train/training.md)
- [Advanced options](doc/train/training-advanced.md)
Expand Down
40 changes: 40 additions & 0 deletions doc/freeze/compress.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,45 @@
# Compress a model

## Theory

The compression of the DP model uses three techniques, tabulated inference, operator merging, and precise neighbor indexing, to improve the performance of model training and inference when the model parameters are properly trained.

For better performance, the NN inference can be replaced by tabulated function evaluations if the input of the NN is of dimension one.
The idea is to approximate the output of the NN by a piece-wise polynomial fitting.
The input domain (a compact domain in $\mathbb R$) is divided into $L_c$ equally spaced intervals, in which apply a fifth-order polynomial $g^l_m(x)$ approximation of the $m$-th output component of the NN function:
```math
g^l_m(x) = a^l_m x^5 + b^l_m x^4 + c^l_m x^3 + d^l_m x^2 + e^l_m x + f^l_m,\quad
x \in [x_l, x_{l+1}),
\label{eq:compress}
```
where $l=1,2,\dots,L_c$ is the index of the intervals, $x_1, \dots, x_{L_c}, x_{L_c+1}$ are the endpoints of the intervals, and $a^l_m$, $b^l_m$, $c^l_m$, $d^l_m$, $e^l_m$, and $f^l_m$ are the fitting parameters.
The fitting parameters can be computed by the equations below:
```math
a^l_m &= \frac{1}{2\Delta x_l^5}[12h_{m,l}-6(y'_{m,l+1}+y'_{m,l})\Delta x_l + (y''_{m,l+1}-y''_{m,l})\Delta x_l^2], \\
b^l_m &= \frac{1}{2\Delta x_l^4}[-30h_{m,l} +(14y'_{m,l+1}+16y'_{m,l})\Delta x_l + (-2y''_{m,l+1}+3y''_{m,l})\Delta x_l^2], \\
c^l_m &= \frac{1}{2\Delta x_l^3}[20h_{m,l}-(8y'_{m,l+1}+12y'_{m,l})\Delta x_l + (y''_{m,l+1}-3y''_{m,l})\Delta x_l^2], \\
d^l_m &= \frac{1}{2}y''_{m,l}, \\
e^l_m &= y_{m,l}', \\
f^l_m &= y_{m,l},
```
where $\Delta x_l=x_{l+1}-x_l$ denotes the size of the interval. $h_{m,l}=y_{m,l+1}-y_{m,l}$. $y_{m,l} = y_m(x_l)$, $y'_{m,l} = y'_m(x_l)$ and $y''_{m,l} = y''_m(x_l)$ are the value, the first-order derivative, and the second-order derivative of the $m$-th component of the target NN function at the interval point $x_l$, respectively.
The first and second-order derivatives are easily calculated by the back-propagation of the NN functions.

In the standard DP model inference, taking the [two-body embedding descriptor](../model/train-se-e2-a.md) as an example, the matrix product $(\mathcal G^i)^T \mathcal R$ requires the transfer of the tensor $\mathcal G^i$ between the register and the host/device memories, which usually becomes the bottle-neck of the computation due to the relatively small memory bandwidth of the GPUs.
The compressed DP model merges the matrix multiplication $(\mathcal G^i)^T \mathcal R$ with the tabulated inference step.
More specifically, once one column of the $(\mathcal G^i)^T $ is evaluated, it is immediately multiplied with one row of the environment matrix in the register, and the outer product is deposited to the result of $(\mathcal G^i)^T \mathcal R$.
By the operator merging technique, the allocation of $\mathcal G^i$ and the memory movement between register and host/device memories is avoided.
The operator merging of the three-body embedding can be derived analogously.

The first dimension, $N_c$, of the environment ($\mathcal R^i$) and embedding ($\mathcal G^i$) matrices is the expected maximum number of neighbors.
If the number of neighbors of an atom is smaller than $N_c$, the corresponding positions of the matrices are pad with zeros.
In practice, if the real number of neighbors is significantly smaller than $N_c$, a notable operation is spent on the multiplication of padding zeros.
In the compressed DP model, the number of neighbors is precisely indexed at the tabulated inference stage, further saving computational costs.[^1]

[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).

## Instructions

Once the frozen model is obtained from DeePMD-kit, we can get the neural network structure and its parameters (weights, biases, etc.) from the trained model, and compress it in the following way:
```bash
dp compress -i graph.pb -o graph-compress.pb
Expand Down
22 changes: 22 additions & 0 deletions doc/model/dplr.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,28 @@ The method of DPLR is described in [this paper][1]. One is recommended to read t

In the following, we take the DPLR model for example to introduce the training and LAMMPS simulation with the DPLR model. The DPLR model is trained in two steps.

## Theory

The Deep Potential Long Range (DPLR) model adds the electrostatic energy to the total energy:
```math
E=E_{\text{DP}} + E_{\text{ele}},
```
where $E_{\text{DP}}$ is the short-range contribution constructed as the [standard energy model](./train-energy.md) that is fitted against $(E^\ast-E_{\text{ele}})$.
$E_{\text{ele}}$ is the electrostatic energy
introduced by a group of Gaussian distributions that is an approximation of the electronic structure of the system, and is calculated in Fourier space by
```math
E_{\text{ele}} = \frac{1}{2\pi V}\sum_{m \neq 0, \|m\|\leq L} \frac{\exp({-\pi ^2 m^2/\beta ^2})}{m^2}S^2(m),
```
where $\beta$ is a freely tunable parameter that controls the spread of the Gaussians.
$L$ is the cutoff in Fourier space and $S(m)$, the structure factor, is given by
```math
S(m)=\sum_i q_i e^{-2\pi \imath m \boldsymbol r_i} + \sum_n q_n e^{-2\pi \imath m \boldsymbol W_n},
```
where $\imath = \sqrt{-1}$ denotes the imaginary unit, $\boldsymbol r_i$ indicates ion coordinates, $q_i$ is the charge of the ion $i$, and $W_n$ is the $n$-th Wannier centroid (WC) which can be obtained from a separated [dipole model](./train-fitting-tensor.md).
It can be proved that the error in the electrostatic energy introduced by the Gaussian approximations is dominated by a summation of dipole-quadrupole interactions that decay as $r^{-4}$, where $r$ is the distance between the dipole and quadrupole.[^1]

[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).

## Train a deep Wannier model for Wannier centroids

We use the deep Wannier model (DW) to represent the relative position of the Wannier centroid (WC) with the atom with which it is associated. One may consult the introduction of the [dipole model](train-fitting-tensor.md) for a detailed introduction. An example input `wc.json` and a small dataset `data` for tutorial purposes can be found in
Expand Down
38 changes: 36 additions & 2 deletions doc/model/dprc.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,49 @@

Deep Potential - Range Correction (DPRc) is designed to combine with QM/MM method, and corrects energies from a low-level QM/MM method to a high-level QM/MM method:

$$ E=E_\text{QM}(\mathbf R; \mathbf P) + E_\text{QM/MM}(\mathbf R; \mathbf P) + E_\text{MM}(\mathbf R) + E_\text{DPRc}(\mathbf R) $$
```math
E=E_\text{QM}(\mathbf R; \mathbf P) + E_\text{QM/MM}(\mathbf R; \mathbf P) + E_\text{MM}(\mathbf R) + E_\text{DPRc}(\mathbf R)
```

## Theory

Deep Potential - Range Correction (DPRc) was initially designed to correct the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ''ab initio'' QM/MM theory in a range-correction way to quantitatively correct short and mid-range non-bonded interactions leveraging the non-bonded lists routinely used in molecular dynamics simulations using molecular mechanical force fields such as AMBER.
In this way, long-ranged electrostatic interactions can be modeled efficiently using the particle mesh Ewald method or its extensions for multipolar and QM/MM potentials.
In a DPRc model, the switch function is modified to disable MM-MM interaction:
```math
s_\text{DPRc}(r_{ij}) =
\begin{cases}
0, &\text{if $i \in \text{MM} \land j \in \text{MM}$}, \\
s(r_{ij}), &\text{otherwise},
\end{cases}
```
where $s_\text{DPRc}(r_{ij})$ is the new switch function and $s(r_{ij})$ is the old one.
This ensures the forces between MM atoms are zero, i.e.
```math
{\boldsymbol F}_{ij} = - \frac{\partial E}{\partial \boldsymbol r_{ij}} = 0, \quad i \in \text{MM} \land j \in \text{MM}.
```
The fitting network is revised to remove energy bias from MM atoms:
```math
E_i=
\begin{cases}
\mathcal{F}_0(\mathcal{D}^i), &\text{if $i \in \text{QM}$}, \\
\mathcal{F}_0(\mathcal{D}^i) - \mathcal{F}_0(\mathbf{0}), &\text{if $i \in \text{MM}$},
\end{cases}
```
where $\mathbf{0}$ is a zero matrix.
It is worth mentioning that usage of DPRc is not limited to its initial design for QM/MM correction and can be expanded to any similar interaction.[^1]

[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).

See the [JCTC paper](https://doi.org/10.1021/acs.jctc.1c00201) for details.

## Training data

Instead the normal _ab initio_ data, one needs to provide the correction from a low-level QM/MM method to a high-level QM/MM method:

$$ E = E_\text{high-level QM/MM} - E_\text{low-level QM/MM} $$
```math
$ E = E_\text{high-level QM/MM} - E_\text{low-level QM/MM}
```

Two levels of data use the same MM method, so $E_\text{MM}$ is eliminated.

Expand Down
1 change: 1 addition & 0 deletions doc/model/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,4 @@
- [Deep potential long-range](dplr.md)
- [Deep Potential - Range Correction (DPRc)](dprc.md)
- [Linear model](linear.md)
- [Interpolation with a pairwise potential](pairtab.md)
1 change: 1 addition & 0 deletions doc/model/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@ Model
dplr
dprc
linear
pairtab
25 changes: 25 additions & 0 deletions doc/model/overall.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,30 @@
# Overall

## Theory

A Deep Potential (DP) model, denoted by $\mathcal{M}$, can be generally represented as

```math
\boldsymbol y_i = \mathcal M (\boldsymbol x_i, \{\boldsymbol x_j\}_{j\in n(i)}; \boldsymbol \theta)
= \mathcal{F} \big( \mathcal{D} (\boldsymbol x_i, \{\boldsymbol x_j\}_{j\in n(i)}; \boldsymbol \theta_d) ; \boldsymbol \theta_f \big),
```

where $\boldsymbol{y}_i$ is the fitting properties, $\mathcal{F}$ is the fitting network, $\mathcal{D}$ is the descriptor.
$\boldsymbol{x} = (\boldsymbol r_i, \alpha_i)$, with $\boldsymbol r_i$ being the Cartesian coordinates and $\alpha_i$ being the chemical species, denotes the degrees of freedom of the atom $i$.
The indices of the neighboring atoms (i.e.~atoms within a certain cutoff radius) of atom $i$ are given by the notation $n(i)$.
Note that the Cartesian coordinates can be either under the periodic boundary condition (PBC) or in vacuum (under the open boundary condition).
The network parameters are denoted by $\boldsymbol \theta = \{\boldsymbol \theta_d, \boldsymbol \theta_f\}$, where $\boldsymbol \theta_d$ and $\boldsymbol\theta_f$ yield the network parameters of the descriptor (if any) and those of the fitting network, respectively.
From the above equation, one may compute the global property of the system by
\begin{align}
\boldsymbol y = \sum_{i=1}^N \boldsymbol y_i,
\end{align}
where $N$ is the number of atoms in a frame.
For example, if $y_i$ represents the potential energy contribution of atom $i$, then $y$ gives the total potential energy of the frame.[^1]

[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).

## Instructions

A model has two parts, a descriptor that maps atomic configuration to a set of symmetry invariant features, and a fitting net that takes descriptor as input and predicts the atomic contribution to the target physical property. It's defined in the {ref}`model <model>` section of the `input.json`, for example,
```json
"model": {
Expand Down
36 changes: 36 additions & 0 deletions doc/model/pairtab.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Interpolation with a pairwise potential

## Theory
In applications like the radiation damage simulation, the interatomic distance may become too close, so that the DFT calculations fail.
In such cases, the DP model that is an approximation of the DFT potential energy surface is usually replaced by an empirical potential, like the Ziegler-Biersack-Littmark (ZBL) screened nuclear repulsion potential in the radiation damage simulations.
The DeePMD-kit package supports the interpolation between DP and an empirical pairwise potential
```math
E_i = (1-w_i) E_i^{\mathrm{DP}} + w_i (E_i^0 + E_i^{\mathrm{pair}}),
```
where the $w_i$ is the interpolation weight and the $E_i^{\mathrm{pair}} $ is the atomic contribution due to the pairwise potential $u^{\mathrm{pair}}(r)$, i.e.
```math
E_i^{\mathrm{pair}} = \sum_{j\in n(i)} u^{\mathrm{pair}}(r_{ij}).
```
The interpolation weight $w_i$ is defined by
```math
w_i =
\begin{cases}
1, & \sigma_i <r_a, \\
u_i^3 (-6 u_i^2 +15 u_i -10) +1, & r_a \leq \sigma_i <r_b, \\
0, & \sigma_i \geq r_b,
\end{cases}
\label{eq:interpolation}
```
where $u_i = (\sigma_i - r_a ) / (r_b - r_a)$.
$E_i^0$ is the atom energy bias.
In the range $[r_a, r_b]$, the DP model smoothly switched off and the pairwise potential smoothly switched on from $r_b$ to $r_a$. The $\sigma_i$ is the softmin of the distance between atom $i$ and its neighbors,
```math
\sigma_i =
\dfrac
{\sum\limits_{j\in n(i)} r_{ij} e^{-r_{ij} / \alpha_s}}
{\sum\limits_{j\in n(i)} e^{-r_{ij} / \alpha_s}},
```
where the scale $\alpha_s$ is a tunable scale of the interatomic distance $r_{ij}$.
The pairwise potential $u^{\textrm{pair}}(r)$ is defined by a user-defined table that provides the value of $u^{\textrm{pair}}$ on an evenly discretized grid from 0 to the cutoff distance.[^1]

[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).
Loading

0 comments on commit c01d9e2

Please sign in to comment.