pt: fix single-task training&data stat #3354

iProzd · 2024-02-28T08:42:46Z

No description provided.

updates: - [github.com/astral-sh/ruff-pre-commit: v0.1.13 → v0.1.14](astral-sh/ruff-pre-commit@v0.1.13...v0.1.14)  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Should not squash

See deepmodeling#3120.

- CMake: add `ENABLE_TENSORFLOW` and `ENABLE_PYTORCH`. `BUILD_TENSORFLOW` will be enabled when `TENSORFLOW_ROOT` is not empty or `USE_TF_PYTHON_LIBS` is on.
- api_cc: add `BUILD_TENSORFLOW` and `BUILD_PYTORCH` definitions. Move several functions from `common.h` to `commonTF.h` to prevent exposing them to header files.
- CI: download libtorch in the build/test CC actions.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

Fix deepmodeling#3121. The PyTorch icon can be added when a feature implemented by PyTorch is added. However, I can't find a way to add an icon to TOC. ![image](https://github.com/deepmodeling/deepmd-kit/assets/9496702/7f29da27-af81-4850-9da0-79310d216b2d) Signed-off-by: Jinzhe Zeng <[email protected]>

Need discussion for other classes. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

…ng#3172) Fix deepmodeling#3168. See: pypa/setuptools-scm#1006 (comment) --------- Signed-off-by: Jinzhe Zeng <[email protected]>

Add a dpdata driver via the plugin mechanism (override that in the dpdata package) so it can benefit from the multiple-backend DeepPot. Currently, the driver in the dpdata package has to support both v1 and v2 for backward compatibility. When shipped within the deepmd-kit package, it only needs to support the current deepmd-kit version. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

…ng#3173) ..., so they can benefit from multiple-backend DeepPot. Update docs. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

They are used by the downstream APIs, so must be implemented. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

…odeling#3176) LAMMPS is using it Signed-off-by: Jinzhe Zeng <[email protected]>

Signed-off-by: Jinzhe Zeng <[email protected]>

Deprecate per discussion. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

Merge deepmd-pytorch into the main repo🎉

Add the following directories:

- deepmd/pt: main implementation of deepmd-pytorch
- source/tests/pt: UTs for deepmd-pytorch

TODO list:

- [x] examples added for water/se_e2_a, water/se_atten, water/dpa2
- [x] README updated (needs further modification)
- [x] Paths in each file have been adapted.
- [x] pyproject.toml needed to be merged

---------

Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jinzhe Zeng <[email protected]>

…#3184) Co-authored-by: Han Wang <[email protected]>

Fix the following compiler warning:

```
/home/runner/work/deepmd-kit/deepmd-kit/source/api_c/src/c_api.cc:1336:17: warning: returning address of local temporary object [-Wreturn-stack-address]
  return (int*)&(dcm->dcm.sel_types())[0];
                ^~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```

by returning the reference of `sel_type`.

`DataChargeModifier.sel_types` is not used anywhere, even in the test, so we don't have a chance to determine if there is a possible segfault, and this warning has no actual impact.

It seems `DeepTensor` has returned a reference since the beginning (deepmodeling#137), perhaps because `DeepTensor.sel_types` is used. `DeepTensor` and `DataChargeModifier` have different returned types.

... per discussion. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Just merge in form. Several options or subcommands are only supported by TensorFlow or PyTorch. Also, avoid importing from `deepmd.tf` in `deepmd.utils.argcheck`.

```
Use --tf or --pt to choose the backend:
    dp --tf train input.json
    dp --pt train input.json
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
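The `--tf` / `--pt` switch described above can be pictured with a small `argparse` sketch. This is only an illustration and not the actual `dp` entry point: the function names, the `backend` and `input_file` attribute names, the backend labels, and the choice of TensorFlow as the default when no flag is given are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `dp`-like parser (hypothetical, for illustration only)."""
    parser = argparse.ArgumentParser(prog="dp")

    # --tf and --pt write to the same destination and exclude each other.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--tf", dest="backend", action="store_const", const="tensorflow",
        help="use the TensorFlow backend",
    )
    group.add_argument(
        "--pt", dest="backend", action="store_const", const="pytorch",
        help="use the PyTorch backend",
    )
    # Assumption: fall back to TensorFlow when neither flag is given.
    parser.set_defaults(backend="tensorflow")

    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train", help="train a model")
    train.add_argument("input_file", help="path to the input JSON file")
    return parser


def parse(argv):
    """Parse a list of command-line tokens into a namespace."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--pt", "train", "input.json"])
    print(args.backend, args.command, args.input_file)
```

Keeping the backend flag on the top-level parser, before the subcommand, mirrors the `dp --pt train input.json` ordering shown in the help text.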

- Set `deepmd.pt.utils.ase_calc.DPCalculator` as an alias of `deepmd.calculator.DP`;
- Replace `deepmd_pt` with `deepmd.pt` in `deep_pot.py`; fix (atomic) virial output shape of `DeepPot`; add tests for them;
- Set `pbc` in `pt/test_calculator.py` as it requests stress.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

1. compatible with tf
2. compatible with the input cell shape

Co-authored-by: Han Wang <[email protected]>

Set the default `save_ckpt` to `model.ckpt` as the prefix. When saving checkpoints, `model.ckpt-100.pt` will be saved, and `model.ckpt.pt` will be symlinked to `model.ckpt-100.pt`. A `checkpoint` file will be dedicated to record `model.ckpt-100.pt`. This keeps the same behavior as the TF backend.

One can do the below using the PT backend just like the TF backend:

```sh
dp --pt train input.json
# one can cancel the training before it finishes
dp --pt freeze
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
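The checkpoint layout described above (a step-numbered file, a symlink with the bare prefix, and a `checkpoint` record file) can be sketched with the standard library alone. This is a minimal sketch rather than the deepmd-kit implementation: `save_checkpoint` is a hypothetical helper, the payload is plain bytes instead of a serialized model, the `checkpoint` file is assumed to hold just the file name, and symlink creation assumes a POSIX-like filesystem.

```python
import os
import tempfile
from pathlib import Path


def save_checkpoint(directory: Path, step: int, payload: bytes,
                    prefix: str = "model.ckpt") -> Path:
    """Write `<prefix>-<step>.pt`, repoint `<prefix>.pt`, update `checkpoint`."""
    directory.mkdir(parents=True, exist_ok=True)

    # 1. Save the step-numbered checkpoint, e.g. model.ckpt-100.pt.
    ckpt = directory / f"{prefix}-{step}.pt"
    ckpt.write_bytes(payload)

    # 2. Point the bare-prefix name at the newest checkpoint.
    latest = directory / f"{prefix}.pt"
    if latest.is_symlink() or latest.exists():
        latest.unlink()
    latest.symlink_to(ckpt.name)  # relative link inside the same directory

    # 3. Record the newest checkpoint name (file format is an assumption).
    (directory / "checkpoint").write_text(ckpt.name + "\n")
    return ckpt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_checkpoint(Path(tmp), 100, b"placeholder weights")
        print(sorted(os.listdir(tmp)))
```

Because the symlink always carries the same name, a later step such as freezing can open `model.ckpt.pt` without knowing at which step training stopped.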

…eling#3195) Fix https://github.com/deepmodeling/deepmd-kit/security/code-scanning/2096 --------- Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

per discussion. Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Han Wang <[email protected]>

```
- source
  - tests
    - common
    - tf
    - pt
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

Co-authored-by: Han Wang <[email protected]>

Fix deepmodeling#3121. There are TODOs: (1) PyTorch-backend specific features and arguments; (2) Python interface installation. Currently, the TensorFlow backend is always installed, and I am considering rewriting the logic; (3) Unsupported features - write docs when implemented. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

The default one from PyPI is for CU12. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

…pmodeling#3201) If so, throw the following error:

```
-- PyTorch CXX11 ABI: 0
CMake Error at CMakeLists.txt:162 (message):
  PyTorch CXX11 ABI mismatch TensorFlow: 0 != 1
```

Signed-off-by: Jinzhe Zeng <[email protected]>

…deling#3200) Fix deepmodeling#3120. One can disable building the TensorFlow backend during `pip install` by setting `DP_ENABLE_TENSORFLOW=0`. --------- Signed-off-by: Jinzhe Zeng <[email protected]>
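As an illustration of how a build script could honor a switch like `DP_ENABLE_TENSORFLOW=0`, here is a small sketch. It is not the project's build backend: `backend_enabled` is a hypothetical helper, the backend being enabled by default is an assumption, and treating values other than `0` (such as `false` or `off`) as "disabled" is also an assumption, since the text above only mentions `0`.

```python
import os

# Assumption: these spellings all mean "disabled"; the source only shows "0".
_FALSY = {"0", "false", "off", "no", ""}


def backend_enabled(name, default=True, environ=None):
    """Return whether the backend `name` is enabled via DP_ENABLE_<NAME>."""
    env = os.environ if environ is None else environ
    raw = env.get(f"DP_ENABLE_{name.upper()}")
    if raw is None:
        # Variable not set: keep the default behavior.
        return default
    return raw.strip().lower() not in _FALSY


if __name__ == "__main__":
    # Passing a dict instead of os.environ keeps the example deterministic.
    print(backend_enabled("tensorflow", environ={"DP_ENABLE_TENSORFLOW": "0"}))
```

In a shell, the variable would be set for the duration of the install command, for example `DP_ENABLE_TENSORFLOW=0 pip install .`.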

```sh
dp convert-backend model.pb model.pth
dp convert-backend model.pb model.dp
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
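The commands above pick the source and target backends from the file names alone. One way to picture that is a lookup keyed on the file suffix. This is a hedged sketch, not the `dp convert-backend` implementation: the mapping, the backend labels, and both function names are assumptions for illustration.

```python
from pathlib import Path

# Assumed suffix-to-backend mapping, inferred from the example file names.
SUFFIX_TO_BACKEND = {
    ".pb": "tensorflow",
    ".pth": "pytorch",
    ".dp": "dpmodel",
}


def detect_backend(filename: str) -> str:
    """Guess the backend of a model file from its suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUFFIX_TO_BACKEND[suffix]
    except KeyError:
        raise ValueError(f"unknown model suffix: {suffix!r}") from None


def plan_conversion(src: str, dst: str) -> tuple:
    """Return the (source backend, target backend) pair for a conversion."""
    return detect_backend(src), detect_backend(dst)


if __name__ == "__main__":
    print(plan_conversion("model.pb", "model.pth"))
```

A real converter would then deserialize the model with the source backend and serialize it with the target one; only the dispatch step is shown here.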

This PR adds a cross-framework consistency test for DipoleFittingNet. Known limitations: 1. Some keys in the serialized model are mismatched, so only the common keys are tested. --------- Signed-off-by: Anyang Peng <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

This PR adds a cross-framework consistency test for PolarFittingNet. Note: `shift_diag` is not yet implemented in PT. --------- Signed-off-by: Anyang Peng <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Jinzhe Zeng <[email protected]>

Fix No module named 'torch' --------- Signed-off-by: Jinzhe Zeng <[email protected]>

While a DPModel cannot be directly trained, it can be converted from another model:

```sh
dp convert-backend frozen_model.pth frozen_model.dp
dp test -m frozen_model.dp -s ../data/
```

The energy result is consistent with TF and PT. Force and virial are NaN, as expected.

Signed-off-by: Jinzhe Zeng <[email protected]>

Signed-off-by: Jinzhe Zeng <[email protected]>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Ensure the saved JIT model can run on both CPUs and GPUs. --------- Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Chun Cai <[email protected]>

Signed-off-by: Jinzhe Zeng <[email protected]>

) thus pt reusing the dp code. --------- Co-authored-by: Han Wang <[email protected]>

Fix a bug caused by the breaking change in Keras 3 (shipped by TF 2.16). --------- Signed-off-by: Jinzhe Zeng <[email protected]>

Redundant setup was removed. The setup has already been executed in the initial lines of post_force, along with subsequent calculations. Reinitialization will lead to an error.

This PR adds support for the `se_r` descriptor in PyTorch and NumPy.

- [x] Refactor PyTorch env_mat: possibly combine `r` and `a`.
- [x] Add NumPy implementation.
- [x] Add consistency test with `tf`.
- [x] Refactor device as parameter

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Han Wang <[email protected]>

The atom->image of the wannier centroid should be set to the same as its real counterpart when assigning the position. --------- Co-authored-by: Yifan Li李一帆 <[email protected]>

…#3346) Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz and others added 30 commits January 23, 2024 15:09

Merge master into devel (deepmodeling#3167)

585414a

Merge master into devel (deepmodeling#3170)

68fb16d

Should not squash

add universal Python inference interface DeepPot (deepmodeling#3164)

04c414a

Need discussion for other classes. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

detect version in advance before building deepmd-kit-cu11 (deepmodeli…

5dfbb55

…ng#3172) Fix deepmodeling#3168. See: pypa/setuptools-scm#1006 (comment) --------- Signed-off-by: Jinzhe Zeng <[email protected]>

Move model deviation and ase calculator to deepmd_utils (deepmodeli…

2a32c87

…ng#3173) ..., so they can benefit from multiple-backend DeepPot. Update docs. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

add more abstractmethods to universal DeepPot (deepmodeling#3175)

3ee3f4c

They are used by the downstream APIs, so must be implemented. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

cc: reimplement read_file_to_string without calling TensorFlow (deepm…

663e4a8

…odeling#3176) LAMMPS is using it Signed-off-by: Jinzhe Zeng <[email protected]>

breaking: move deepmd to deepmd.tf (deepmodeling#3177)

5b9dd3d

Signed-off-by: Jinzhe Zeng <[email protected]>

breaking: move deepmd_utils to deepmd (deepmodeling#3178)

3618702

Signed-off-by: Jinzhe Zeng <[email protected]>

docs: rewrite README; deprecate manually written TOC (deepmodeling#3179)

5c545f7

Deprecate per discussion. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

atomic model is not required to provide the fitting net (deepmodeling…

2e5333d

…#3184) Co-authored-by: Han Wang <[email protected]>

breaking: drop Python 3.7 support (deepmodeling#3185)

2631ce2

... per discussion. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

breaking: pt: change the virial output dim to 9 (deepmodeling#3188)

497c8ba

1. compatible with tf
2. compatible with the input cell shape

Co-authored-by: Han Wang <[email protected]>

drop tqdm (deepmodeling#3194)

0bb44f3

per discussion. Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Han Wang <[email protected]>

reorganize tests directory (deepmodeling#3196)

8900561

```
- source
  - tests
    - common
    - tf
    - pt
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

breaking: pt: unify the output of descriptors. (deepmodeling#3190)

1e51a88

Co-authored-by: Han Wang <[email protected]>

fix: install CU11 PyTorch in the CU11 docker image (deepmodeling#3198)

4a29c8c

The default one from PyPI is for CU12. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

allow disabling TensorFlow backend during Python installation (deepmo…

de18f78

…deling#3200) Fix deepmodeling#3120. One can disable building the TensorFlow backend during `pip install` by setting `DP_ENABLE_TENSORFLOW=0`. --------- Signed-off-by: Jinzhe Zeng <[email protected]>

njzjz and others added 18 commits February 23, 2024 04:50

feat: convert model files between backends (deepmodeling#3323)

d949bc8

```sh
dp convert-backend model.pb model.pth
dp convert-backend model.pb model.dp
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

store type in descriptor serialization data (deepmodeling#3325)

649fdca

Signed-off-by: Jinzhe Zeng <[email protected]>

docs: install pytorch in RTD (deepmodeling#3333)

03ca9ab

Fix No module named 'torch' --------- Signed-off-by: Jinzhe Zeng <[email protected]>

store type in fitting serialization data (deepmodeling#3331)

91049df

Signed-off-by: Jinzhe Zeng <[email protected]>

pt: add necessary jit.export (deepmodeling#3337)

261c802

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

pt: remove env.DEVICE in all forward functions (deepmodeling#3330)

a3f4a67

Ensure the saved JIT model can run on both CPUs and GPUs. --------- Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Chun Cai <[email protected]>

feat(pt/dpmodel): support type_one_side in se_e2_a (deepmodeling#3339)

4f70073

Signed-off-by: Jinzhe Zeng <[email protected]>

refact: pt: mv all plugin support to base descriptor. (deepmodeling#3340

3e6b507

) thus pt reusing the dp code. --------- Co-authored-by: Han Wang <[email protected]>

bump python to 3.12 in the test environment (deepmodeling#3343)

473cc0a

Fix a bug caused by the breaking change in Keras 3 (shipped by TF 2.16). --------- Signed-off-by: Jinzhe Zeng <[email protected]>

fix_dplr.cpp delete redundant setup (deepmodeling#3344)

254afc8

Redundant setup was removed. The setup has already been executed in the initial lines of post_force, along with subsequent calculations. Reinitialization will lead to an error.

add BaseModel; store type in serialization (deepmodeling#3335)

854d998

Signed-off-by: Jinzhe Zeng <[email protected]> Co-authored-by: Han Wang <[email protected]>

fix_dplr.cpp set atom->image when pre_force (deepmodeling#3345)

b1de9e6

The atom->image of the wannier centroid should be set to the same as its real counterpart when assigning the position. --------- Co-authored-by: Yifan Li李一帆 <[email protected]>

apply PluginVariant and make_plugin_registry to classes (deepmodeling…

004ebd6

…#3346) Signed-off-by: Jinzhe Zeng <[email protected]>

Fix single-task training&data stat

3812866

iProzd closed this Feb 28, 2024

github-actions bot added Python Core CUDA ROCM C++ LAMMPS Gromacs Docs Examples i-PI C labels Feb 28, 2024
