pt: fix single-task training&data stat #3354

Closed
wants to merge 135 commits into from

Conversation

iProzd
Collaborator

@iProzd iProzd commented Feb 28, 2024

No description provided.

njzjz and others added 30 commits January 23, 2024 15:09
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.1.13 →
v0.1.14](astral-sh/ruff-pre-commit@v0.1.13...v0.1.14)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
See deepmodeling#3120.

- CMake: add `ENABLE_TENSORFLOW` and `ENABLE_PYTORCH`.
`BUILD_TENSORFLOW` will be enabled when `TENSORFLOW_ROOT` is not empty
or `USE_TF_PYTHON_LIBS` is on.
- api_cc: add `BUILD_TENSORFLOW` and `BUILD_PYTORCH` definitions. Move
several functions from `common.h` to `commonTF.h` to prevent exposing
them to header files.
- CI: download libtorch in the build/test CC actions.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Fix deepmodeling#3121.

The PyTorch icon can be added when a feature implemented by PyTorch is
added.

However, I can't find a way to add an icon to TOC.


![image](https://github.com/deepmodeling/deepmd-kit/assets/9496702/7f29da27-af81-4850-9da0-79310d216b2d)

Signed-off-by: Jinzhe Zeng <[email protected]>
Need discussion for other classes.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Add a dpdata driver via the plugin mechanism (override that in the
dpdata package) so it can benefit from the multiple-backend DeepPot.
Currently, the driver in the dpdata package has to support both v1 and
v2 for backward compatibility. When shipped within the deepmd-kit
package, it only needs to support the current deepmd-kit version.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
…ng#3173)

..., so they can benefit from multiple-backend DeepPot. Update docs.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
They are used by the downstream APIs, so must be implemented.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Deprecate per discussion.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Merge deepmd-pytorch into the main repo 🎉
Add the following directories:

- deepmd/pt : main implementations of deepmd-pytorch
- source/tests/pt: UTs for deepmd-pytorch

TODO list:
- [x] examples added for water/se_e2_a, water/se_atten, water/dpa2
- [x] README updated (needs further modification)
- [x] Paths in each file have been adapted.
- [x] pyproject.toml needed to be merged

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <[email protected]>
Fix the following compiler warning:
```
/home/runner/work/deepmd-kit/deepmd-kit/source/api_c/src/c_api.cc:1336:17: warning: returning address of local temporary object [-Wreturn-stack-address]
  return (int*)&(dcm->dcm.sel_types())[0];
                ^~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```
by returning a reference to `sel_type`.

`DataChargeModifier.sel_types` is not used anywhere, even in the test,
so we don't have a chance to determine if there is a possible segfault,
and this warning has no actual impact.

It seems `DeepTensor` has returned a reference since the beginning
(deepmodeling#137). (perhaps because
`DeepTensor.sel_types` is used) `DeepTensor` and `DataChargeModifier`
have different returned types.
... per discussion.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Just merge in form. Several options or subcommands are only supported by
TensorFlow or PyTorch.

Also, avoid import from `deepmd.tf` in `deepmd.utils.argcheck`.

```
Use --tf or --pt to choose the backend:
    dp --tf train input.json
    dp --pt train input.json
```
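
As a rough illustration of how a global backend switch like this can be wired up, here is a minimal `argparse` sketch. It is not the actual `dp` entry point: the `build_parser` helper, the `backend` attribute, the backend strings, and the TensorFlow default are all assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `dp`-like parser with mutually exclusive backend flags."""
    parser = argparse.ArgumentParser(prog="dp")

    # --tf and --pt write to the same destination and cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--tf", action="store_const", dest="backend",
        const="tensorflow", default="tensorflow",  # default is an assumption
    )
    group.add_argument(
        "--pt", action="store_const", dest="backend",
        const="pytorch", default="tensorflow",
    )

    # Subcommands come after the backend flag, as in `dp --pt train input.json`.
    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train")
    train.add_argument("input_file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--pt", "train", "input.json"])
    print(args.backend, args.command, args.input_file)
```

Keeping the flag on the top-level parser means every subcommand sees the same `backend` value, and a subcommand that only one backend supports can check it and fail early.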

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
- Set `deepmd.pt.utils.ase_calc.DPCalculator` as an alias of
`deepmd.calculator.DP`;
- Replace `deepmd_pt` with `deepmd.pt` in `deep_pot.py`; fix (atomic)
virial output shape of `DeepPot`; add tests for them;
- Set `pbc` in `pt/test_calculator.py` as it requests stress.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
1. compatible with tf
2. compatible with the input cell shape

Co-authored-by: Han Wang <[email protected]>
Set the default `save_ckpt` to `model.ckpt`, which is used as the prefix. When
checkpoints are saved, `model.ckpt-100.pt` is written and `model.ckpt.pt` is
symlinked to `model.ckpt-100.pt`. A dedicated `checkpoint` file records
`model.ckpt-100.pt`.

This keeps the same behavior as the TF backend. One can do the below
using the PT backend just like the TF backend:

```sh
dp --pt train input.json
# one can cancel the training before it finishes
dp --pt freeze
```
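
The checkpoint layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `save_checkpoint` is a hypothetical helper, the payload is raw bytes rather than a real serialized model, and the content of the `checkpoint` file (a single line with the file name) is a guess, since the commit message does not specify its format.

```python
import os
import tempfile
from pathlib import Path


def save_checkpoint(directory: Path, prefix: str, step: int, payload: bytes) -> Path:
    """Write `<prefix>-<step>.pt`, repoint `<prefix>.pt` at it, and record it."""
    target = directory / f"{prefix}-{step}.pt"
    target.write_bytes(payload)  # stand-in for the real model serializer

    # Replace the "latest" symlink so it always points at the newest file.
    latest = directory / f"{prefix}.pt"
    if latest.is_symlink() or latest.exists():
        latest.unlink()
    latest.symlink_to(target.name)  # relative link, resolved inside `directory`

    # Record the newest checkpoint name (file format is an assumption).
    (directory / "checkpoint").write_text(target.name + "\n")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        save_checkpoint(workdir, "model.ckpt", 100, b"weights")
        print(os.readlink(workdir / "model.ckpt.pt"))
        print((workdir / "checkpoint").read_text().strip())
```

Because the link and the `checkpoint` file are updated on every save, a later step such as freezing can locate the most recent checkpoint without knowing the step number.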

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
…eling#3195)

Fix
https://github.com/deepmodeling/deepmd-kit/security/code-scanning/2096

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
per discussion.

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: Han Wang <[email protected]>
```
- source
  - tests
     - common
     - tf
     - pt
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Fix deepmodeling#3121.

There are TODOs:
(1) PyTorch-backend specific features and arguments;
(2) Python interface installation. Currently, the TensorFlow backend is
always installed, and I am considering rewriting the logic;
(3) Unsupported features - write docs when implemented.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
The default one from PyPI is for CU12.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
…pmodeling#3201)

If so, throw the following error:
```
-- PyTorch CXX11 ABI: 0
CMake Error at CMakeLists.txt:162 (message):
  PyTorch CXX11 ABI mismatch TensorFlow: 0 != 1
```
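
The same consistency rule can be expressed outside CMake. The sketch below is a hypothetical Python mirror of the check, not the project's build logic: `check_cxx11_abi` is an invented helper that takes the two ABI flags as plain integers and raises with the message shown above when they differ.

```python
def check_cxx11_abi(pytorch_abi: int, tensorflow_abi: int) -> None:
    """Raise if the two libraries were built with different CXX11 ABI flags."""
    if pytorch_abi != tensorflow_abi:
        raise RuntimeError(
            "PyTorch CXX11 ABI mismatch TensorFlow: "
            f"{pytorch_abi} != {tensorflow_abi}"
        )


if __name__ == "__main__":
    check_cxx11_abi(1, 1)  # matching flags: no error
    try:
        check_cxx11_abi(0, 1)
    except RuntimeError as err:
        print(err)
```

With PyTorch installed, the first argument could be obtained from `int(torch.compiled_with_cxx11_abi())`; how the TensorFlow flag is detected in the real build is not shown in this thread.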

Signed-off-by: Jinzhe Zeng <[email protected]>
…deling#3200)

Fix deepmodeling#3120.

One can disable building the TensorFlow backend during `pip install` by
setting `DP_ENABLE_TENSORFLOW=0`.
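
A build script might read such a switch roughly as follows. This is a minimal sketch and not the project's actual build code: the `env_flag` helper, the set of spellings treated as "off", and the enabled-by-default behavior are assumptions; only `DP_ENABLE_TENSORFLOW=0` comes from the description above.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, default: bool = True,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret an environment variable as an on/off switch."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default  # unset: fall back to the default (assumed enabled)
    # Spellings treated as "off" are an assumption for this sketch.
    return raw.strip().lower() not in {"0", "false", "off", "no", ""}


if __name__ == "__main__":
    build_tf = env_flag("DP_ENABLE_TENSORFLOW")
    print("build TensorFlow backend:", build_tf)
```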

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
njzjz and others added 18 commits February 23, 2024 04:50
```sh
dp convert-backend model.pb model.pth
dp convert-backend model.pb model.dp
```

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
This PR adds a cross-framework consistency test for DipoleFittingNet.

Known Limitations:

1. Some keys in the serialized models do not match, so only the common
keys are tested.
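
Restricting a comparison to shared keys can be sketched as below. This is an illustration only, not the test added by the PR: `compare_common_keys`, the flat list-of-floats values, the tolerances, and the key names in the usage example are all hypothetical.

```python
import math


def compare_common_keys(ref, other, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two {name: list of floats} dicts on the keys they share.

    Returns (common, skipped, mismatched) as sorted lists of key names.
    """
    common = sorted(ref.keys() & other.keys())   # keys present in both
    skipped = sorted(ref.keys() ^ other.keys())  # keys present in only one
    mismatched = []
    for key in common:
        a, b = ref[key], other[key]
        same = len(a) == len(b) and all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, b)
        )
        if not same:
            mismatched.append(key)
    return common, skipped, mismatched


if __name__ == "__main__":
    tf_model = {"bias": [0.1, 0.2], "weight": [1.0], "tf_only": [3.0]}
    pt_model = {"bias": [0.1, 0.2], "weight": [1.0], "pt_only": [4.0]}
    print(compare_common_keys(tf_model, pt_model))
```

Reporting the skipped keys alongside the mismatches keeps the limitation visible, so keys that exist in only one backend are not silently treated as passing.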

---------

Signed-off-by: Anyang Peng <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This PR adds a cross-framework consistency test for PolarFittingNet.

Note: `shift_diag` not yet implemented in PT.

---------

Signed-off-by: Anyang Peng <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Fix No module named 'torch'

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
While a DPModel cannot be directly trained, it can be converted from
another model:
```sh
dp convert-backend frozen_model.pth frozen_model.dp
dp test -m frozen_model.dp -s ../data/
```
The energy result is consistent with TF and PT. Force and virial are
NaN, as expected.

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Ensure the saved JIT model can run on both CPUs and GPUs.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: Chun Cai <[email protected]>
)

thus pt reusing the dp code.

---------

Co-authored-by: Han Wang <[email protected]>
Fix a bug caused by the breaking change in Keras 3 (shipped by TF 2.16).

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Redundant setup was removed. The setup has already been executed in the
initial lines of post_force, along with subsequent calculations.
Reinitialization will lead to an error.
This PR adds support for the `se_r` descriptor in PyTorch and NumPy.
- [x] Refactor PyTorch env_mat: possibly combine `r` and `a`.
- [x] Add NumPy implementation.
- [x] Add consistency test with `tf`.
- [x] Refactor device as parameter

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
The atom->image of the wannier centroid should be set to the same as its
real counterpart when assigning the position.

---------

Co-authored-by: Yifan Li李一帆 <[email protected]>
6 participants