Build an iterative phonon flow #306

Open
wants to merge 54 commits into
base: main
6a3a342
add some drafts
JaGeo Dec 21, 2024
fc9e7a5
pre-commit auto-fixes
pre-commit-ci[bot] Dec 21, 2024
25a95b4
restructure and rely on flow instead
JaGeo Dec 21, 2024
4ca1d1c
restructure and rely on flow instead
JaGeo Dec 21, 2024
55562d5
restructure and rely on flow instead
JaGeo Dec 21, 2024
e5787c7
pre-commit auto-fixes
pre-commit-ci[bot] Dec 21, 2024
4f3d6c4
more restructuring of completeworkflow
JaGeo Dec 21, 2024
52deaa2
more restructuring of completeworkflow
JaGeo Dec 21, 2024
6ba11e0
pre-commit auto-fixes
pre-commit-ci[bot] Dec 21, 2024
855dccc
more restructuring of completeworkflow
JaGeo Dec 21, 2024
fcf0cda
bring the workflow more in shape
JaGeo Dec 22, 2024
1958654
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
3c0d6a8
fix the workflow stepwise
JaGeo Dec 22, 2024
f85baa3
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
8b54a5c
fix some more probles
JaGeo Dec 22, 2024
af16c12
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
5e09a4f
hard code displacement to 0.01 in benchmark, fix other more issues in…
JaGeo Dec 22, 2024
7094b93
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
1bd5d49
fix more logic problems
JaGeo Dec 22, 2024
295478e
make outputs nicer
JaGeo Dec 22, 2024
f7ad583
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
e17fc92
fix a bunch of tests in auto phonons after the new changes
JaGeo Dec 22, 2024
75de5db
fix more tests
JaGeo Dec 22, 2024
829e674
fix benchmark tests
JaGeo Dec 22, 2024
e2bc0eb
daza
JaGeo Dec 22, 2024
bad024c
fix more tests
JaGeo Dec 22, 2024
00e20cd
fix pre database position for all workflows
JaGeo Dec 22, 2024
c80a5ed
add more documentaion and fix number of jobs, addition of get_output
JaGeo Dec 22, 2024
9444190
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
d98727e
add more documentation
JaGeo Dec 22, 2024
215c0de
fix list comprehension
JaGeo Dec 22, 2024
1944eed
pre-commit auto-fixes
pre-commit-ci[bot] Dec 22, 2024
ec68520
mace stuff
JaGeo Dec 22, 2024
d2a1485
add to data
JaGeo Dec 23, 2024
834e175
add to data
JaGeo Dec 23, 2024
8c5aa8a
fix ranom seed for structures that are too similar
JaGeo Dec 24, 2024
13b1229
pre-commit auto-fixes
pre-commit-ci[bot] Dec 24, 2024
6eb620f
fix ranom seed for structures that are too similar
JaGeo Dec 24, 2024
bdf0c8b
fix ranom seed for structures that are too similar
JaGeo Dec 24, 2024
c59f853
pre-commit auto-fixes
pre-commit-ci[bot] Dec 24, 2024
453c75d
default random seed
JaGeo Dec 24, 2024
624f762
default random seed
JaGeo Dec 24, 2024
21f8ca2
fix write benchmark generation beyond runs in jobflow
JaGeo Dec 25, 2024
43f95d6
pre-commit auto-fixes
pre-commit-ci[bot] Dec 25, 2024
36af259
fix random seed and add strict tests
JaGeo Dec 25, 2024
a8e065b
fix lorbit problems
JaGeo Dec 25, 2024
6705f1b
fix test
JaGeo Dec 25, 2024
71c59d9
fix test
JaGeo Dec 25, 2024
82cd1e5
fix rms computation
JaGeo Dec 26, 2024
512cad7
pre-commit auto-fixes
pre-commit-ci[bot] Dec 26, 2024
1163b83
add one more test
JaGeo Dec 26, 2024
301a378
add documentation
JaGeo Dec 26, 2024
e562ce3
add hint on the default displacement
JaGeo Dec 26, 2024
74da25a
add hint on the default displacement
JaGeo Dec 26, 2024
1 change: 1 addition & 0 deletions docs/user/phonon/flows/benchmark/benchmark.md
@@ -7,6 +7,7 @@ This tutorial will help you understand all the `autoplex` benchmark specificatio
## General settings

For the benchmark, you do not have to worry about a lot of settings. The crucial part here is the number of benchmark structures you are interested in.
Benchmark harmonic phonon runs are always generated with a displacement of 0.01 Å, even if the fitting procedure also includes other displacements.
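
As a minimal sketch of how a displacement maps to the reference-data key: the `jobs.py` changes in this pull request look up DFT references under `f"{int(displacement * 100):03d}"`, so 0.01 becomes `"001"`. The helper names `displacement_key` and `displacement_key_rounded` are hypothetical and not part of `autoplex`. The rounded variant is only a suggestion: `int()` truncates, and a product such as `0.29 * 100` evaluates to `28.999999999999996` in double precision.

```python
def displacement_key(displacement: float) -> str:
    """Key format used in the diff: displacement * 100, zero-padded to three digits."""
    return f"{int(displacement * 100):03d}"


def displacement_key_rounded(displacement: float) -> str:
    """Hypothetical variant: round before formatting to avoid float truncation."""
    return f"{round(displacement * 100):03d}"


print(displacement_key(0.01))  # 001
print(displacement_key_rounded(0.29))  # 029
```

For the fixed benchmark displacement of 0.01, both helpers give the same key; they can only differ for other displacements used during fitting.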

```python
from mp_api.client import MPRester
42 changes: 40 additions & 2 deletions docs/user/phonon/flows/flows.md
@@ -12,7 +12,9 @@ This tutorial will demonstrate how to use `autoplex` with its default setup and

The complete workflow of `autoplex` involves the data generation
(including the execution of VASP calculations),
the fitting of the machine-learned interatomic potential (MLIP) and the benchmark to the DFT results.

An iterative version of this workflow is also available. It reruns the complete workflow until the phonons reach a target quality, and it is described below.

### Before running the workflow

@@ -89,7 +91,6 @@ Next, we are going to construct the workflow based on the rocksalt-type LiCl ([*
Remember to replace `YOUR_MP_API_KEY` with your personal [Materials Project API key](https://next-gen.materialsproject.org/api#api-key).

```python
from jobflow.core.flow import Flow
from mp_api.client import MPRester
from autoplex.auto.phonons.flows import CompleteDFTvsMLBenchmarkWorkflow

@@ -220,3 +221,40 @@ Potential Structure MPID Displacement (Å) RMSE (THz) imagmodes(pot)
GAP LiCl mp-22905 0.01 0.57608 False False full atom-wise f=0.1: n_sparse = 6000, SOAP delta = 0.5
```

## Iterative version of the default workflow

To systematically converge the quality of the potentials, we have built an iterative version of the default workflow `CompleteDFTvsMLBenchmarkWorkflow`. It reruns `CompleteDFTvsMLBenchmarkWorkflow` until the worst RMSE among the benchmark structures falls below a threshold (`rms_max`) or the maximum number of iterations (`max_iterations`) is reached.

The first generation can use a slightly different workflow (`complete_dft_vs_ml_benchmark_workflow_0`) than the subsequent generations (`complete_dft_vs_ml_benchmark_workflow_1`). This helps to obtain enough structures for an initial MLIP fit and then add only a few more structures in each later generation.

```python
from mp_api.client import MPRester
from autoplex.auto.phonons.flows import CompleteDFTvsMLBenchmarkWorkflow, IterativeCompleteDFTvsMLBenchmarkWorkflow

mpr = MPRester(api_key='YOUR_MP_API_KEY')
structure_list = []
benchmark_structure_list = []
mpids = ["mp-22905"]
# you can put as many mpids as needed e.g. mpids = ["mp-22905", "mp-1185319"] for all LiCl entries in the Materials Project
mpbenchmark = ["mp-22905"]
for mpid in mpids:
    structure = mpr.get_structure_by_material_id(mpid)
    structure_list.append(structure)
for mpbm in mpbenchmark:
    bm_structure = mpr.get_structure_by_material_id(mpbm)
    benchmark_structure_list.append(bm_structure)

complete_flow = IterativeCompleteDFTvsMLBenchmarkWorkflow(
    rms_max=0.2,
    max_iterations=4,
    complete_dft_vs_ml_benchmark_workflow_0=CompleteDFTvsMLBenchmarkWorkflow(
        apply_data_preprocessing=True,
    ),
    complete_dft_vs_ml_benchmark_workflow_1=CompleteDFTvsMLBenchmarkWorkflow(
        apply_data_preprocessing=True,
    ),
).make(
    structure_list=structure_list,
    mp_ids=mpids,
    benchmark_structures=benchmark_structure_list,
    benchmark_mp_ids=mpbenchmark,
)

complete_flow.name = "tutorial"
autoplex_flow = complete_flow
```
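
The stopping rule of the iterative workflow can be sketched in plain Python, without VASP or `jobflow`. The condition in `should_run_next_iteration` mirrors the one in `do_iterative_rattled_structures` from this pull request. The function names, the `simulate_iterations` driver, and the RMSE values are made up for illustration, and the sketch assumes that the very first call starts with `rms=None`, which is not visible in the rendered diff.

```python
def should_run_next_iteration(rms, number_of_iteration, max_iteration=5, rms_max=0.2):
    """Same condition as in do_iterative_rattled_structures."""
    return rms is None or (number_of_iteration < max_iteration and rms > rms_max)


def simulate_iterations(rmse_per_iteration, max_iteration=4, rms_max=0.2):
    """Walk through made-up RMSE values and report when the loop would stop.

    Returns the number of iterations that were run and the last RMSE.
    """
    rms = None  # assumption: no RMSE is known before the first iteration
    n = 0
    while n < len(rmse_per_iteration) and should_run_next_iteration(
        rms, n, max_iteration, rms_max
    ):
        rms = rmse_per_iteration[n]  # stands in for job1.output["rms"]
        n += 1
    return n, rms


# Converges in the third iteration (0.15 <= rms_max)
print(simulate_iterations([0.6, 0.35, 0.15, 0.1]))  # (3, 0.15)
# Never converges, so it stops after max_iteration runs
print(simulate_iterations([0.9] * 10))  # (4, 0.9)
```

In the real job, each iteration is a full `CompleteDFTvsMLBenchmarkWorkflow` run and the recursion is expressed through `Response(replace=...)` rather than a `while` loop.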
281 changes: 207 additions & 74 deletions src/autoplex/auto/phonons/flows.py

Large diffs are not rendered by default.

203 changes: 195 additions & 8 deletions src/autoplex/auto/phonons/jobs.py
@@ -1,11 +1,11 @@
"""General AutoPLEX automation jobs."""

import re
from collections.abc import Iterable
from dataclasses import field
from pathlib import Path

import numpy as np
from atomate2.common.schemas.phonons import PhononBSDOSDoc
from atomate2.vasp.flows.core import DoubleRelaxMaker
from atomate2.vasp.jobs.base import BaseVaspMaker
from atomate2.vasp.jobs.core import StaticMaker, TightRelaxMaker
@@ -26,6 +26,133 @@


@job
def do_iterative_rattled_structures(
    workflow_maker_gen_0,
    workflow_maker_gen_1,
    structure_list: list[Structure],
    mp_ids,
    dft_references: list[PhononBSDOSDoc] | None = None,
    benchmark_structures: list[Structure] | None = None,
    benchmark_mp_ids: list[str] | None = None,
    pre_xyz_files: list[str] | None = None,
    pre_database_dir: str | None = None,
    rattle_seed: int | None = None,
    fit_kwargs_list: list | None = None,
    number_of_iteration=0,
    rms=0.2,
    max_iteration=5,
    rms_max=0.2,
    previous_output=None,
):
    """
    Job to run CompleteDFTvsMLBenchmarkWorkflow in an iterative manner.

    Parameters
    ----------
    workflow_maker_gen_0: CompleteDFTvsMLBenchmarkWorkflow.
        First Iteration will be performed with this flow.
    workflow_maker_gen_1: CompleteDFTvsMLBenchmarkWorkflow.
        All Iterations after the first one will be performed with this flow.
    structure_list:
        List of pymatgen structures.
    mp_ids:
        Materials Project IDs.
    dft_references: list[PhononBSDOSDoc] | None
        List of DFT reference files containing the PhononBSDOSDoc object.
        Reference files have to refer to a finite displacement of 0.01.
        For benchmarking, only 0.01 is supported.
    benchmark_structures: list[Structure] | None
        The pymatgen structure for benchmarking.
    benchmark_mp_ids: list[str] | None
        Materials Project ID of the benchmarking structure.
    pre_xyz_files: list[str] or None
        Names of the pre-database train xyz file and test xyz file.
    pre_database_dir: str or None
        The pre-database directory.
    rattle_seed: int | None
        Random seed.
    fit_kwargs_list : list[dict].
        Dict including MLIP fit keyword args.
    number_of_iteration: int.
        Index of the current iteration (0 for the first one).
    rms: float | None.
        RMSE obtained in the previous iteration.
    max_iteration: int.
        Maximum number of iterations to run.
    rms_max: float.
        Will stop once the best potential has a max rmse below this value.
    previous_output: dict | None.
        Dict including the output of the previous flow.
    """
    if rms is None or (number_of_iteration < max_iteration and rms > rms_max):
        jobs = []

        if number_of_iteration == 0:
            workflow_maker = workflow_maker_gen_0
            job1 = workflow_maker_gen_0.make(
                structure_list=structure_list,
                mp_ids=mp_ids,
                dft_references=dft_references,
                benchmark_structures=benchmark_structures,
                benchmark_mp_ids=benchmark_mp_ids,
                pre_xyz_files=pre_xyz_files,
                pre_database_dir=pre_database_dir,
                rattle_seed=rattle_seed,
                fit_kwargs_list=fit_kwargs_list,
            )
        else:
            workflow_maker = workflow_maker_gen_1
            job1 = workflow_maker_gen_1.make(
                structure_list=structure_list,
                mp_ids=mp_ids,
                dft_references=dft_references,
                benchmark_structures=benchmark_structures,
                benchmark_mp_ids=benchmark_mp_ids,
                pre_xyz_files=pre_xyz_files,
                pre_database_dir=pre_database_dir,
                rattle_seed=rattle_seed,
                fit_kwargs_list=fit_kwargs_list,
            )

        # rms needs to be computed somehow
        job1.append_name("_" + str(number_of_iteration))
        jobs.append(job1)
        # order is the same as in the scaling "scale_cells"
        if workflow_maker.volume_custom_scale_factors is not None:
            rattle_seed = rattle_seed + (
                len(workflow_maker.volume_custom_scale_factors)
                * len(workflow_maker.structure_list)
            )
        elif workflow_maker.n_structures is not None:
            rattle_seed = rattle_seed + (workflow_maker.n_structures) * len(
                workflow_maker.structure_list
            )

        job2 = do_iterative_rattled_structures(
            workflow_maker_gen_0=workflow_maker_gen_0,
            workflow_maker_gen_1=workflow_maker_gen_1,
            structure_list=structure_list,
            mp_ids=mp_ids,
            dft_references=job1.output["dft_references"],
            # TODO: check if they should be optimized
            benchmark_structures=job1.output["benchmark_structures"],
            benchmark_mp_ids=job1.output["benchmark_mp_ids"],
            pre_xyz_files=job1.output["pre_xyz_files"],
            pre_database_dir=job1.output["pre_database_dir"],
            rattle_seed=rattle_seed,
            fit_kwargs_list=fit_kwargs_list,
            number_of_iteration=number_of_iteration + 1,
            rms=job1.output["rms"],
            max_iteration=max_iteration,
            rms_max=rms_max,
            previous_output=job1.output,
        )
        jobs.append(job2)
        # benchmark stuff has to be passed into the complete stuff later on instead of recalculating it every time
        # random seed update might be the hardest part.
        return Response(replace=Flow(jobs), output=job2.output)
    # give a nicer output # what do we need to restart?
    # should be the same as for the completeworkflow
    return previous_output


@job(data=[PhononBSDOSDoc])
def complete_benchmark( # this function was put here to prevent circular import
ml_path: list,
ml_model: str,
@@ -114,9 +241,10 @@ def complete_benchmark( # this function was put here to prevent circular import

for path in ml_path:
suffix = Path(path).name
print(suffix)
if suffix == "without_regularization":
suffix = "without_reg"
if re.match(r"job_\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}-\d{6}-\d{5}", suffix):
if suffix not in ["phonon", "rattled"]:
suffix = ""

if phonon_displacement_maker is None:
@@ -137,6 +265,7 @@ def complete_benchmark( # this function was put here to prevent circular import
ml_potential = Path(path) / "deployed_nequip_model.pth"
else: # MACE
# treat finetuned potentials
# TODO: fix this naming issue (depends on input)
ml_potential_fine = Path(path) / "MACE_final.model"
ml_potential = (
ml_potential_fine
@@ -163,9 +292,10 @@ def complete_benchmark( # this function was put here to prevent circular import
if (
benchmark_mp_ids[ibenchmark_structure] in mp_ids
) and add_dft_phonon_struct:

dft_references = fit_input[benchmark_mp_ids[ibenchmark_structure]][
"phonon_data"
]["001"]
][f"{int(displacement * 100):03d}"]
else:
dft_phonons = dft_phonopy_gen_data(
structure=benchmark_structure,
@@ -178,7 +308,9 @@ def complete_benchmark( # this function was put here to prevent circular import
supercell_settings=supercell_settings,
)
jobs.append(dft_phonons)
dft_references = dft_phonons.output["phonon_data"]["001"]
dft_references = dft_phonons.output["phonon_data"][
f"{int(displacement * 100):03d}"
]

add_data_bm = PhononBenchmarkMaker(name="Benchmark").make(
ml_model=ml_model,
@@ -225,7 +357,10 @@ def complete_benchmark( # this function was put here to prevent circular import
jobs.append(add_data_bm)
collect_output.append(add_data_bm.output)

return Response(replace=Flow(jobs), output=collect_output)
return Response(
replace=Flow(jobs),
output={"bm_output": collect_output, "dft_references": dft_references},
)


@job
@@ -377,7 +512,7 @@ def dft_phonopy_gen_data(
"LCHARG": False, # Do not write the CHGCAR file
"LWAVE": False, # Do not write the WAVECAR file
"LVTOT": False, # Do not write LOCPOT file
"LORBIT": 0, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LORBIT": None, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LOPTICS": False, # No PCDAT file
"NSW": 200,
"NELM": 500,
@@ -402,7 +537,7 @@ def dft_phonopy_gen_data(
"LCHARG": False, # Do not write the CHGCAR file
"LWAVE": False, # Do not write the WAVECAR file
"LVTOT": False, # Do not write LOCPOT file
"LORBIT": 0, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LORBIT": None, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LOPTICS": False, # No PCDAT file
# to be removed
"NPAR": 4,
@@ -536,7 +671,7 @@ def dft_random_gen_data(
"LCHARG": False, # Do not write the CHGCAR file
"LWAVE": False, # Do not write the WAVECAR file
"LVTOT": False, # Do not write LOCPOT file
"LORBIT": 0, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LORBIT": None, # No output of projected or partial DOS in EIGENVAL, PROCAR and DOSCAR
"LOPTICS": False, # No PCDAT file
"NSW": 200,
"NELM": 500,
@@ -619,3 +754,55 @@ def get_iso_atom(
},
)
return Response(replace=flow)


@job(data=[PhononBSDOSDoc])
def get_output(
    metrics: list,
    benchmark_structures: list[Structure] | None = None,
    benchmark_mp_ids: list[str] | None = None,
    dft_references: list[PhononBSDOSDoc] | None = None,
    pre_xyz_files: list[str] | None = None,
    pre_database_dir: str | None = None,
    fit_kwargs_list: list | None = None,
):
    """
    Job to collect all output infos for potential restarts.

    Parameters
    ----------
    metrics: list[dict]
        List of metric dictionaries from complete_benchmark jobs.
    dft_references: list[PhononBSDOSDoc] | None
        List of DFT reference files containing the PhononBSDOSDoc object.
        Reference files have to refer to a finite displacement of 0.01.
        For benchmarking, only 0.01 is supported.
    benchmark_structures: list[Structure] | None
        The pymatgen structure for benchmarking.
    benchmark_mp_ids: list[str] | None
        Materials Project ID of the benchmarking structure.
    pre_xyz_files: list[str] or None
        Names of the pre-database train xyz file and test xyz file.
    pre_database_dir: str or None
        The pre-database directory.
    fit_kwargs_list : list[dict].
        Dict including MLIP fit keyword args.
    """
    # TODO: potentially evaluation of imaginary modes

    rms_max_values = []  # get the largest rms in each fit

    for i in range(len(metrics[0])):
        rms_max_value = max(sublist[i]["benchmark_phonon_rmse"] for sublist in metrics)
        rms_max_values.append(rms_max_value)

    return {
        "metrics": metrics,
        "rms": min(rms_max_values),  # get the best fit
        "benchmark_structures": benchmark_structures,
        "benchmark_mp_ids": benchmark_mp_ids,
        "dft_references": dft_references,
        "pre_xyz_files": pre_xyz_files,
        "pre_database_dir": pre_database_dir,
        "fit_kwargs_list": fit_kwargs_list,
    }
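
To make the `"rms"` value returned by `get_output` concrete, here is a small standalone sketch of the same aggregation: for each fit, take the worst RMSE over all benchmark structures, then report the best fit. It assumes, as the indexing in `get_output` suggests, that `metrics` holds one inner list per benchmark structure with one dictionary per fit. The function name `best_worst_case_rmse` and the numbers are invented for illustration.

```python
def best_worst_case_rmse(metrics: list[list[dict]]) -> float:
    """Return the min over fits of the max RMSE over benchmark structures."""
    n_fits = len(metrics[0])
    worst_per_fit = [
        max(per_structure[i]["benchmark_phonon_rmse"] for per_structure in metrics)
        for i in range(n_fits)
    ]
    return min(worst_per_fit)


# Two benchmark structures (outer list), two fits each (inner lists).
example_metrics = [
    [{"benchmark_phonon_rmse": 0.5}, {"benchmark_phonon_rmse": 0.3}],
    [{"benchmark_phonon_rmse": 0.2}, {"benchmark_phonon_rmse": 0.4}],
]
# Fit 0 is at worst 0.5, fit 1 is at worst 0.4, so the best fit scores 0.4.
print(best_worst_case_rmse(example_metrics))  # 0.4
```

This is the value that the iterative job compares against `rms_max`, so a single poorly described benchmark structure is enough to trigger another iteration.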
3 changes: 1 addition & 2 deletions src/autoplex/benchmark/phonons/jobs.py
@@ -25,9 +25,8 @@ def write_benchmark_metrics(
-------
A text file with root mean squared error between DFT and ML potential phonon band-structure
"""
# TODO: fix this part
metrics_flattened = [item for sublist in metrics for item in sublist]
# TODO: think about a better solution here

# the following code assumes all benchmark structures have the same composition
structure_composition = benchmark_structures[0].composition.reduced_formula
with open(
5 changes: 3 additions & 2 deletions src/autoplex/data/common/jobs.py
@@ -238,8 +238,9 @@ def generate_randomized_structures(
if supercell_matrix is None:
supercell_matrix = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]

if n_structures < 10:
n_structures = 10
# TODO: remove this part
# if n_structures < 10:
# n_structures = 10

supercell = get_supercell(
unitcell=get_phonopy_structure(structure),