Low RMSE Error for Stress, but failed Parity Plot #542
-
Maybe it's because you used average E0s. My discussion #317 may have the answer.
-
Your relative force errors are not so great either. That could be due to float32, or to your very high stress weights. If you are using the latest version of the code, a stress weight bug was fixed, so you no longer need large stress weights.
-
The average E0s are unlikely to cause the problem with the stress. It is, however, highly recommended to use isolated-atom energies as E0s, because you are then training on the binding energy rather than on the deviation from the average energy. This is especially important when your test data cover more varied stoichiometries, because the average energy then depends strongly on the stoichiometry. Using isolated-atom E0s will lead to much more stable potentials.
-
@mstapelberg Can you please post your full log file?
-
Also make sure that your data is k-point converged. What energy cutoff are you using?
-
One difference between MACE and other, more nonlinear architectures is that if you have errors in your data (unconverged DFT, for example), MACE will refuse to fit it well.
-
Isn't the factor 1602? I get:
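For reference, the kbar factor raised in this comment can be checked from first principles. The sketch below is not the commenter's original output; it derives the pressure equivalent of 1 eV/Å^3 from the SI value of the elementary charge. The constant and function names are made up for illustration, and the last digits differ slightly from the 160.21766208 quoted elsewhere in this thread because that value comes from an older CODATA set.

```python
# Elementary charge in coulombs (exact in the 2019 SI); 1 eV equals this many joules.
ELEMENTARY_CHARGE = 1.602176634e-19
ANGSTROM = 1.0e-10  # metres

# Pressure corresponding to 1 eV/A^3, expressed in Pa, GPa and kbar.
EV_PER_A3_IN_PA = ELEMENTARY_CHARGE / ANGSTROM**3
EV_PER_A3_IN_GPA = EV_PER_A3_IN_PA / 1.0e9   # 1 GPa = 1e9 Pa
EV_PER_A3_IN_KBAR = EV_PER_A3_IN_PA / 1.0e8  # 1 kbar = 1e8 Pa = 0.1 GPa


def kbar_to_ev_per_a3(stress_kbar):
    """Convert a stress value from kbar to eV/A^3 (hypothetical helper)."""
    return stress_kbar / EV_PER_A3_IN_KBAR


if __name__ == "__main__":
    print(f"1 eV/A^3 = {EV_PER_A3_IN_GPA:.4f} GPa = {EV_PER_A3_IN_KBAR:.3f} kbar")
    print(f"100 kbar = {kbar_to_ev_per_a3(100.0):.6f} eV/A^3")
```

So roughly 160.2 is the divisor for values in GPa, and roughly 1602 is the divisor for values in kbar. Dividing kbar values by 160.2 leaves the stresses a factor of 10 too large.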
-
Hi folks, an update on some more convergence testing. Instead of pure vanadium, this time I tested 24V-10Cr-10Ti-10W-10Zr / V-15at%Cr-15at%Ti-15at%W-15at%Zr (the most heavily alloyed material I would look at in my analysis).
I was originally using an ENCUT of 360 eV and a 6 x 6 x 6 k-point mesh, as that seemed sufficient for pure vanadium. It now looks like a KSPACING of 0.15 for my unit cell (with 64 atoms) creates a 5 x 5 x 5 grid. It seems that my energy cutoff was indeed too low, but my k-point mesh was sufficient.
It sounds like I should regenerate my training data in the next couple of days and retrain a few models, including isolated atoms? If that fixes my issue I'll close this. Thanks again for everyone's assistance!
-
Hi @gabor1 @bernstei, I hope you are doing well! I have just gotten some updated results. I have retrained a small model on ~1000 structures with a higher energy cutoff and a converged k-point mesh. I used a 550 eV energy cutoff and KSPACING = 0.15. Here are my logs and parity plots:
2024-08-15 02:58:12.617 INFO: Evaluating valid ...
2024-08-15 02:58:14.673 INFO:
+-------------+---------------------+------------------+-------------------+---------------------------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % | RMSE Stress (Virials) / meV / A (A^3) |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
|    train    |         7.8         |       6.0        |       4.43        |                 16.2                  |
|    valid    |         8.8         |       7.4        |       5.72        |                 14.3                  |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
2024-08-15 02:58:14.673 INFO: Saving model to checkpoints/vcrtiwzr_perf_fep_e1_f50_s10_run-2222.model
2024-08-15 02:58:15.286 INFO: Compiling model, saving metadata to vcrtiwzr_perf_fep_e1_f50_s10_compiled.model
2024-08-15 02:58:16.524 INFO: Loading checkpoint: checkpoints/vcrtiwzr_perf_fep_e1_f50_s10_run-2222_epoch-490_swa.pt
/home/myless/.mambaforge/envs/mace-11.7/lib/python3.11/site-packages/torch/jit/_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
warnings.warn("The TorchScript type system doesn't support "
2024-08-15 02:58:17.648 INFO: Loaded model from epoch 490
2024-08-15 02:58:17.649 INFO: Evaluating train ...
2024-08-15 02:58:30.078 INFO: Evaluating valid ...
2024-08-15 02:58:32.137 INFO:
+-------------+---------------------+------------------+-------------------+---------------------------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % | RMSE Stress (Virials) / meV / A (A^3) |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
|    train    |         3.8         |       5.9        |       4.35        |                  8.2                  |
|    valid    |         2.8         |       7.4        |       5.78        |                  5.3                  |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
2024-08-15 02:58:32.137 INFO: Saving model to checkpoints/vcrtiwzr_perf_fep_e1_f50_s10_run-2222_stagetwo.model
2024-08-15 02:58:32.749 INFO: Compiling model, saving metadata vcrtiwzr_perf_fep_e1_f50_s10_stagetwo_compiled.model
2024-08-15 02:58:33.954 INFO: Done
Full log here:
I haven't included isolated atoms yet. Would that be the next step?
-
Hello @bernstei @gabor1 @ilyes319, I hope you are all doing well! I have a bit of a conundrum here. I have fit MACE to the exact same data set (same train, val, and test splits) that I used to fit a CHGNet model. Here is the parity plot for the CHGNet model (using their standard settings, batch_size = 8, lr = 0.005):
The CHGNet model is somehow able to find the stress trend, whereas MACE gets very impressive energies and forces but misses the stresses by quite a large amount. Is there any idea why this is happening? I imagine this is simply user error, but even after altering the weights, rmax, precision, number of channels, etc., I am still getting similar results. Thank you again for all your help!
-
You say you have tried different weights. Have you tried increasing the stress weights, and if so by how much? Also, your stress plot shows two different clusters. Can you inspect which structures are in which cluster? Does that reveal any pattern?
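One way to look for such a pattern is to group the stress components by some per-structure label and compare, for each group, the RMSE and the slope of a through-origin fit of predicted against reference values. This is only a sketch: the record layout (`label`, `ref`, `pred`) and the function name are hypothetical, the numbers in the usage part are invented toy values rather than data from this thread, and the label could be anything stored with the frames, such as a config type or composition.

```python
import math
from collections import defaultdict


def stress_diagnostics_by_group(records):
    """Per-label RMSE and through-origin slope of predicted vs reference stress.

    records: iterable of dicts with keys "label", "ref", "pred", where "ref"
    and "pred" are equal-length sequences of stress components in the same
    units and ordering (hypothetical layout, for illustration only).
    """
    grouped = defaultdict(lambda: ([], []))
    for rec in records:
        refs, preds = grouped[rec["label"]]
        refs.extend(rec["ref"])
        preds.extend(rec["pred"])

    results = {}
    for label, (refs, preds) in grouped.items():
        n = len(refs)
        rmse = math.sqrt(sum((p - r) ** 2 for r, p in zip(refs, preds)) / n)
        denom = sum(r * r for r in refs)
        # Least-squares slope of pred = slope * ref (no intercept).
        slope = sum(r * p for r, p in zip(refs, preds)) / denom if denom > 0 else float("nan")
        results[label] = {"n_components": n, "rmse": rmse, "slope": slope}
    return results


if __name__ == "__main__":
    # Invented toy values, only to show the shape of the input and output.
    ref = [0.010, 0.020, -0.030, 0.000, 0.005, -0.005]
    toy = [
        {"label": "group_a", "ref": ref, "pred": list(ref)},
        {"label": "group_b", "ref": ref, "pred": [0.1 * r for r in ref]},
    ]
    for label, stats in stress_diagnostics_by_group(toy).items():
        print(label, stats)
```

A slope near 1 with a small RMSE is the healthy case. A slope near 0.1 or 10 in one group would point at a unit factor, and a slope near -1 at a sign convention problem.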
-
There was a discussion above about the stress conversion factor. Are you now using ASE's built-in parsing of OUTCAR and then writing to the extxyz format with ase.io.write, or are you still doing the conversion yourself (albeit with the correct magnitude factor)? If the latter, are you confident about the sign? It might be interesting to run, empirically, with a flipped stress sign, as that is a common error.
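For anyone doing the conversion by hand, here is a minimal sketch of the three things that have to be right at once: magnitude, sign, and component order. It assumes the six numbers come from the "in kB" line of a VASP OUTCAR in the order XX YY ZZ XY YZ ZX, and that the target is the ASE convention (Voigt order xx, yy, zz, yz, xz, xy, in eV/Å^3, with the opposite sign to the OUTCAR line). That is my understanding of what ASE's OUTCAR reader does, so treat it as an assumption and check a few frames against ase.io.read. The function name is hypothetical.

```python
# 1 eV/A^3 expressed in kbar, from e = 1.602176634e-19 C.
KBAR_PER_EV_A3 = 1602.176634


def vasp_kbar_to_ase_voigt(in_kb):
    """Convert an OUTCAR 'in kB' stress line to ASE-style Voigt stress.

    in_kb: six floats in VASP order (XX, YY, ZZ, XY, YZ, ZX), in kbar.
    Returns six floats in Voigt order (xx, yy, zz, yz, xz, xy), in eV/A^3,
    with the sign flipped (assumed ASE convention, see the text above).
    """
    if len(in_kb) != 6:
        raise ValueError("expected six stress components")
    xx, yy, zz, xy, yz, zx = in_kb
    reordered = (xx, yy, zz, yz, zx, xy)
    return [-value / KBAR_PER_EV_A3 for value in reordered]


if __name__ == "__main__":
    # Invented example line, not taken from any OUTCAR in this thread.
    print(vasp_kbar_to_ase_voigt([10.0, 12.0, 8.0, 0.5, -0.2, 0.1]))
```

Comparing the output of a hand-rolled converter with what ase.io.read returns for the same OUTCAR is a quick way to rule out a sign or ordering mismatch before retraining.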
-
Hi,
I am having difficulty training a MACE potential from scratch on energies, forces, and stresses with the following command:
My training/validation results are the following:
However when making a parity plot, I get the following results (very impressive for Energy and Force, but less so for Stress):
I made sure to convert my VASP stresses from kbar to eV/Ang^3 by dividing each stress tensor component by 160.21766208.
Here's an example header for my xyz frames:
I'm not entirely sure what is causing this issue, but I've seen a similar issue while using NequIP and Allegro. I checked my energy cutoff, and got the following results:
I've not had this issue with CHGNet. That was only with V-Cr-Ti, but I don't anticipate the addition of W and Zr affecting the ability to train stresses. So it's probably something I'm doing wrong in setting up these models!
Thanks again!
Myles