
fitting convergence slow #45

Open
xyttyxy opened this issue Jul 7, 2023 · 7 comments


@xyttyxy

xyttyxy commented Jul 7, 2023

Hello,

I am trying to fit an ACE model to an amorphous covalent material. I am using BFGS as the optimizer; it works, but convergence is very slow. Is it possible to instruct BFGS / L-BFGS to take larger step sizes, or to switch to another optimizer?

On a single GPU (Tesla V100), 100 steps of BFGS take around 1 hour. The potential has ~3000 basis functions (2 elements), and the dataset has ~1200 structures with ~140,000 atoms.

Thank you for your work.

@yury-lysogorskiy
Member

Hi!

BFGS and L-BFGS-B, as (quasi-)second-order optimization methods, have no step-size parameter; the step is determined from the local curvature of the loss function. You should check the following points:

  • What is the average evaluation time? It is printed every step as "Time/eval: X mcs/at".
  • BFGS/L-BFGS-B runs entirely on the CPU. How many CPU cores do you have?
  • Check your data. An amorphous covalent material is a complex problem in general; check the train_ef_distribution.png file, which visualizes the energy and force distributions of your training data.
  • How large is your cutoff? Too big a cutoff means too many neighbours to consider (see the sketch below).
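
For reference, the cutoff lives in pacemaker's input.yaml; a minimal sketch, assuming the usual input layout (the element names and values here are illustrative, not recommendations):

```yaml
# input.yaml (excerpt) -- illustrative values only
cutoff: 6.5          # global cutoff radius in Angstrom; a larger cutoff means more neighbours per atom
potential:
  elements: [C, Si]  # hypothetical two-element system
  bonds:
    ALL:
      rcut: 6.5      # per-bond cutoff, typically matching the global value
```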

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

Thanks for the quick reply.

  1. Time/eval is around 100 mcs/at.
  2. While BFGS itself is CPU-only, the evaluation and auto-differentiation run on the GPU; is this correct? From the documentation it is not clear how to use more than one CPU, unless scipy does this automatically.
  3. I'm not trying to produce a usable model; I'm trying to overfit the model to see how flexible it can be.
  4. The cutoff is 6.5 Å.

@yury-lysogorskiy
Member

yury-lysogorskiy commented Jul 11, 2023

  • 100 mcs/at is a normal timing for fitting.
  • Yes, the GPU is used for auto-differentiation: it computes the gradients and passes them to BFGS on the CPU.
  • By default, scipy's BFGS uses all available CPU cores; you can check CPU usage with top or htop.
  • If you want to squeeze the maximum from the model, go for more B-basis functions, i.e. 1500-2000 functions per element or even more (depending on dataset size, available memory, etc.), and also put almost all of the weight on forces (kappa=0.99); see the sketch after this list.
  • The cutoff is OK.
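
A minimal sketch of how these knobs might look in input.yaml, assuming pacemaker's usual layout (the basis-size values below are illustrative assumptions, not a recipe):

```yaml
# input.yaml (excerpt) -- illustrative values only
potential:
  functions:
    ALL:
      nradmax_by_orders: [15, 6, 4, 3, 2]  # more radial functions per body order => more B-basis functions
      lmax_by_orders: [0, 4, 3, 2, 1]      # angular resolution per body order
fit:
  loss:
    kappa: 0.99   # put almost all weight on forces
```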

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

  1. The B-basis set already has ~1500+ functions. I will try to go to 2000+, but I doubt I can go much further, since I also need fast prediction. Is there a way to tell which basis functions are more relevant, other than manually excluding some lm's and retraining?
  2. I suspect the non-linear part (Finnis-Sinclair-type embedding) is limiting the expressivity of ACE, based on comparisons with other models. I understand that ACE eventually becomes complete with very large basis sets, but practically this may not be feasible? Have you seen other cases of this?

@yury-lysogorskiy
Member

  1. You can try L1 regularization with a large pre-factor (check the fit::loss:L1_coeff option, sketched below), manually drop those functions whose coefficients have small values, then retrain the potential starting from that.
  2. No, the other way around: a linear-only model is more limited than a non-linear (FS-type) one, given the same number of basis functions.
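
In input.yaml, the L1 option mentioned above would sit roughly like this (a sketch; the coefficient value is an illustrative assumption that needs tuning for your data):

```yaml
# input.yaml (excerpt) -- L1 regularization on the basis coefficients
fit:
  loss:
    kappa: 0.3       # energy/force weighting as before
    L1_coeff: 1e-5   # large pre-factor pushes irrelevant coefficients towards zero
```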

Some tips that you could try:

  • Go to more non-linear terms (quadratic, 1/4, etc.), i.e. {ndensity: 4, fs_parameters: [1, 1, 1, 0.5, 1, 2, 1, 0.25]}. Here the numbers go in pairs: [prefactor_1, exponent_1, prefactor_2, exponent_2, ...]; the expanded embedding is written out after this list.
  • Try to fit forces first (i.e. fit::loss:kappa=0.99) and then upfit the model with kappa=0.1.
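
Making the pairing explicit: with ndensity: 4 and fs_parameters: [1, 1, 1, 0.5, 1, 2, 1, 0.25], the FS-type embedding of the atomic densities $\rho^{(p)}$ expands to (a sketch following the prefactor/exponent pairing described above)

$$E_i = F\big(\rho^{(1)}, \ldots, \rho^{(4)}\big) = 1\cdot\rho^{(1)} + 1\cdot\big(\rho^{(2)}\big)^{0.5} + 1\cdot\big(\rho^{(3)}\big)^{2} + 1\cdot\big(\rho^{(4)}\big)^{0.25},$$

so increasing ndensity adds more (prefactor, exponent) pairs, i.e. more non-linear terms.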

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

I meant that the non-linear part is not non-linear enough (and maybe adding an MLP after the B-basis would help, as suggested in your PRM fitting paper). I did not know I could include more non-linear terms. This is wonderful news, and I will try it.

Thanks for the discussion.

@jhung12

jhung12 commented Aug 1, 2023

Is there a paper explaining the physics behind this [1, 1, 1, 0.5, 1, 2, 1, 0.25] choice? I have a hard time finding it.
