
fitting convergence slow #45

Open
xyttyxy opened this issue Jul 7, 2023 · 7 comments


@xyttyxy

xyttyxy commented Jul 7, 2023

Hello,

I am trying to fit an ACE model to an amorphous covalent material. I am using BFGS as the optimizer; it works, but convergence is very slow. Is it possible to instruct BFGS / L-BFGS to take larger step sizes, or to switch to another optimizer?

On a single GPU (Tesla V100), 100 steps of BFGS take around 1 hour. The potential has ~3000 basis functions (2 elements), and the dataset has ~1200 structures with ~140,000 atoms.

Thank you for your work.

@yury-lysogorskiy
Member

Hi!

BFGS and L-BFGS-B, as (quasi-)second-order optimization methods, have no step-size parameter; the step is determined from the local curvature of the loss function. You should check the following points:

  • What is the average evaluation time? It is printed every step as "Time/eval: X mcs/at".
  • BFGS/L-BFGS-B runs entirely on the CPU. How many CPU cores do you have?
  • Check your data. An amorphous covalent material is a complex problem in general; check the train_ef_distribution.png file, which visualizes the energy and force distributions of your training data.
  • How large is your cutoff? Too big a cutoff means too many neighbours to consider (see the sketch below).
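
For reference, the cutoff lives in pacemaker's input.yaml; a minimal sketch, assuming the usual input layout (the element names and values here are illustrative, not recommendations):

```yaml
# input.yaml (excerpt) -- illustrative values only
cutoff: 6.5          # global cutoff radius in Angstrom; a larger cutoff means more neighbours per atom
potential:
  elements: [C, Si]  # hypothetical two-element system
  bonds:
    ALL:
      rcut: 6.5      # per-bond cutoff, typically matching the global value
```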

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

Thanks for the quick reply.

  1. Time/eval is around 100 mcs/at.
  2. While BFGS itself is CPU-only, the evaluation and auto-differentiation run on the GPU; is this correct? From the documentation it is not clear how to use more than one CPU, unless scipy does this automatically.
  3. I'm not trying to produce a usable model; I'm trying to overfit the model to see how flexible it can be.
  4. The cutoff is 6.5 Å.

@yury-lysogorskiy
Member

yury-lysogorskiy commented Jul 11, 2023

  • 100 mcs/at is a normal timing for fitting.
  • Yes, the GPU is used for auto-differentiation: it computes the gradients and passes them to BFGS on the CPU.
  • By default, scipy's BFGS uses all available CPU cores; you can check CPU usage with top or htop.
  • If you want to squeeze the maximum from the model, go for more B-basis functions, i.e. 1500-2000 functions per element or even more (depending on dataset size, available memory, etc.), and also put almost all of the weight on forces (kappa=0.99); see the sketch after this list.
  • The cutoff is OK.
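
A minimal sketch of how these knobs might look in input.yaml, assuming pacemaker's usual layout (the basis-size values below are illustrative assumptions, not a recipe):

```yaml
# input.yaml (excerpt) -- illustrative values only
potential:
  functions:
    ALL:
      nradmax_by_orders: [15, 6, 4, 3, 2]  # more radial functions per body order => more B-basis functions
      lmax_by_orders: [0, 4, 3, 2, 1]      # angular resolution per body order
fit:
  loss:
    kappa: 0.99   # put almost all weight on forces
```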

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

  1. The B-basis set already has ~1500+ functions. I will try to go to 2000+, but I doubt I can go much further, since I also need fast prediction. Is there a way to tell which basis functions are more relevant, other than manually excluding some lm's and retraining?
  2. I suspect the non-linear part (Finnis-Sinclair-type embedding) is limiting the expressivity of ACE, based on comparisons with other models. I understand that ACE eventually becomes complete with very large basis sets, but practically this may not be feasible? Have you seen other cases of this?

@yury-lysogorskiy
Member

  1. You can try L1 regularization with a large pre-factor (check the fit::loss:L1_coeff option, sketched below), manually drop those functions whose coefficients have small values, then retrain the potential starting from that.
  2. No, the other way around: a linear-only model is more limited than a non-linear (FS-type) one, given the same number of basis functions.
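
In input.yaml, the L1 option mentioned above would sit roughly like this (a sketch; the coefficient value is an illustrative assumption that needs tuning for your data):

```yaml
# input.yaml (excerpt) -- L1 regularization on the basis coefficients
fit:
  loss:
    kappa: 0.3       # energy/force weighting as before
    L1_coeff: 1e-5   # large pre-factor pushes irrelevant coefficients towards zero
```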

Some tips that you could try:

  • Go to more non-linear terms (quadratic, 1/4, etc.), i.e. {ndensity: 4, fs_parameters: [1, 1, 1, 0.5, 1, 2, 1, 0.25]}. Here the numbers go in pairs: [prefactor_1, exponent_1, prefactor_2, exponent_2, ...]; the expanded embedding is written out after this list.
  • Try to fit forces first (i.e. fit::loss:kappa=0.99) and then upfit the model with kappa=0.1.
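
Making the pairing explicit: with ndensity: 4 and fs_parameters: [1, 1, 1, 0.5, 1, 2, 1, 0.25], the FS-type embedding of the atomic densities $\rho^{(p)}$ expands to (a sketch following the prefactor/exponent pairing described above)

$$E_i = F\big(\rho^{(1)}, \ldots, \rho^{(4)}\big) = 1\cdot\rho^{(1)} + 1\cdot\big(\rho^{(2)}\big)^{0.5} + 1\cdot\big(\rho^{(3)}\big)^{2} + 1\cdot\big(\rho^{(4)}\big)^{0.25},$$

so increasing ndensity adds more (prefactor, exponent) pairs, i.e. more non-linear terms.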

@xyttyxy
Author

xyttyxy commented Jul 11, 2023

I meant that the non-linear part is not non-linear enough (and maybe adding an MLP after the B-basis would help, as suggested in your PRM fitting paper). I did not know I could include more non-linear terms. This is wonderful news, and I will try it.

Thanks for the discussion.

@jhung12

jhung12 commented Aug 1, 2023

Is there a paper explaining the physics behind this [1, 1, 1, 0.5, 1, 2, 1, 0.25] choice? I have a hard time finding it.
