Having trouble reproducing Large Model Training #595
-
Hi everyone, I'm trying to understand the effect of fine-tuning on my dataset versus training from scratch. For fine-tuning I have been using `--foundation_model="large"`. I now want to retrain on my dataset from scratch with the same settings as the large foundation model, and here I'm a bit stuck.

The release page https://github.com/ACEsuit/mace-mp/releases/tag/mace_mp_0 has a training script, but it doesn't seem to correspond to the 'large' model. For example, when I run the archived script I get `INFO: Number of parameters: 894352`, but when running 'large' I get `Number of parameters: 2203928` (substantially more).

I can of course try to tune each of these settings by hand and verify against the generated architecture in the log each time, but it would be great to have some way to just launch the 'large' architecture without loading the weights, so I can see the from-scratch performance. Maybe the script used to produce the 'large' model is published somewhere I haven't been able to find; if so, that would be great to use as well.

Running mace-torch==0.3.6.

PS: I think the large model uses a 4.5 cutoff, not 6.0. It would probably be good to mention that in the publication version of the 'foundation models' paper, which currently makes it look like the cutoff is 6 and not 4.5.
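For reference, here is a minimal sketch of what I mean by reading the hyperparameters off the released checkpoint rather than guessing them. It assumes `mace_mp(model="large")` downloads the same weights that `--foundation_model="large"` uses; the printed values should still be checked against the training log:

```python
# Sketch: inspect the released "large" foundation model to recover its
# architecture settings for a from-scratch run (assumed to be the same
# checkpoint that --foundation_model="large" loads).
from mace.calculators import mace_mp

calc = mace_mp(model="large", device="cpu", default_dtype="float64")
model = calc.models[0]  # underlying torch module wrapped by the ASE calculator

n_params = sum(p.numel() for p in model.parameters())
print("Number of parameters:", n_params)      # compare against 2203928 seen above
print("r_max (cutoff):", float(model.r_max))  # settles the 4.5 vs 6.0 question
```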
-
Hey @daniel-sintef, I realized we uploaded the wrong script for the L=2 (large) model on the mace-mp repo. Here is the correct one: