
decouple the lr scheduler and optimizer? #36

Open
hiyyg opened this issue Nov 1, 2021 · 5 comments

@hiyyg

hiyyg commented Nov 1, 2021

Hi @lessw2020, thanks for the very nice work!
I noticed that in Ranger21 the optimizer is tightly coupled with the lr scheduler. Could you guide me on how to decouple them?

@neuronflow

I would like to second this. A split into a Ranger optimizer and a Ranger scheduler would be really cool.

@lessw2020
Owner

Hi @hiyyg and @neuronflow,
Right now you can turn off the built-in lr scheduling by turning off both warmup and warmdown:
use_warmup=False, warmdown_active=False
That should simply pass the input lr through without touching it.
Is that what you mean by decouple? Or do you mean having the scheduler separately programmable (e.g., cosine decay vs. the linear decay we use, etc.)?
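
A minimal sketch of that pass-through setup, pairing Ranger21 with a stock PyTorch scheduler. Only use_warmup and warmdown_active are quoted from the comment above; the import path and the remaining constructor arguments are assumptions, so check ranger21.py for the exact signature:

```python
import torch
from ranger21 import Ranger21  # import path assumed; check the repo layout

model = torch.nn.Linear(16, 2)

# Disable the built-in warmup/warmdown so Ranger21 leaves the incoming lr alone.
# use_warmup / warmdown_active are taken from the comment above; the other
# constructor arguments (and their names) are assumptions.
optimizer = Ranger21(
    model.parameters(),
    lr=1e-3,
    num_epochs=10,
    num_batches_per_epoch=100,
    use_warmup=False,
    warmdown_active=False,
)

# With the internal schedule disabled, an external scheduler can own the lr.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
```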

@neuronflow

neuronflow commented Nov 2, 2021

> Or do you mean having the scheduler separately programmable (e.g., cosine decay vs. the linear decay we use, etc.)?

This is what I initially had in mind. Maybe, just maybe, the Ranger optimizer should go hand in hand with a Ranger scheduler, following the standard PyTorch conventions?
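
For reference, the standard PyTorch convention being suggested looks roughly like this, illustrated with stock torch.optim classes; a hypothetical "Ranger scheduler" would simply take the scheduler's place:

```python
import torch

model = torch.nn.Linear(16, 2)
data = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(4)]
loss_fn = torch.nn.CrossEntropyLoss()

# Optimizer and scheduler are constructed and stepped independently.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

for epoch in range(6):
    for x, y in data:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()   # the optimizer only updates the weights
    scheduler.step()       # the scheduler only adjusts the lr, once per epoch here
```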

@felipemello1

Hi @lessw2020, it seems that in the current implementation there is no way to train different parameters with different learning rates. Did I get that right?

If this were available, I would love to use it. Two use cases are the following:

  1. Fine-tuning a network where layers closer to the head have a higher lr;
  2. My case: I train a graph neural network, and I need the embeddings to have 100x the learning rate of the rest of the model, but with the current script I can't use the standard PyTorch way of doing it:
```python
model_params = [p for name, p in self.model.named_parameters() if not name.startswith('emb.')]
emb_params = [p for name, p in self.model.named_parameters() if name.startswith('emb.')]
optimizer_model = madgrad_wd(
    [{'params': emb_params, 'lr': self.model_config['emb_max_lr']},
     {'params': model_params, 'lr': self.model_config['model_max_lr']}],
    weight_decay=self.model_config['wd'],
)
```
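
For context, the snippet above works with stock optimizers because torch.optim.Optimizer turns each dict into a parameter group and step() reads the lr per group. A toy sketch of that convention (generic, not Ranger21's actual code):

```python
import torch

class ToySGD(torch.optim.Optimizer):
    """Toy illustration of the per-parameter-group convention (not Ranger21's code)."""

    def __init__(self, params, lr=1e-3):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self):
        # Each dict passed at construction becomes one group carrying its own 'lr',
        # so emb_params and model_params above would each get their own rate.
        for group in self.param_groups:
            lr = group["lr"]
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-lr)
```

An optimizer written this way accepts the [{'params': ..., 'lr': ...}, ...] construction out of the box.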

@lessw2020
Owner

Hi @fmellomascarenhas, @neuronflow and @hiyyg - I fully agree with all the points above (decoupled scheduler and parameter groups).
This split between scheduler and optimizer will happen for Ranger22 (the 2022 edition lol).
I should have more info and updates shortly, as we just agreed last night to go ahead with the Ranger22 version.
