
Training CLI script prompts for unused options which MultiStepLR scheduler does not accept #90

Closed
raehik opened this issue Aug 25, 2023 · 6 comments · Fixed by #97

Comments


raehik commented Aug 25, 2023

The neural net's learning rate is configurable in the training step via the CLI parameter --learning_rate <string>, where <string> is a list of epoch/rate pairs in the format epoch_1/rate_1/epoch_2/rate_2/.... However, after some investigation, only the first rate appears to be used; I find it referenced only during optimizer initialization:

optimizer = optim.Adam(params, lr=learning_rates[0], weight_decay=weight_decay)
lr_scheduler = MultiStepLR(optimizer, list(learning_rates.keys())[1:], gamma=0.1)

Note that, because of how MultiStepLR is configured, all of the training-step run configurations I have will still behave as expected, since the chosen rates happen to decay by a factor of 10 at each milestone. E.g. for 0/5e-4/15/5e-5/30/5e-6, the rates after 5e-4 are ignored, but the scheduler's own decay (gamma=0.1) produces those same values anyway.
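
For illustration, a minimal standalone PyTorch sketch (using the example values above, not code from this repository) confirming that gamma=0.1 with milestones [15, 30] reproduces the later rates by itself:

import torch
from torch import optim
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in parameters; only the learning-rate schedule matters here.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = optim.Adam(params, lr=5e-4)
lr_scheduler = MultiStepLR(optimizer, milestones=[15, 30], gamma=0.1)

lrs = []
for epoch in range(31):
    lrs.append(lr_scheduler.get_last_lr()[0])
    optimizer.step()
    lr_scheduler.step()

print(lrs[0], lrs[15], lrs[30])  # roughly 5e-4, 5e-5, 5e-6 (up to float rounding)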

Is this erroneous? Should we be able to specify the learning rates exactly? Or can we simplify and pass an initial rate, a decay factor, and the epochs to decay at? (This is the pattern MultiStepLR itself uses.)


raehik commented Aug 25, 2023

ping @arthurBarthe

arthurBarthe (Collaborator) commented

Discussed yesterday, but adding notes to sum up: to reproduce the paper, we should not be using a scheduler, but instead the fixed learning rate updates. We can achieve this using MultiStepLR from PyTorch. I am happy to take care of that in a new branch off main if that sounds reasonable.


raehik commented Aug 31, 2023

to reproduce the paper, we should not be using a scheduler, but instead the fixed learning rate updates

MultiStepLR is a scheduler, we are currently using it, and I believe it was used to decay the learning rate for the paper's model. This issue is about an apparent disconnect between the CLI and how the options get used (as it turns out, some options aren't used at all). We should change the CLI options so that you don't have to provide values that never get used. (Or we could simply update the help strings as a stopgap.)

I'd be happy to review any work you do related to this; feel free to assign/ping/message me. I have some related work too, but it's larger in scope and I'm in the middle of migrating it from another PR.

arthurBarthe (Collaborator) commented

Oh, I missed that we are already using MultiStepLR! Then we only need to provide the initial learning rate, the decay factor (0.1), and the milestones, and we can indeed get rid of the unnecessary extra values. Does that make sense?
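
As a rough sketch of what that could look like (the option names here are hypothetical, not the project's actual flags), wiring three separate options straight into the existing optimizer/scheduler setup:

import argparse
import torch
from torch import optim
from torch.optim.lr_scheduler import MultiStepLR

# Hypothetical option names, for illustration only.
parser = argparse.ArgumentParser()
parser.add_argument("--initial_learning_rate", type=float, default=5e-4)
parser.add_argument("--decay_factor", type=float, default=0.1)
parser.add_argument("--decay_at_epochs", type=int, nargs="*", default=[15, 30])
args = parser.parse_args()

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for the model parameters
optimizer = optim.Adam(params, lr=args.initial_learning_rate)
lr_scheduler = MultiStepLR(optimizer, args.decay_at_epochs, gamma=args.decay_factor)

Passing --initial_learning_rate 5e-4 --decay_factor 0.1 --decay_at_epochs 15 30 would then reproduce what 0/5e-4/15/5e-5/30/5e-6 does today.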


raehik commented Aug 31, 2023

Right. The two-line snippet above shows everything relevant: the decay factor is fixed (not configurable via the CLI), while the initial learning rate and the milestones are recovered from the "learning rate" string. Those should be their own options instead.
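
For reference, recovering those two pieces from the current epoch_1/rate_1/... string looks roughly like this (a sketch of the behaviour described above, not the repository's actual parsing code):

def parse_learning_rates(spec):
    # "0/5e-4/15/5e-5/30/5e-6" -> {0: 5e-4, 15: 5e-5, 30: 5e-6}
    parts = spec.split("/")
    return {int(epoch): float(rate) for epoch, rate in zip(parts[::2], parts[1::2])}

learning_rates = parse_learning_rates("0/5e-4/15/5e-5/30/5e-6")
initial_rate = learning_rates[0]              # the only rate actually used
milestones = list(learning_rates.keys())[1:]  # [15, 30]; the later rates are discarded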

raehik mentioned this issue on Oct 12, 2023

raehik commented Oct 26, 2023

handled in #95

raehik added this to the 2023-10 wrap-up milestone on Oct 26, 2023
raehik changed the title from "Are CLI neural net learning rates properly configured?" to "Training CLI script accepts unused options which the scheduler MultiStepLR does not accept" on Dec 4, 2023
raehik changed the title from "Training CLI script accepts unused options which the scheduler MultiStepLR does not accept" to "Training CLI script prompts for unused options which MultiStepLR scheduler does not accept" on Dec 4, 2023
raehik linked a pull request on Dec 4, 2023 that will close this issue