[Aggregator | Executor: torch_model_adapter] #243
Comments
I tried changing the model, and I also changed the server-side learning rate from 0.05 to 0.01. As you can see, the test accuracy did not improve across 100 rounds. However, the training loss did decrease: roughly from 4 to 1 for mobilenet, and from 4.12 to 0.62 for lr=0.01.
Okay. I think we can train a model with fedavg and see whether the training loss decreases to 1. This helps us check whether the training part is correct. If the loss does decrease to 1, then we'll know something must be wrong on the testing side.
Hello, I ran into a similar problem when using Yogi in Oort with FEMNIST. Although the training loss decreases to 1, the test loss stays high at around 4, and the test accuracy does not increase and stays below 0.01. Everything looks normal without Yogi. Are there any bugs in the Yogi code? Do you have any idea how to deal with this?
@SISICHEN565
- yogi_eta: 0.01
- yogi_tau: 0.001
- yogi_beta: 0.01
- yogi_beta2: 0.99
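For reference, here is a minimal sketch of what a server-side Yogi step does with these four values. It follows the commonly described FedYogi update rule and is not taken from FedScale's code. The function name `yogi_step`, the flat-list representation, and the mapping of `yogi_eta`, `yogi_tau`, `yogi_beta`, `yogi_beta2` to `eta`, `tau`, `beta1`, `beta2` are assumptions made for illustration.

```python
import math


def yogi_step(weights, delta, m, v, eta=0.01, tau=0.001, beta1=0.01, beta2=0.99):
    """One server-side Yogi step over flat lists of floats (illustrative only).

    weights: current global weights
    delta:   aggregated client update (averaged client weights minus weights)
    m, v:    first and second moment state carried across rounds
    """
    new_w, new_m, new_v = [], [], []
    for w, d, m_i, v_i in zip(weights, delta, m, v):
        d2 = d * d
        # First moment: exponential moving average of the update.
        m_i = beta1 * m_i + (1.0 - beta1) * d
        # Second moment: Yogi's additive, sign-controlled update.
        sign = (v_i > d2) - (v_i < d2)
        v_i = v_i - (1.0 - beta2) * d2 * sign
        # Adaptive step; tau keeps the denominator away from zero.
        new_w.append(w + eta * m_i / (math.sqrt(v_i) + tau))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v


# Example: one parameter, with state initialised as m = 0 and v = tau**2.
w, m, v = yogi_step([0.0], [1.0], [0.0], [0.001 ** 2])
```

Because `m` and `v` persist between calls, the step is stateful: calling it more often than once per round changes the resulting weights.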
What happened + What you expected to happen
@fanlai0990 @AmberLJC
I suspect that on this line, where the executor receives the model update from the aggregator and is about to start testing, the Yogi optimizer is executed, which is unnecessary. I think the optimizer should only run in the aggregator at the end of each round. However, in the executor, the optimizer runs every time there is a model_train or model_test event.
FedScale/fedscale/cloud/execution/executor.py
Line 187 in e62ad70
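To illustrate the suspicion, here is a toy sketch of the difference between an executor that simply loads the weights it receives and one that passes them through a stateful optimizer again. None of this is FedScale code: `ToyServerOptimizer`, `load_weights_only`, and `load_weights_with_optimizer` are hypothetical names, and the optimizer is a simplified momentum update standing in for Yogi.

```python
class ToyServerOptimizer:
    """Hypothetical stateful server optimizer (momentum only, not real Yogi)."""

    def __init__(self, lr=0.5, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.m = 0.0  # momentum state carried across calls

    def step(self, current, target):
        # Treat (target - current) as the update and apply a damped step.
        delta = target - current
        self.m = self.beta * self.m + (1.0 - self.beta) * delta
        return current + self.lr * self.m


def load_weights_only(local_weight, received_weight):
    """Expected executor behaviour: adopt the aggregator's weights as-is."""
    return received_weight


def load_weights_with_optimizer(local_weight, received_weight, optimizer):
    """Suspected behaviour: the optimizer is applied again on the executor."""
    return optimizer.step(local_weight, received_weight)


aggregator_weight = 1.0  # already optimized by the aggregator this round
executor_weight = 0.0    # stale local copy on the executor

correct = load_weights_only(executor_weight, aggregator_weight)
suspect = load_weights_with_optimizer(
    executor_weight, aggregator_weight, ToyServerOptimizer()
)
```

In this toy setup `correct` equals the aggregator's weight, while `suspect` lands far from it, so the executor would test a different model from the one the aggregator holds. That would be consistent with training loss decreasing while test accuracy stays flat, but it is only an illustration of the hypothesis.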
Why did fed-yogi work in the previous version? I suspect that in some config files, such as
FedScale/benchmark/configs/openimage/openimage.yml
Line 54 in e62ad70
the optimizer is written as "yogi" instead of "fed-yogi". As a result, in optimizers.py, the optimizer actually used is still fed-avg, because there is no if statement for "yogi". So the fed-yogi bug was never exposed.
FedScale/fedscale/cloud/aggregation/optimizers.py
Line 82 in e62ad70
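The fall-through described here can be sketched as follows. This is a hypothetical dispatch function, not the actual contents of optimizers.py. The names `resolve_optimizer` and `resolve_optimizer_strict` and the returned labels are invented for illustration.

```python
def resolve_optimizer(name):
    """Lenient dispatch: any unrecognized name silently uses plain averaging."""
    if name == "fed-yogi":
        return "yogi update"
    # No branch for "yogi", so it falls through to the default.
    return "plain averaging"


def resolve_optimizer_strict(name):
    """Stricter variant: unknown names raise instead of falling through."""
    if name == "fed-yogi":
        return "yogi update"
    if name == "fed-avg":
        return "plain averaging"
    raise ValueError(f"unknown optimizer name: {name!r}")


lenient_result = resolve_optimizer("yogi")  # silently treated as averaging
```

With the lenient version, a config that says "yogi" runs without complaint but never exercises the Yogi path. A strict check like the second function would surface the mismatch at startup.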
In summary, I think there is still a bug in the executor's set_weight. It works for fed-avg, but not for other optimizers. Let me know what you think.
Versions / Dependencies
#242
Reproduction script
fedscale driver start benchmark/configs/femnist/conf.yml
Issue Severity
None