Saving and loading the trained model after the end of a federated experiment #1139
Comments
Hi @enrico310786! Short answer:
Then save it in its native torch format (i.e.
Long answer:
Hi @kta-intel, thank you for your answer. Regarding the long answer, I will try to update my know-how to the Task Runner API or the Workflow API. Regarding the short answer, I noted the following facts:
and the ModelInterface with
I also defined an initial_model object as tensor([[-0.0021, 0.1488, -0.2283, ..., -0.0838, -0.0545, -0.2650],
and applying fl_experiment.get_last_model() to the test set of one envoy, I obtain good results (low MSE and high R^2). But I think that fl_experiment.get_last_model() returns the latest model from the final round, not the best one. Why does fl_experiment.get_best_model() give me the initial_model weights and not those of the best model? Thanks again,
Thanks for the investigation. Hmm, this might be a bug (or at the very least, insufficient checkpointing logic). My suspicion is that the Interactive API backend is using a loss criterion on the validation step to check for the best model, but since the validate function is measuring accuracy, it is marking the higher value as worse and not saving it. On the other hand, as you said, This is actually more of an educated guess based on similar issues in the past. I have to dive a bit deeper to confirm, though.
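To make the suspicion above concrete, here is a minimal sketch of how a mismatch between the metric's direction and the comparison's direction could leave the "best" model stuck at the initial weights. It is not OpenFL's checkpointing code: `BestModelTracker`, `pick_best`, the weight labels, and the accuracy values are all hypothetical, and the sketch assumes the initial model is validated once before training.

```python
from typing import List, Optional, Tuple


class BestModelTracker:
    """Hypothetical best-model tracker (illustration only, not OpenFL code)."""

    def __init__(self, higher_is_better: bool) -> None:
        self.higher_is_better = higher_is_better
        self.best_score: Optional[float] = None
        self.best_weights: Optional[str] = None

    def update(self, weights: str, score: float) -> bool:
        """Keep `weights` if `score` beats the best seen so far."""
        if self.best_score is None:
            improved = True  # first validation result is always stored
        elif self.higher_is_better:
            improved = score > self.best_score
        else:
            improved = score < self.best_score
        if improved:
            self.best_score = score
            self.best_weights = weights
        return improved


def pick_best(history: List[Tuple[str, float]], higher_is_better: bool) -> Optional[str]:
    """Replay validation results and return the weights kept as 'best'."""
    tracker = BestModelTracker(higher_is_better)
    for weights, score in history:
        tracker.update(weights, score)
    return tracker.best_weights


# Made-up validation accuracies: the initial model first, then three rounds.
history = [("initial", 0.10), ("round_1", 0.45), ("round_2", 0.70), ("round_3", 0.82)]

# Accuracy treated as a loss (lower is better): nothing ever beats the initial model.
print(pick_best(history, higher_is_better=False))
# Accuracy treated correctly (higher is better): the last round is kept.
print(pick_best(history, higher_is_better=True))
```

The same mismatch would occur in the opposite direction if the tracker assumed higher is better while the validation function reported an error such as MSE.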
Hi @enrico310786, as @kta-intel mentioned earlier, we are in the process of deprecating the Interactive API. Is the issue also reproducible with the Task Runner API or the Workflow API?
Hi @teoparvanov, I don't know. So far I have only used the Interactive API framework.
@enrico310786, here are a couple of resources to get you started with the main OpenFL APIs:
Please keep us posted on how this is going. The Slack #onboarding channel is a good place to get additional support.
Hi,
I ran some federated learning experiments using the PyTorch_TinyImageNet tutorial at this link: https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/interactive_api/PyTorch_TinyImageNet
Everything works. I have one director and two envoys. The director is on one server and the two envoys are on two different servers. During training I see the accuracy growing.
My questions are:
I noticed that in the workspace folder, once the experiment has ended, there is a file called "model_obj.pkl". I load the file
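import pickle

path_model_pkl = "model_obj.pkl"
# Deserialize the ModelInterface object saved in the workspace folder
with open(path_model_pkl, 'rb') as f:
    model_interface = pickle.load(f)
# Extract the underlying model from the interface
model = model_interface.model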
but if I apply this model to the images of the test set of one of the two envoys, I do not obtain results compatible with a trained model. So I think that this is not the best trained model. Where is the best model stored at the end of the experiment?
Thanks
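One possible reading of the behaviour described above, offered as an assumption rather than something confirmed in this thread: a pickle file is a snapshot taken at the moment `pickle.dump` is called. If "model_obj.pkl" were written before training starts, loading it afterwards would return untrained weights. The sketch below shows only this snapshot behaviour, using a hypothetical dictionary as a stand-in for a model and a made-up file name.

```python
import pickle
import tempfile
from pathlib import Path


def snapshot_then_train(path: Path):
    """Pickle a toy 'model', keep updating it in memory, then reload the file."""
    model = {"weights": [0.0, 0.0]}  # hypothetical stand-in for a model object

    # Snapshot written before any training happens.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # "Training" changes the in-memory object only; the file is not rewritten.
    model["weights"] = [w + 1.5 for w in model["weights"]]

    with open(path, "rb") as f:
        reloaded = pickle.load(f)

    return model["weights"], reloaded["weights"]


with tempfile.TemporaryDirectory() as tmp:
    trained, from_disk = snapshot_then_train(Path(tmp) / "toy_model.pkl")

print(trained)    # weights after the in-memory update
print(from_disk)  # weights as they were when the file was written
```

If this assumption holds, the trained weights would have to come from the experiment object after training rather than from the pickle file.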