Restart training from checkpoints #52

Open
mixmixmix opened this issue Jun 13, 2023 · 1 comment

@mixmixmix
Collaborator

With semi-frequent euclid restarts it would be good if we could restart training from a checkpoint. I think it used to work, and maybe I deactivated it at some point. It looks like checkpoints are saved but not used when training starts.

@ctorney, could you take a look? Branch: master.

@mixmixmix
Collaborator Author

mixmixmix commented Jun 15, 2023

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint

  • save_weights_only needs to be set to true,
  • the checkpoint path needs a different extension (.h5) so the weights are saved in HDF5 format,
  • it seems the checkpoint has to be loaded manually, since save_weights_on_restart is now deprecated (see the sketch below).
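
Something along the lines of the sketch below should cover all three points. It is only a sketch against the TF 2.x Keras API linked above, not the project's training script: the model, the dummy data, and the checkpoints/weights.h5 path are placeholders.

```python
import os
import numpy as np
import tensorflow as tf

checkpoint_path = "checkpoints/weights.h5"  # placeholder path; .h5 extension -> HDF5 weights file
os.makedirs(os.path.dirname(checkpoint_path), exist_ok=True)

# Placeholder model and data standing in for the real training setup.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
x, y = np.random.rand(64, 8), np.random.rand(64, 1)

# Save only the weights at the end of every epoch.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    save_best_only=False,
)

# The callback does not restore anything by itself, so reload the last
# saved weights manually before (re)starting training.
if os.path.exists(checkpoint_path):
    model.load_weights(checkpoint_path)

model.fit(x, y, epochs=5, callbacks=[checkpoint_cb])
```

Note this only restores the weights; if we also want a restart to resume from the right epoch, the epoch counter (and optimizer state) would need to be saved separately.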

mixmixmix assigned mixmixmix and unassigned ctorney Jun 15, 2023