Error in resuming training from where it stopped. #74

Open
arpitakabra opened this issue May 21, 2020 · 1 comment

Comments

@arpitakabra

I tried to resume training from where it stopped by changing the restore_path variable in config.py to ./checkpoint/
But it failed with the following error:
Traceback (most recent call last):
  File "train.py", line 91, in <module>
    train()
  File "train.py", line 79, in train
    launch_train_with_config(traincfg, trainer)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
    extra_callbacks=config.extra_callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 342, in train_with_defaults
    steps_per_epoch, starting_epoch, max_epoch)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 313, in train
    self.initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/tower.py", line 147, in initialize
    super(TowerTrainer, self).initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 226, in initialize
    session_init._setup_graph()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 110, in _setup_graph
    dic = self._get_restore_dict()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 159, in _get_restore_dict
    self._match_vars(f)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 126, in _match_vars
    reader, chkpt_vars = SaverRestore._read_checkpoint_vars(self.path)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sessinit.py", line 120, in _read_checkpoint_vars
    reader = tf.train.NewCheckpointReader(model_path)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 873, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 885, in __init__
    this = _pywrap_tensorflow_internal.new_CheckpointReader(filename)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./checkpoint/

I also tried restore_path = ./checkpoint/model-500.ckpt, but then I got an error saying that the given checkpoint file doesn't exist.

It would be great if you could help with this.

Thanks!

@etatbak

etatbak commented May 25, 2020

I am also getting an error when I try to continue training:

config file:
restore_path = './checkpoint'
checkpoint_path = '/model-500000'

I am getting this error:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file .\checkpoint: Unknown: NewRandomAccessFile failed to Create/Open: .\checkpoint2 : Access is denied.
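
Both errors above look consistent with restore_path pointing at the checkpoint directory rather than at a checkpoint prefix. The traceback shows the path being handed to tf.train.NewCheckpointReader, which expects a prefix such as ./checkpoint/model-500 (the name shared by the model-500.index and model-500.data-* files), not a directory and not a name with an extra .ckpt suffix. The sketch below is only an illustration of how such a prefix could be located. It assumes the files in the directory follow the usual `<name>-<step>.index` naming, and `find_latest_checkpoint_prefix` is a hypothetical helper, not part of this project or of tensorpack. TensorFlow's own `tf.train.latest_checkpoint(directory)` does a similar job by reading the `checkpoint` state file in that directory.

```python
import os
import re
from typing import Optional

# Hypothetical helper (not part of this repo or tensorpack).
# Assumption: checkpoints were saved as "<name>-<step>.index" plus matching
# "<name>-<step>.data-*" files, which is the usual tf.train.Saver layout.
_INDEX_RE = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def find_latest_checkpoint_prefix(ckpt_dir: str) -> Optional[str]:
    """Return the prefix of the highest-step checkpoint in ckpt_dir.

    The returned value has no ".index" or ".data-*" suffix, which is the
    form a checkpoint reader expects. Returns None if the directory does
    not exist or contains no "*.index" files.
    """
    if not os.path.isdir(ckpt_dir):
        return None

    best_step = -1
    best_prefix = None
    for name in os.listdir(ckpt_dir):
        match = _INDEX_RE.match(name)
        if match is None:
            continue  # skips the "checkpoint" state file, ".data-*" shards, etc.
        step = int(match.group("step"))
        if step > best_step:
            best_step = step
            best_prefix = os.path.join(ckpt_dir, match.group("prefix"))
    return best_prefix


if __name__ == "__main__":
    # Example: resolve the prefix once, then use it as restore_path in config.py.
    print(find_latest_checkpoint_prefix("./checkpoint"))
```

If this prints None, the directory holds no .index files, so there is nothing to restore from that location. Otherwise the printed value is a candidate for restore_path.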
