Error while resuming training from saved checkpoint #83

Open
DeepakLabh opened this issue Sep 22, 2023 · 0 comments

Passing ckpt_path to Lightning's .fit() method raises the error below. The call is trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt'), and the intent is to resume training from a saved checkpoint.

Restoring states from the checkpoint path at best.ckpt.ckpt

==================================================================
  | Name            | Type            | Params
0 | spacetimeformer | Spacetimeformer | 4.5 M
4.5 M Trainable params
0 Non-trainable params
4.5 M Total params
18.191 Total estimated model params size (MB)
Restored all states from the checkpoint file at best.ckpt.ckpt
Epoch 0: 75%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 105/140 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_vol.py", line 457, in <module>
    trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt')
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 205, in run
    self.on_advance_end()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 297, in on_advance_end
    self.trainer._call_callback_hooks("on_train_epoch_end")
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1637, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 179, in on_train_epoch_end
    self._run_early_stopping_check(trainer)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 190, in _run_early_stopping_check
    if trainer.fast_dev_run or not self._validate_condition_metric( # disable early_stopping with fast_dev_run
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 145, in _validate_condition_metric
    raise RuntimeError(error_msg)
RuntimeError: Early stopping conditioned on metric `val/loss` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: ``
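
For readers trying to interpret the message: the empty list at the end suggests that no metrics at all had been logged when the early-stopping check ran at the end of the resumed epoch. The sketch below is a simplified, plain-Python stand-in for that kind of check, not Lightning's actual implementation. The function name, the `logs` dictionaries, and the value 0.42 are hypothetical and only illustrate how an empty set of logged metrics produces a message of this shape.

```python
from typing import Dict


def validate_condition_metric(monitor: str, logs: Dict[str, float], strict: bool = True) -> bool:
    """Return True if `monitor` was logged; otherwise raise (strict) or return False.

    Simplified illustration of a monitored-metric check; not Lightning's code.
    """
    if monitor in logs:
        return True
    available = "`, `".join(logs.keys())
    error_msg = (
        f"Early stopping conditioned on metric `{monitor}` which is not available."
        " Pass in or modify your `EarlyStopping` callback to use any of the following:"
        f" `{available}`"
    )
    if strict:
        raise RuntimeError(error_msg)
    return False


# Hypothetical state at the end of the resumed epoch: nothing has been logged.
logs_after_resume: Dict[str, float] = {}
try:
    validate_condition_metric("val/loss", logs_after_resume)
except RuntimeError as err:
    message = str(err)

# Hypothetical state once a validation pass has logged the monitored metric.
ok = validate_condition_metric("val/loss", {"val/loss": 0.42})
```

If this reading is right, the monitored key val/loss is simply absent at the moment the callback looks for it, for example because the resumed epoch ended before a validation pass logged anything. That is an assumption drawn from the log above, not something confirmed here.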
