Bug in catalyst/callbacks/backward.py if the grad_clip_fn value is set. #1446

Closed
9 of 10 tasks
AleksandrMinin opened this issue Apr 5, 2023 · 1 comment
Labels
bug (Something isn't working) · duplicate (This issue or pull request already exists) · help wanted (Extra attention is needed)

Comments


AleksandrMinin commented Apr 5, 2023

🐛 Bug Report

BackwardCallback in catalyst/callbacks/backward.py raises an AttributeError when grad_clip_fn is set.

How To Reproduce

Steps to reproduce the behavior:

  1. Create a callbacks list containing a BackwardCallback whose grad_clip_fn is set (not None).
  2. Launch runner.train with these callbacks.
  3. Training fails with the following error:
/python_envs/kaggle-env/lib/python3.8/site-packages/catalyst/callbacks/backward.py:55

   52
   53             if self.grad_clip_fn is not None:
   54                 runner.engine.unscale_gradients()
-->55                 norm = self.grad_clip_fn(self.model.parameters())
   56                 if self._log_gradient:
   57                     runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm
   58

AttributeError: 'BackwardCallback' object has no attribute 'model'
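The failure pattern can be reproduced without Catalyst or PyTorch. The sketch below uses hypothetical stand-in classes (`StubModel`, `StubEngine`, `StubRunner`, `BuggyBackwardCallback`), not Catalyst's real API; it only mirrors the structure of lines 53-57 in the traceback. The callback stores `grad_clip_fn` but never assigns `self.model`, so the attribute lookup fails as soon as clipping is enabled, while the unclipped path runs fine.

```python
class StubModel:
    """Hypothetical stand-in for a torch.nn.Module."""

    def parameters(self):
        return [1.0, 2.0]


class StubEngine:
    """Hypothetical stand-in for a Catalyst engine."""

    def unscale_gradients(self):
        pass


class StubRunner:
    """Hypothetical stand-in for a runner: the runner owns the model, not the callback."""

    def __init__(self, model):
        self.model = model
        self.engine = StubEngine()
        self.batch_metrics = {}


class BuggyBackwardCallback:
    """Mirrors the failing code path only; this is not Catalyst's class."""

    def __init__(self, grad_clip_fn=None):
        self.grad_clip_fn = grad_clip_fn
        # Note: self.model is never assigned anywhere in this class.

    def on_batch_end(self, runner):
        if self.grad_clip_fn is not None:
            runner.engine.unscale_gradients()
            # Raises AttributeError: there is no self.model on the callback.
            return self.grad_clip_fn(self.model.parameters())
        return None


runner = StubRunner(StubModel())
callback = BuggyBackwardCallback(grad_clip_fn=lambda params: sum(params))

error_message = None
try:
    callback.on_batch_end(runner)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

Without a clipping function, `BuggyBackwardCallback().on_batch_end(runner)` returns `None` and never touches `self.model`, which matches the report that the error only appears when `grad_clip_fn` is set.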

Code sample

import torch
from torch.nn.utils import clip_grad_norm_
from catalyst import dl
from catalyst.core.callback import Callback
from catalyst.engines.torch import CPUEngine, GPUEngine

from src.config import config
from src.base_config import Config
from src.tools import set_global_seed, get_code
from src.dataset import get_loaders
from src.crnn import CRNN
from src.runners import SupervisedOCRRunner

callbacks = [
    dl.CriterionCallback(
        input_key=dict(output="log_probs", output_size="input_lengths"),
        target_key=dict(target="targets", target_len="target_lengths"),
        metric_key="loss",
        criterion_key="ctc_loss_fn",
    ),
    dl.BackwardCallback(
        metric_key="loss",
        # Setting grad_clip_fn enables the code path that fails.
        grad_clip_fn=clip_grad_norm_,
        grad_clip_params={"max_norm": 0.5, "norm_type": 2},
    ),
]


loaders, infer_loader = get_loaders(config)  
model = CRNN(**config.model_kwargs)

optimizer = config.optimizer(params=model.parameters(), **config.optimizer_kwargs)
scheduler = config.scheduler(optimizer=optimizer, **config.scheduler_kwargs)


if torch.cuda.is_available():
    engine = GPUEngine()
else:
    engine = CPUEngine()

runner = SupervisedOCRRunner(
    input_key="image", 
    target_key="target", 
    output_key="output",
)

criterion = {"ctc_loss_fn": config.ctc_loss}

runner.train(
    model=model,
    engine=engine,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=callbacks,
    num_epochs=config.n_epochs,
    valid_loader="valid",
    valid_metric=config.valid_metric,
    minimize_valid_metric=config.minimize_metric,
    seed=config.seed,
    verbose=True,
    load_best_on_end=True,
)
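With the configuration in the code sample, the clipping step is meant to behave like `clip_grad_norm_(parameters, max_norm=0.5, norm_type=2)`: compute one total norm over all gradients, scale every gradient by `max_norm / (total_norm + 1e-6)` when that factor is below 1, and return the total norm measured before clipping. The pure-Python sketch below (`clip_grad_norm_sketch` is a hypothetical name, operating on plain lists of floats instead of tensors) illustrates those semantics; it is not PyTorch's implementation.

```python
import math
from typing import List


def clip_grad_norm_sketch(grads: List[List[float]], max_norm: float, norm_type: float = 2.0) -> float:
    """Clip flat gradient vectors in place, mimicking torch.nn.utils.clip_grad_norm_
    for a finite norm_type.

    Returns the total norm computed before clipping, i.e. the value a callback
    would log as the gradient norm.
    """
    # Total norm over all parameters, as if their gradients were concatenated.
    total_norm = sum(abs(g) ** norm_type for grad in grads for g in grad) ** (1.0 / norm_type)
    # Small epsilon avoids division by zero, as in PyTorch.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for grad in grads:
            for i, g in enumerate(grad):
                grad[i] = g * clip_coef
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # total 2-norm is 5.0
norm = clip_grad_norm_sketch(grads, max_norm=0.5, norm_type=2)
clipped_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
print(norm, clipped_norm)  # 5.0 before clipping, just under 0.5 after
```

Gradients whose total norm is already below `max_norm` are left untouched, so the function only ever shrinks gradients.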

Expected behavior

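Training should run without errors when grad_clip_fn is set. BackwardCallback has no model attribute; the model is available on the runner instead. Replacing

norm = self.grad_clip_fn(self.model.parameters())

with

norm = self.grad_clip_fn(runner.model.parameters())

at line 55 of catalyst/callbacks/backward.py fixes the error, and training then completes successfully.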
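The proposed one-line change can be checked with similar stand-ins. In the sketch below, `StubModel`, `StubEngine`, `StubRunner`, `FixedBackwardCallback` and the `"gradient/loss"` prefix are hypothetical and only mirror the structure shown in the traceback; the key line reads the parameters from `runner.model`, as the fix suggests. The toy clipping function `l2_norm` just reports a norm so the logged value is easy to check.

```python
class StubModel:
    """Hypothetical stand-in for a torch.nn.Module."""

    def parameters(self):
        return [3.0, 4.0]


class StubEngine:
    """Hypothetical stand-in for a Catalyst engine; records the unscale call."""

    def __init__(self):
        self.unscaled = False

    def unscale_gradients(self):
        self.unscaled = True


class StubRunner:
    """Hypothetical stand-in for a runner that owns the model and batch metrics."""

    def __init__(self, model):
        self.model = model
        self.engine = StubEngine()
        self.batch_metrics = {}


class FixedBackwardCallback:
    """Mirrors the patched code path only; this is not Catalyst's class."""

    def __init__(self, grad_clip_fn=None, log_gradient=False, prefix_gradient="gradient/loss"):
        self.grad_clip_fn = grad_clip_fn
        self._log_gradient = log_gradient
        self._prefix_gradient = prefix_gradient  # hypothetical metric prefix

    def on_batch_end(self, runner):
        if self.grad_clip_fn is not None:
            runner.engine.unscale_gradients()
            # The fix: take the parameters from the runner's model.
            norm = self.grad_clip_fn(runner.model.parameters())
            if self._log_gradient:
                runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm


def l2_norm(params):
    # Toy "clipping" function: only reports the L2 norm of the values.
    return sum(p * p for p in params) ** 0.5


runner = StubRunner(StubModel())
callback = FixedBackwardCallback(grad_clip_fn=l2_norm, log_gradient=True)
callback.on_batch_end(runner)
print(runner.batch_metrics)  # {'gradient/loss/norm': 5.0}
```

Reading the model from the runner also keeps the callback stateless with respect to the model, so the same callback instance works with whatever model is passed to runner.train.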

Environment

Catalyst version: 22.04
PyTorch version: 1.13.0+cu117
Is debug build: No
CUDA used to build PyTorch: 11.7
TensorFlow version: N/A
TensorBoard version: 2.9.1

OS: Ubuntu 20.04.3 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.10.3

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.82.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] catalyst==22.4
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.23.5
[pip3] pytorch-ignite==0.4.11
[pip3] segmentation-models-pytorch==0.3.2
[pip3] tensorboard==2.9.1
[pip3] tensorboard-data-server==0.6.1
[pip3] tensorboard-plugin-wit==1.8.1
[pip3] tensorboardX==2.5.1
[pip3] torch==1.13.0
[pip3] torchvision==0.14.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39h6c91a56_3  
[conda] numpy-base                1.21.5           py39ha15fc14_3  
[conda] numpydoc                  1.4.0            py39h06a4308_0

Checklist

  • bug description
  • steps to reproduce
  • expected behavior
  • environment
  • code sample / screenshots


AleksandrMinin added the bug and help wanted labels Apr 5, 2023
bagxi added the duplicate label May 25, 2023
bagxi (Member) commented May 25, 2023

Duplicate of #1444

bagxi marked this as a duplicate of #1444 May 25, 2023
bagxi closed this as completed May 25, 2023