
How to get the summary if the model output consists of int and str? #280

Open
simon5u opened this issue Oct 17, 2023 · 1 comment

simon5u commented Oct 17, 2023

Describe the bug
torchinfo.py", line 448, in traverse_input_data
result = aggregate(
TypeError: unsupported operand type(s) for +: 'int' and 'str'

It seems that torchinfo.py cannot aggregate model outputs of mixed types, such as int and str. The traceback points at this branch of traverse_input_data:

    elif isinstance(data, Iterable) and not isinstance(data, str):
        aggregate = aggregate_fn(data)
        result = aggregate(
            [traverse_input_data(d, action_fn, aggregate_fn) for d in data]
        )
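A minimal sketch of why this branch can raise the error, assuming the aggregate function is Python's built-in `sum` and that string leaves are returned unchanged. The `traverse` and `try_traverse` helpers and the use of `bytes` as a tensor-like leaf are hypothetical stand-ins for illustration, not torchinfo's actual code:

```python
from collections.abc import Iterable, Mapping


def traverse(data, action_fn, aggregate_fn):
    """Simplified stand-in for the traversal quoted above (not torchinfo's code)."""
    if isinstance(data, bytes):
        # Hypothetical "tensor-like" leaf: apply the action (here, a size).
        return action_fn(data)
    if isinstance(data, Mapping):
        return aggregate_fn(
            [traverse(v, action_fn, aggregate_fn) for v in data.values()]
        )
    if isinstance(data, Iterable) and not isinstance(data, str):
        return aggregate_fn([traverse(d, action_fn, aggregate_fn) for d in data])
    # Any other leaf, including str, passes through unchanged.
    return data


def try_traverse(data):
    """Run the traversal with len/sum and report a TypeError as text."""
    try:
        return traverse(data, len, sum)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Same shape as the input in this report: a tensor-like value plus a list of str.
batch = [{"image": b"\x00" * 16, "text_input": ["a caption"]}]
print(try_traverse(batch))
```

In this sketch, `sum` starts from the integer 0 and then tries to add the string leaf, which produces the same message as in the traceback. Inputs that contain only tensor-like leaves aggregate without error.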

To Reproduce
Steps to reproduce the behavior:

  1. Install the lavis model from https://github.com/salesforce/LAVIS
    salesforce-lavis 1.0.0
    transformers 4.25.0
  2. Run the following code to get the summary:
import torch
from PIL import Image

# load sample image
raw_image = Image.open("docs/_static/merlion.png").convert("RGB")

from lavis.models import load_model_and_preprocess
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# loads BLIP caption base model, with finetuned checkpoints on MSCOCO captioning dataset.
# this also loads the associated image processors
model, vis_processors, _ = load_model_and_preprocess(name="blip_caption", model_type="base_coco", is_eval=True, device=device)

# preprocess the image
# vis_processors stores image transforms for "train" and "eval" (validation / testing / inference)
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# generate caption
output = model.generate({"image": image})
# ['a large fountain spewing water into the air']

from torchinfo import summary
text_input = ["a large statue of a person spraying water from a fountain"]
samples = {"image": image, "text_input": text_input}
summary(model, input_data=[samples])

Expected behavior
torchinfo should produce the model summary.

@JDRanpariya

I'm having the same issue. Is there any update on this?
