
Investigating overestimation of total parameter counts for multiple models #322

Open
6DammK9 opened this issue Sep 25, 2024 · 1 comment



6DammK9 commented Sep 25, 2024

Describe the bug
Total params of the model may be overestimated by up to 2.37x for multiple models, while other models remain accurate.

I wonder if there is something common across these models that yields some double counting and hence the overestimated result.
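If double counting is the culprit, a minimal sketch (a hypothetical toy model, not one of the affected architectures) shows where a naive per-module count and PyTorch's deduplicated count diverge:

import torch.nn as nn

# Hypothetical toy case: one Linear registered under two names, so it shows
# up under two module paths but owns a single set of weights (8*8 + 8 = 72).
shared = nn.Linear(8, 8)
model = nn.Sequential(shared, nn.ReLU(), shared)

# parameters() deduplicates by tensor identity:
print(sum(p.numel() for p in model.parameters()))  # 72

# A naive per-module walk counts the shared layer once per registration:
naive = sum(
    sum(p.numel() for p in m.parameters(recurse=False))
    for _, m in model.named_modules(remove_duplicate=False)
)
print(naive)  # 144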

To Reproduce

  • Run this code snippet directly. Ref
from ultralytics import YOLO
from torchinfo import summary

# Load the yolov10n model
model = YOLO("yolov10n.pt")

# It is a PyTorch model, so we can count its parameters directly.
pytorch_total_params = sum(p.numel() for p in model.parameters())

# Passing `model` itself would trigger model.train(), which we don't want,
# so pass the underlying model.model instead.
model_summary = summary(model.model, 
    #input_data="path/to/bus.jpg",
    input_size=(1,3,640,640), 
    col_names=("input_size", "output_size", "num_params")
)

with open('summary.txt', 'w', encoding='utf-8') as the_file:
    the_file.write(f"{model_summary}\r\nTotal Params (torch): {pytorch_total_params}\r\nTotal Params (info): {model.info()}")
  • My ipynb is considerably more sophisticated, including intercepting the input data within the pipeline (a sketch of one way to do this follows this list); the counts remain the same. The inaccurate results are roughly 2.0B vs 860M (SD1), 2.1B vs 865M (SD2), and 5.3B vs 2.6B (SDXL).
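For reference, a minimal sketch of one way to intercept the real input tensors with a forward pre-hook and feed them to summary() via input_data (the small Conv stack is a hypothetical stand-in for the actual pipeline submodule, e.g. a UNet):

import torch
import torch.nn as nn
from torchinfo import summary

captured = {}

def grab_input(module, args):
    # Record the positional args of the first forward call only.
    captured.setdefault("args", args)

# Hypothetical stand-in for the real submodule inside the pipeline.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
handle = net.register_forward_pre_hook(grab_input)
net(torch.randn(1, 3, 64, 64))  # the "pipeline" call being intercepted
handle.remove()

# Summarize with the exact tensors the model actually received.
print(summary(net, input_data=list(captured["args"]),
              col_names=("input_size", "output_size", "num_params")))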

Expected behavior
Viewing the generated summary.txt shows the inconsistent result, which is overestimated by 1.78x. The official figure is 2.3M parameters, and model.info() uses the same p.numel() approach, which also gives 2775520.

...
│    │    └─DFL: 3-137                                       [1, 64, 8400]             [1, 4, 8400]              (16)
=======================================================================================================================================
Total params: 4,932,416
Trainable params: 0
Non-trainable params: 4,932,416
Total mult-adds (Units.GIGABYTES): 4.29
=======================================================================================================================================
Input size (MB): 4.92
Forward/backward pass size (MB): 362.66
Params size (MB): 11.10
Estimated Total Size (MB): 378.68
=======================================================================================================================================

Total Params (torch): 2775520

Total Params (info): (385, 2775520, 0, 8.7404288)

Runtime environment

  • Python 3.10 under conda
torch==2.4.0+cu124
diffusers==0.30.0
transformers==4.44.0
@TylerYep (Owner) commented
Hmm, perhaps recursive layers? Not sure what could be causing this.
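For what it's worth, a minimal sketch to probe that hypothesis (a hypothetical toy model with one Linear registered under two names; whether torchinfo's Total params exceeds the true 272 here is exactly what needs verifying, not an established result):

import torch.nn as nn
from torchinfo import summary

class Twice(nn.Module):
    # One Linear (16*16 + 16 = 272 params) registered under two names.
    def __init__(self):
        super().__init__()
        lin = nn.Linear(16, 16)
        self.a = lin
        self.b = lin  # same object, second registration

    def forward(self, x):
        return self.b(self.a(x))

model = Twice()
print(sum(p.numel() for p in model.parameters()))  # 272 (deduplicated)
print(summary(model, input_size=(1, 16)))          # does Total params read 272 or 544?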
