
Addition of More Pooling Methods #2048

Open
wants to merge 1 commit into main

Conversation

@billpsomas

Hi there!

I am the author of the ICCV 2023 paper titled "Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?", which focuses on benchmarking pooling techniques for CNNs and Transformers. It also introduces a new, simple, attention-based pooling mechanism with great localization properties.
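
To give a flavor of SimPool without opening the diff: at its core it is a single cross-attention step whose query is the GAP vector of the features, with the raw features used as values. Below is a rough, self-contained sketch of that idea (illustrative only, not the exact implementation in this PR, which also handles CNN feature maps and a few extra details):

```python
import torch
import torch.nn as nn

class SimPoolSketch(nn.Module):
    """Rough sketch of the SimPool idea: one cross-attention step with a
    GAP-initialized query. Illustrative only; not the exact PR implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_k = nn.LayerNorm(dim)
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, L, D) token features (flatten spatial dims first for CNNs)
        q = self.q(self.norm_q(x.mean(dim=1, keepdim=True)))           # GAP -> query, (N, 1, D)
        k = self.k(self.norm_k(x))                                     # keys, (N, L, D)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (N, 1, L)
        return (attn @ x).squeeze(1)                                   # pooled vector, (N, D)
```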

In this pull request, I have implemented and rigorously tested the following pooling methods:

  • Generalized Max Pooling
  • LSE Pooling (sketched below)
  • HOW Pooling
  • Slot Attention (Pooling)
  • SimPool
  • ViT Pooling
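
As noted above, LSE pooling is the simplest of the lot: a smooth interpolation between average pooling (r -> 0) and max pooling (r -> inf). A minimal sketch, assuming NCHW feature maps and a fixed temperature r (the actual implementation lives in the diff):

```python
import math
import torch

def lse_pool2d(x: torch.Tensor, r: float = 10.0) -> torch.Tensor:
    # x: (N, C, H, W) feature map; returns (N, C)
    # (1/r) * log( mean( exp(r * x) ) ), computed stably via logsumexp.
    # r -> 0 recovers average pooling, r -> inf recovers max pooling.
    n, c, h, w = x.shape
    return (torch.logsumexp(r * x.flatten(2), dim=-1) - math.log(h * w)) / r
```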

I believe these additions will be beneficial to the library, offering users cutting-edge options for pooling in their models. These methods have shown promising results in my research and experiments, and I am excited about their potential impact on a wider range of applications.

I am looking forward to your feedback and am happy to make any further adjustments as needed.
Thank you for considering this contribution to the library.

Cheers :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rwightman
Collaborator

@billpsomas not ignoring this, just a lot to digest and consider re integrating and testing such pooling layers in a sensible fashion...

Without diving into the details and mechanics of the integration and testing challenges, a few nits on style and attribution. I recognize some of the styles here: some lucidrains, some code that looks very Microsoft-like, and others. I'd like to have attribution as to where the bits and pieces came from in the file / class level docstrings...

And then, I'd like to have the style unified: make the lucidrains-style to_q etc. -> q, the MS-like query_proj -> q (or merge them into a fused qkv if they are all the same dim), and so on.
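
For reference, the convention I'm after looks roughly like this fused-qkv sketch (hand-written for illustration, assuming q/k/v share the same dim; not copied from the codebase):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    # Illustrative sketch of the fused-qkv naming style used across timm,
    # vs. lucidrains-style to_q/to_k/to_v or MS-style query_proj/key_proj.
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)  # one fused projection when dims match
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, l, d = x.shape
        qkv = self.qkv(x).reshape(n, l, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)  # each (N, heads, L, head_dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(n, l, d)
        return self.proj(x)
```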

Some of the above might be a bit of work, so don't jump in right away. It looks like you've integrated these layers into the pooling factory in your own timm fork; does that work well? Have they all been tested? Any comparative results combining these layers with classic CNNs like ResNets, or ViT / ViT-hybrids?

@fffffgggg54 mentioned this pull request on Dec 18, 2023
@billpsomas
Author

Hello Ross, and sorry for the late reply,

I've indeed integrated everything into my own timm fork and ran a lot of experiments for my paper. You can find experimental results using ResNet-18 in Figure 3 of the paper (https://arxiv.org/pdf/2309.06891.pdf). I've also tested some of the methods with ResNet-50 and ViT-S; everything worked well for me. You'll notice that Figure 3 includes even more poolings; I've integrated those into my timm fork too, but did not add them here. Maybe in another PR.

Now, about attribution as to where the code came from:

  • GMP: https://github.com/VChristlein/dgmp/blob/master/dgmp.py
  • LSE: custom implementation
  • HOW: https://github.com/gtolias/how/tree/master/how/layers
  • Slot Attention: modified from https://github.com/evelinehong/slot-attention-pytorch/blob/master/model.py
  • SimPool: custom implementation
  • ViT: https://github.com/sooftware/speech-transformer/blob/master/speech_transformer/attention.py

I know this is not a priority, but it would be nice to have some extra poolings in the library.

In my case, I modified https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/adaptive_avgmax_pool.py#L124-L161 so that you can pass the pooling of your choice via the pool_type argument.
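
For illustration, the dispatch looks roughly like this sketch (hypothetical, self-contained names; the real change is the diff against adaptive_avgmax_pool.py):

```python
import math
import torch
import torch.nn as nn

class LSEPool2d(nn.Module):
    # Hypothetical module form of the LSE sketch earlier in this thread.
    def __init__(self, r: float = 10.0):
        super().__init__()
        self.r = r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        return (torch.logsumexp(self.r * x.flatten(2), dim=-1) - math.log(h * w)) / self.r

def create_pool2d(pool_type: str = 'avg') -> nn.Module:
    # Hypothetical pool_type dispatch; all branches return (N, C) pooled features.
    if pool_type == 'avg':
        return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(1))
    if pool_type == 'max':
        return nn.Sequential(nn.AdaptiveMaxPool2d(1), nn.Flatten(1))
    if pool_type == 'lse':
        return LSEPool2d()
    raise ValueError(f'Unknown pool_type: {pool_type}')
```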

Cheers!
