
Question about backbone #3

Open

MLDeS opened this issue Aug 16, 2023 · 2 comments

MLDeS commented Aug 16, 2023

Hello,

Congratulations on the great work. I have some questions about the backbone used.

  1. Was it pretrained and frozen for feature extraction? Or was it fine-tuned in a supervised fashion on top of the self-supervised pretrained features (it looks like the latter)?
  2. If fine-tuned, did you unfreeze all layers or only a few layers? Did you do an ablation on how many layers to unfreeze?
  3. Did you try using the frozen features and see how they perform with respect to localization? It would be helpful if you could shed some light on this.
  4. Why did you choose SimMIM; for example, why not MAE? Did you try both and find that SimMIM works better?

I am sorry if I am asking any redundant questions. It would be helpful to have some insight into these aspects.

Thanks a lot, again!

PkuRainBow commented

We will release the source code soon.

impiga (Owner) commented Nov 16, 2023

@MLDeS

  1. The backbone is fine-tuned.

  2. We unfreeze all layers by default and have not tried freezing any layers.

    Nonetheless, we employ a learning rate decay strategy for masked image modeling (MIM) pre-trained models, a technique commonly used when fine-tuning MIM models. This strategy assigns a smaller learning rate to the shallower layers and a larger learning rate to the deeper ones, following the formula lr = base_lr * decay_rate ** (num_layers - layer_depth), where decay_rate is less than or equal to 1 (see the sketch after this list).

    By adjusting decay_rate, we can potentially achieve an effect similar to freezing some layers.

  3. We have not yet evaluated the performance of frozen features within the DETR framework.

    In a previous study (paper), we examined the use of frozen features for downstream dense tasks and compared different pre-training methods. We found that the performance of MIM frozen features was subpar, but this could be a result of poor classification performance. We will evaluate their localization performance later.

[Figure: comparison of different pre-training methods using frozen features, from the previous study]

  4. We use Swin Transformer as the backbone, and SimMIM provides pre-trained Swin Transformer checkpoints.
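For reference, here is a minimal sketch of the layer-wise learning rate decay described in point 2, assuming a PyTorch-style model whose backbone parameters are named like backbone.layers.<i>.... The helper names, the per-stage depth mapping, and the hyperparameter values are illustrative assumptions, not the repository's actual implementation.

```python
import torch  # only needed for the optimizer in the usage example below


def get_layer_depth(name: str, num_layers: int) -> int:
    # Map a parameter name to a coarse depth index in [0, num_layers].
    # Patch embedding is treated as the shallowest layer (depth 0);
    # parameters outside the backbone (e.g. the detection head) keep the
    # full base learning rate by being assigned the maximum depth.
    # The "backbone.layers.<i>" naming is an assumption for illustration.
    if name.startswith("backbone.patch_embed"):
        return 0
    if name.startswith("backbone.layers."):
        return int(name.split(".")[2]) + 1
    return num_layers


def build_param_groups(model, base_lr, decay_rate, num_layers):
    # Group parameters so that shallower layers receive a smaller learning
    # rate, following lr = base_lr * decay_rate ** (num_layers - layer_depth).
    groups = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        depth = get_layer_depth(name, num_layers)
        group = groups.setdefault(
            depth,
            {"params": [], "lr": base_lr * decay_rate ** (num_layers - depth)},
        )
        group["params"].append(param)
    return list(groups.values())


# Example usage (hypothetical model with a Swin-like backbone; the
# base_lr, decay_rate, and weight_decay values are placeholders):
# optimizer = torch.optim.AdamW(
#     build_param_groups(model, base_lr=1e-4, decay_rate=0.9, num_layers=4),
#     weight_decay=0.05,
# )
```

With decay_rate = 1 every group keeps the base learning rate; pushing it toward 0 makes the shallow layers nearly static, which is the "similar to freezing" effect mentioned above.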
