About window_size in MOAT #146

Open
edwardyehuang opened this issue Nov 1, 2022 · 3 comments
@edwardyehuang

I found that window_size is None by default in MOAT:

def get_model(
    name: str,
    input_shape: list[int],
    window_size: Optional[list[list[int]]] = None,
    survival_rate: Optional[float] = None,
    pool_size: Optional[int] = 3,
    override_config: Optional[dict[str, Any]] = None,
    pretrained_weights_path: Optional[str] = None,
    remove_position_embedding: Optional[bool] = None,
    return_config: Optional[bool] = False,
    strict_loading: Optional[bool] = False,
    use_checkpointing_for_attention: Optional[bool] = False,
    global_attention_at_end_of_moat_stage: Optional[bool] = False,
) -> ...:

    ...
    config.window_size = window_size

Is only global attention used for segmentation tasks?

@Chenglin-Yang

Thanks for your interest!

Please see the docstring:

window_size: A list of two integers, spatial size of input for window

for how to specify the desired window size for your use case.

Our settings can be found in the experimental sections of the paper, but I can provide the information here:
For COCO object detection, we use window-based attention for the third stage with a window size of 14x14, and global attention for the fourth stage.
For ADE20K semantic segmentation, we use global attention for both the third and fourth stages.
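
A rough sketch of what those two settings could look like as calls to get_model, assuming window_size holds one [height, width] entry per MOAT attention stage and that None requests global attention (the model name and input shapes below are placeholders, not values taken from the repository):

# Hypothetical usage; the per-stage window_size format is an assumption.

# COCO detection: 14x14 windows in the third stage, global in the fourth.
detection_model = get_model(
    name='moat0',                   # placeholder model name
    input_shape=[1024, 1024, 3],
    window_size=[[14, 14], None],   # assumed per-stage format
)

# ADE20K segmentation: global attention in both attention stages.
segmentation_model = get_model(
    name='moat0',
    input_shape=[641, 641, 3],
    window_size=None,               # default: global attention everywhere
)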

@edwardyehuang
Author


Thanks for pointing this out.

I also noticed that the implementation of the global window is flawed.

When the global window is used, the current implementation still records a fixed window size derived from the input size at build time. As a result, if an input size given later differs from the recorded size, the "global" attention is either limited to the recorded window or an error is raised directly (e.g., when the input size is smaller than the recorded window size).
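
A minimal toy sketch of that failure mode (illustrative only, not the actual MOAT code):

import tensorflow as tf

class ToyGlobalAttention(tf.keras.layers.Layer):
    # Toy layer reproducing the reported behavior: with window_size=None
    # ("global"), the window is silently frozen to the build-time size.

    def __init__(self, window_size=None):
        super().__init__()
        self.window_size = window_size

    def build(self, input_shape):
        if self.window_size is None:
            # "Global" attention becomes a fixed window equal to the
            # spatial size of the first input the layer sees.
            self.window_size = [input_shape[1], input_shape[2]]

    def call(self, inputs):
        height, width = inputs.shape[1], inputs.shape[2]
        if height < self.window_size[0] or width < self.window_size[1]:
            raise ValueError(
                f'Input {height}x{width} is smaller than the recorded '
                f'window {self.window_size[0]}x{self.window_size[1]}.')
        return inputs  # attention math omitted; only the shape logic matters

layer = ToyGlobalAttention(window_size=None)
layer(tf.zeros([1, 64, 64, 3]))  # builds; window recorded as 64x64
layer(tf.zeros([1, 32, 32, 3]))  # raises: 32x32 smaller than recorded window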

@Chenglin-Yang

Thank you for finding this typo.

If you want to evaluate the model with an input size that differs from the one used during training, you will need to create another model that is built with the new input size and load the weights into it. This is how the current TensorFlow model works.
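
For instance, using the get_model signature quoted above (the model name, input shape, and checkpoint path are placeholders):

# Rebuild the model at the evaluation input size and reload the
# trained weights.
eval_model = get_model(
    name='moat0',                                   # placeholder name
    input_shape=[512, 512, 3],                      # new evaluation size
    pretrained_weights_path='/path/to/checkpoint',  # placeholder path
    strict_loading=False,  # assumption: lets size-dependent variables differ
)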
