Hello, I don't understand why the first token of the transformer output is taken directly as the action_token_out. After your grouping, the grouped input should follow this order: spatial_context_feature + region_feature + action_token + other obs features. Could the token order be changed when the sequence passes through the transformer_decoder?

In addition, about the image augmentation (padding + random_crop): how many crops do you take? Looking around the code, it only uses the default value, num_crops=1. Isn't the global feature lost when there is only one crop? From what I see in the code, the feature map is extracted from the cropped image.

Could you help me figure out why and how? Thanks a lot!
> Hello, I don't understand why the first token of the transformer output is taken directly as the action_token_out. After your grouping, the grouped input should follow this order: spatial_context_feature + region_feature + action_token + other obs features. Could the token order be changed when the sequence passes through the transformer_decoder?
In the code you can see that the action token is always assumed to be the first one in the grouped sequence; that is why the output is indexed as `action_token_out = transformer_out[:, :, 0, :]`.
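For intuition, here is a minimal, self-contained sketch of why that indexing is safe (all names and shapes below are hypothetical, not taken from the repo, and a plain self-attention stack stands in for the actual transformer_decoder): attention mixes information across positions but never reorders them, so whatever is concatenated at token index 0 going in is still at token index 0 coming out.

```python
import torch
import torch.nn as nn

B, T, D = 8, 10, 128                      # batch, time steps, embedding dim (made up)
action_token = torch.zeros(1, 1, 1, D)    # a learned embedding in the real model
spatial_ctx  = torch.randn(B, T, 1, D)    # global image feature
region_feats = torch.randn(B, T, 4, D)    # per-proposal region features
other_obs    = torch.randn(B, T, 2, D)    # e.g. proprioception tokens

# The action token is concatenated FIRST, so it sits at token index 0.
tokens = torch.cat(
    [action_token.expand(B, T, -1, -1), spatial_ctx, region_feats, other_obs],
    dim=2,
)                                          # (B, T, 8, D)

layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Attention updates each position's embedding but keeps positions in place:
# output position i is a new embedding for input position i.
out = encoder(tokens.flatten(0, 1))        # (B*T, 8, D)
out = out.reshape(B, T, -1, D)             # back to (B, T, 8, D)

# Hence index 0 along the token axis is still the action token's slot.
action_token_out = out[:, :, 0, :]         # (B, T, D)
```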
> In addition, about the image augmentation (padding + random_crop): how many crops do you take? Looking around the code, it only uses the default value, num_crops=1. Isn't the global feature lost when there is only one crop? From what I see in the code, the feature map is extracted from the cropped image.
The random cropping only shifts pixels by 4 or 8 (I forget the exact number), so the cropped image should still contain most of the information even after random cropping.
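For reference, a common pad-and-random-crop recipe looks roughly like this (the 128x128 size and 4-pixel pad are assumptions for illustration, not the repo's exact values):

```python
import torch
import torchvision.transforms as T

img = torch.rand(3, 128, 128)              # a dummy observation

aug = T.Compose([
    T.Pad(padding=4, padding_mode="edge"), # 128x128 -> 136x136
    T.RandomCrop(size=128),                # crop back to the original size
])

shifted = aug(img)                         # same size, translated by at most 4 px
```

Since the crop window can only slide by the pad width, at most a 4-pixel border ever leaves the frame, so the feature map extracted from the single crop still covers essentially the whole scene; taking more than num_crops=1 would mostly produce near-duplicates.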