Great analysis! I wonder about the attributes of large-kernel CNNs. Your paper fully explores the basic 3x3 ResNet. A 3x3 conv extracts detailed local patterns and thus may contribute to high-pass filtering. However, recent works investigate the effect of larger kernels. Might the attributes of the 3x3 ResNet change and become similar to those of ViT?
Thank you for your support and insightful question!
In our observation, the attributes of Conv depend primarily on the architecture or the group size (e.g., depthwise separable Conv) rather than on the kernel size. For example, some Conv blocks at the end of stages of ConvNeXt behave like low-pass filters in terms of Fourier analysis:
This figure shows the ∆ log amplitude of ConvNeXt at high frequency. ConvNeXt layers at the ends of stages reduce high-frequency information, whereas ResNet layers do not. We leave a detailed investigation to future work.
(Figure: ∆ log amplitude of ResNet-50 at high frequency)
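Roughly, the Δ log amplitude metric can be sketched in NumPy as follows. This is a simplified illustration rather than our released analysis code; the function names and the small stabilizing constant are placeholders:

```python
import numpy as np

def high_freq_log_amplitude(feature_map):
    """Mean log amplitude of the highest-frequency Fourier component
    of a (C, H, W) feature map."""
    # 2D FFT per channel; shift the zero frequency to the center
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map), axes=(-2, -1))
    amplitude = np.abs(spectrum)
    # after fftshift, the corner holds the Nyquist (highest) frequency
    high = amplitude[..., 0, 0]
    return float(np.log(high + 1e-8).mean())

def delta_log_amplitude(feat_in, feat_out):
    """Δ log amplitude between a block's input and output feature maps.
    Negative values mean the block suppresses high frequencies
    (low-pass behavior); positive values indicate high-pass behavior."""
    return high_freq_log_amplitude(feat_out) - high_freq_log_amplitude(feat_in)
```

For instance, applying a 2x2 box blur (a low-pass operation) to a noisy feature map yields a clearly negative Δ log amplitude, which is the signature we observe for ConvNeXt blocks at the ends of stages.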
Cf. MSAs generally behave like low-pass filters and reduce feature map variance. Convs, on the contrary, behave like high-pass filters and increase the feature map variance. In terms of feature map variance, all ConvNeXt blocks diversify feature maps (i.e., they always increase the variance).
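To make the variance metric concrete, here is a minimal NumPy sketch (again a simplified illustration, not our exact measurement code). Spatial averaging stands in for the low-pass, MSA-like behavior, and a residual block that adds a high-pass component stands in for the Conv-like behavior:

```python
import numpy as np

def feature_map_variance(feature_map):
    """Variance over channels and spatial positions; a simplified
    stand-in for the per-layer feature map variance metric."""
    return float(np.var(feature_map))

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 14, 14))  # toy (C, H, W) feature map

# Low-pass, MSA-like mixing: replace each channel by its spatial mean.
# This collapses spatial detail and reduces variance.
low_pass = np.broadcast_to(x.mean(axis=(-2, -1), keepdims=True), x.shape)

# Conv-like residual that adds a high-pass component (local difference).
# Amplifying local differences increases variance.
high_pass = x + 0.5 * (x - np.roll(x, 1, axis=-1))
```

With this toy setup, `feature_map_variance(low_pass) < feature_map_variance(x) < feature_map_variance(high_pass)`, mirroring the reduce-vs-diversify distinction between MSAs and Convs described above.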