Need to interpolate positional embedding to work at higher resolutions #39

atonderski · 2021-11-30T09:52:56Z

Hi again, sorry for the slow response in issue #26. I have some more clarifications and visualizations here.

I agree that the sine-cosine embeddings are not learnable. However, it seems that they still need to be interpolated for the model to work well at a different resolution. I suspect that this is at least partially because they are 1D, so the model has to learn the number of rows and columns. E.g., it cannot express "look one patch down" directly, but instead needs to express it as "look X patches forward", and X changes if we change the resolution.
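To make the "X changes" point concrete, here is a minimal sketch of how the flattened offset for "one patch down" depends on the grid width. The patch size of 16 and the 224/448 resolutions are only illustrative assumptions, and `flat_index` and `one_patch_down_offset` are hypothetical helper names, not functions from this repository.

```python
def flat_index(row, col, grid_w):
    """Row-major index of a patch in a grid that has grid_w columns."""
    return row * grid_w + col


def one_patch_down_offset(grid_w):
    """Distance in the flattened sequence between a patch and the one directly below it."""
    return flat_index(1, 0, grid_w) - flat_index(0, 0, grid_w)


# Assumed setup: 16x16 patches, 224x224 at training time, 448x448 at test time.
offset_train = one_patch_down_offset(224 // 16)  # 14 columns
offset_hires = one_patch_down_offset(448 // 16)  # 28 columns
print(offset_train, offset_hires)  # 14 28
```

With a 1D embedding, "one patch down" means "14 positions forward" at the training resolution but "28 positions forward" at the higher one, so whatever offset the attention heads picked up during training no longer points at the same neighbour.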

I have attached attention visualizations that show what happens when you run at a higher resolution with and without interpolating the positional embedding. As you can see, the non-interpolated version looks much worse and has weird diagonal stripes.
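For anyone who wants to try the interpolation, here is a rough, dependency-free sketch of what I mean. It assumes the standard Transformer sine-cosine formula and a bilinear resize in which the grid corners line up. Neither is necessarily exactly what this repository does, and `sinusoid_table` and `interpolate_table` are hypothetical names used only for illustration.

```python
import math


def sinusoid_table(n_positions, dim):
    """1D sine-cosine table: one dim-sized vector per flattened patch index."""
    table = []
    for pos in range(n_positions):
        vec = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(vec)
    return table


def interpolate_table(table, old_hw, new_hw):
    """Bilinearly resize a flattened (old_h * old_w, dim) table to (new_h * new_w, dim)."""
    (old_h, old_w), (new_h, new_w) = old_hw, new_hw
    dim = len(table[0])
    out = []
    for r in range(new_h):
        # Map the new row onto the old grid so that first and last rows coincide.
        y = r * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = min(int(y), old_h - 1)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        for c in range(new_w):
            x = c * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = min(int(x), old_w - 1)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            vec = []
            for d in range(dim):
                top = (1 - wx) * table[y0 * old_w + x0][d] + wx * table[y0 * old_w + x1][d]
                bot = (1 - wx) * table[y1 * old_w + x0][d] + wx * table[y1 * old_w + x1][d]
                vec.append((1 - wy) * top + wy * bot)
            out.append(vec)
    return out


# Assumed setup: a 14x14 training grid resized to a 28x28 grid, tiny dim for readability.
old_hw, new_hw = (14, 14), (28, 28)
table = sinusoid_table(old_hw[0] * old_hw[1], dim=8)
resized = interpolate_table(table, old_hw, new_hw)
print(len(resized), len(resized[0]))  # 784 8
```

The key step is treating the flat table as a 2D grid before resizing, so that each patch in the larger grid gets an embedding blended from the training-time positions around the same relative location. In a PyTorch model, the same reshape-then-resize idea would normally go through `torch.nn.functional.interpolate` rather than hand-written loops.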

This is not a major issue for me, but I wanted to let you (and anyone else who has the same problem) know about it. I think the best solution is what I mentioned before: simply include the positional embeddings in the checkpoint, even though they are not learnable parameters.

Original:

With interpolation:

Without interpolation:

pengzhiliang added the enhancement New feature or request label Dec 1, 2021
