Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning #8

Open

Dongwoo-Im opened this issue Nov 29, 2023 · 0 comments

github : https://github.com/bair-climate-initiative/scale-mae

[Motivation]

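In the satellite domain, image scale varies widely, so a distance measured in the image does not correspond to a fixed real-world distance
With scale-unaware training, generalization to unseen cases is hard to guarantee no matter how much the model is trained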

[Main factor]

GSDPE (Ground Sample Distance Positional Encoding) : lets the model understand both position and scale
Laplacian-pyramid decoder : lets the model learn multi-scale representations
The authors claim to be the first among MAE variants to combine scale-awareness with a Laplacian pyramid

[Main Figure]

[GSDPE]

A g/G term is added to the original PE, which makes the encoding scale-aware
Injected twice in total: before the MAE encoder and again at the Demask step
- TODO: look up the related insights and ablations
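
As a rough illustration of the g/G term, here is a minimal sketch of a 1D sinusoidal positional encoding whose position is scaled by g/G. It assumes g is the ground sample distance of the image and G is a reference GSD, and that the base encoding is the standard Transformer sin/cos form. The function name `gsd_positional_encoding` and the default `reference_gsd=1.0` are hypothetical and not taken from the Scale-MAE repository.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sinusoidal positional encoding with positions scaled by g/G.

    Assumption: even indices hold sin terms and odd indices hold cos terms,
    as in the standard Transformer encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd  # the g/G term
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            angle = scale * pos / (10000 ** (2 * i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# Same patch grid, two different ground sample distances.
pe_fine = gsd_positional_encoding(num_positions=4, dim=8, gsd=1.0)
pe_coarse = gsd_positional_encoding(num_positions=4, dim=8, gsd=2.0)
```

With this scaling, position 1 at g = 2 receives the same encoding as position 2 at g = 1, so the encoding follows ground distance rather than pixel distance.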

[Image]

$I_{hr}$ = initial higher resolution image : random crop (448x448) from original image
$I$ = input image : $I_{hr}$ downsample to 224x224
high-freq GT : $I_{hr}$ downsample to 56x56 and upsample to 448x448 and subtract from $I_{hr}$
- capture object edges, roads, and building outlines
low-freq GT : $I_{hr}$ downsample to 14x14 and upsample to 224x224
- capture color gradients and landscapes
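
The target construction above can be sketched in plain Python on a single-channel image stored as a list of lists. The resampling method is an assumption: the notes do not say which interpolation is used, so this sketch uses average pooling to downsample and nearest-neighbour repetition to upsample. The helper names (`downsample`, `upsample`, `build_targets`) are hypothetical. The factors 2, 8, 32 and 16 correspond to 448→224, 448→56→448, and 448→14→224.

```python
def downsample(img, factor):
    """Average-pool a square 2D list by an integer factor."""
    n = len(img) // factor
    out = []
    for r in range(n):
        row = []
        for c in range(n):
            block = [img[r * factor + dr][c * factor + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(img, factor):
    """Nearest-neighbour upsample of a square 2D list by an integer factor."""
    size = len(img) * factor
    return [[img[r // factor][c // factor] for c in range(size)]
            for r in range(size)]


def build_targets(i_hr):
    """Return (input image, high-freq GT, low-freq GT) from I_hr.

    For a 448x448 I_hr this gives a 224x224 input, a 448x448 high-frequency
    residual, and a 224x224 low-frequency image.
    """
    if len(i_hr) % 32 != 0:
        raise ValueError("image side must be divisible by 32")
    image = downsample(i_hr, 2)                      # 448 -> 224
    blurred = upsample(downsample(i_hr, 8), 8)       # 448 -> 56 -> 448
    high_freq = [[a - b for a, b in zip(row_hr, row_bl)]
                 for row_hr, row_bl in zip(i_hr, blurred)]
    low_freq = upsample(downsample(i_hr, 32), 16)    # 448 -> 14 -> 224
    return image, high_freq, low_freq
```

A constant image has no high-frequency content, so its high-frequency target is zero everywhere, which is a quick sanity check on the subtraction direction.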

[Decoder]

decoding : standard MAE decoder, with depth reduced from 8 layers to 3
upsampling : decoder features are upsampled x2 and x4, then passed to the Laplacian blocks
reconstruction : Laplacian blocks (feature mapping, upsample, reconstruction) trained with L1 loss and L2 loss
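
A minimal sketch of how the two losses could be combined. It assumes that the L1 loss is applied to the high-frequency output and the L2 loss to the low-frequency output, and that the two are summed without weights; the notes above do not specify either choice. Function names are hypothetical, and restricting the loss to masked patches is omitted for brevity.

```python
def l1_loss(pred, target):
    """Mean absolute error over two equally shaped 2D lists."""
    diffs = [abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


def l2_loss(pred, target):
    """Mean squared error over two equally shaped 2D lists."""
    diffs = [(p - t) ** 2 for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


def reconstruction_loss(pred_high, gt_high, pred_low, gt_low):
    """Assumed pairing: L1 on the high-frequency output, L2 on the low-frequency one."""
    return l1_loss(pred_high, gt_high) + l2_loss(pred_low, gt_low)
```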

[Evaluation]

Dongwoo-Im added MIM ICCV labels
