ALIT: Adaptive Length Image Tokenization via Recurrent Allocation #14

Open

Dongwoo-Im opened this issue Nov 28, 2024 · 0 comments

Owner

Dongwoo-Im commented Nov 28, 2024

arxiv: https://arxiv.org/abs/2411.02393
github: https://github.com/ShivamDuggal4/adaptive-length-tokenizer

It is notable that depth estimation and image captioning (using GPT4) are used as part of the evaluation.

During encoding, the 2D information is packed into a 1D latent.
Before decoding, the 2D tokens are reconstructed from the 1D latent with a masked objective (SSL).
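
As a rough illustration of the encode / reconstruct flow noted above, here is a toy NumPy sketch. It is not the paper's architecture: the single cross-attention step, the random untrained values, and every name in it (`cross_attend`, `encode`, `reconstruct`, `mask_token`, `pos_embed`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Single-head attention without learned projections (toy simplification).
    dim = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(dim)
    return softmax(scores) @ keys_values

def encode(tokens_2d, latents_1d):
    # Hypothetical encode step: 1D latents read information from the 2D tokens.
    return latents_1d + cross_attend(latents_1d, tokens_2d)

def reconstruct(latents_1d, mask_token, pos_embed):
    # Hypothetical masked reconstruction: every 2D position starts as a mask
    # token (plus its position embedding) and is filled in from the 1D latents.
    queries = mask_token + pos_embed
    return cross_attend(queries, latents_1d)

rng = np.random.default_rng(0)
height, width, dim, num_latents = 4, 4, 8, 6

tokens_2d = rng.normal(size=(height * width, dim))   # flattened 2D grid tokens
latents_init = rng.normal(size=(num_latents, dim))   # 1D latent tokens
mask_token = rng.normal(size=(1, dim))               # shared mask token
pos_embed = rng.normal(size=(height * width, dim))   # per-position embedding

latents = encode(tokens_2d, latents_init)
recon = reconstruct(latents, mask_token, pos_embed)

# Reconstruction error between the recovered and original 2D tokens;
# in a real model this kind of loss would drive training.
recon_loss = float(np.mean((recon - tokens_2d) ** 2))
print(latents.shape, recon.shape)
```

Changing `num_latents` changes the size of the 1D representation without changing the shape of the reconstructed 2D grid, which is the property a variable-length tokenizer relies on.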

Follow-up work: visual alignment (decomposition)
