ALIT: Adaptive Length Image Tokenization via Recurrent Allocation #14

Open
Dongwoo-Im opened this issue Nov 28, 2024 · 0 comments

arxiv: https://arxiv.org/abs/2411.02393
github: https://github.com/ShivamDuggal4/adaptive-length-tokenizer

Impressive that depth estimation / image captioning (using GPT4) are used as part of the evaluation.

During encoding, the 2D information is packed into a 1D latent;
before decoding, the 2D tokens are reconstructed from the 1D latent via a mask objective (SSL).
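
A minimal shape-level sketch of the flow described above, not the authors' implementation: latent queries attend over the flattened 2D tokens to form a 1D latent, then masked per-position queries attend over that latent to rebuild the 2D grid. Everything here is an assumption for illustration: the weight-free single-head attention, the random vectors standing in for learned embeddings, and the hypothetical names `attend`, `encode`, `decode`, `latent_queries`, and `masked_queries`. No training or loss is shown.

```python
import math
import random


def attend(queries, memory):
    """Toy single-head dot-product attention with no learned weights."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * m[d] for w, m in zip(weights, memory)) / total for d in range(dim)]
        )
    return outputs


def random_vectors(rng, count, dim):
    """Stand-in for learned embeddings (assumption: random Gaussian vectors)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def encode(grid, latent_queries):
    """Flatten the H x W token grid and pool it into a short 1D latent sequence."""
    flat = [token for row in grid for token in row]
    return attend(latent_queries, flat)


def decode(latents, height, width, masked_queries):
    """Rebuild an H x W token grid from the 1D latent alone.

    masked_queries stands in for "mask token + position" at each grid cell,
    so the only image content available comes from the latents.
    """
    flat = attend(masked_queries, latents)
    return [flat[r * width:(r + 1) * width] for r in range(height)]


rng = random.Random(0)
H, W, DIM, NUM_LATENTS = 4, 4, 8, 3

grid = [random_vectors(rng, W, DIM) for _ in range(H)]   # 2D image tokens
latent_queries = random_vectors(rng, NUM_LATENTS, DIM)   # 1D latent slots
masked_queries = random_vectors(rng, H * W, DIM)         # one per masked grid cell

latents = encode(grid, latent_queries)
recon = decode(latents, H, W, masked_queries)
print(len(latents), len(recon), len(recon[0]), len(recon[0][0]))  # prints: 3 4 4 8
```

Changing `NUM_LATENTS` changes only the length of the 1D latent; the reconstructed grid keeps its `H x W` shape, which is the property a variable-length tokenizer relies on.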

Follow-up work: visual alignment (decomposition)