Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
Learning and Vision Lab, National University of Singapore
🥯 [Paper]  🎄 [Project Page]
1.7x speedup and ~0.5x memory consumption on ImageNet-256 generation. Top: original VAR-d30; bottom: CoDe (N=8). Speed measurements do not include the VAE decoder.
We propose Collaborative Decoding (CoDe), a novel decoding strategy tailored to the VAR framework. CoDe builds on two key observations: parameter demands drop substantially at larger scales, and different scales exhibit exclusive generation patterns. Based on these insights, we partition the multi-scale inference process into a seamless collaboration between a large model and a small model. This collaboration yields remarkable efficiency with minimal impact on quality: CoDe achieves a 1.7x speedup, cuts memory usage by around 50%, and preserves image quality with only a negligible FID increase from 1.95 to 1.98. When drafting steps are further decreased, CoDe achieves an impressive 2.9x acceleration, reaching over 41 images/s at 256x256 resolution on a single NVIDIA 4090 GPU while preserving a commendable FID of 2.27.
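To make the hand-off concrete, here is a minimal Python sketch of the collaborative decoding loop. It is illustrative only: `predict_next_scale` is a hypothetical stand-in for the model's next-scale prediction step, and the actual implementation lives in `infer_CoDe.py`.

```python
import torch

@torch.no_grad()
def collaborative_decode(drafter, refiner, scales, draft_steps, label):
    """Sketch of CoDe: `scales` lists token-map sizes from coarse to fine."""
    tokens = []  # multi-scale token maps generated so far
    for i, size in enumerate(scales):
        # The large drafter generates the first `draft_steps` coarse scales,
        # which demand the most capacity; the small refiner completes the
        # remaining fine scales, where parameter demands are much lower.
        model = drafter if i < draft_steps else refiner
        tokens.append(model.predict_next_scale(tokens, size, label))
    return tokens  # the shared VQVAE decoder turns these into an image
```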
- 🔥 November 28, 2024: Our paper is available now!
- 🔥 November 27, 2024: Our model weights are available at 🤗 Hugging Face here
- 🔥 November 27, 2024: Code repo is released! The arXiv paper is coming soon!
To set up the environment:
- Install `torch>=2.0.0`.
- Install other pip packages via `pip3 install -r requirements.txt`.
We provide drafter VAR models and refiner VAR models, which can be downloaded from the following links:
| Draft steps | Refine steps | Resolution | FID | IS | Drafter VAR 🤗 | Refiner VAR 🤗 |
|---|---|---|---|---|---|---|
| 9 | 1 | 256 | 1.94 | 296 | drafter_9.pth | refiner_9.pth |
| 8 | 2 | 256 | 1.98 | 302 | drafter_8.pth | refiner_8.pth |
| 7 | 3 | 256 | 2.11 | 303 | drafter_7.pth | refiner_7.pth |
| 6 | 4 | 256 | 2.27 | 397 | drafter_6.pth | refiner_6.pth |
Note: the VQVAE checkpoint `vae_ch160v4096z32.pth` is also required.
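If the checkpoints are hosted on the Hugging Face Hub (see the news above), they can also be fetched programmatically. This is a sketch only; `<repo_id>` is a placeholder for the actual repository name:

```python
from huggingface_hub import hf_hub_download

# <repo_id> is a placeholder: use the Hugging Face repository linked above.
drafter_path = hf_hub_download(repo_id="<repo_id>", filename="drafter_8.pth")
refiner_path = hf_hub_download(repo_id="<repo_id>", filename="refiner_8.pth")
vae_path = hf_hub_download(repo_id="<repo_id>", filename="vae_ch160v4096z32.pth")
```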
Generate with the original VAR-d30:
```bash
CUDA_VISIBLE_DEVICES=0 python infer_original.py --model_depth 30
```
Generate with training-free CoDe:
```bash
CUDA_VISIBLE_DEVICES=0 python infer_CoDe.py --drafter_depth 30 --refiner_depth 16 --draft_steps 8 --training_free
```
Generate with specialized fine-tuned CoDe:
```bash
CUDA_VISIBLE_DEVICES=0 python infer_CoDe.py --drafter_depth 30 --refiner_depth 16 --draft_steps 8
```
- `drafter_depth`: depth of the large drafter transformer model.
- `refiner_depth`: depth of the small refiner transformer model.
- `draft_steps`: number of steps in the drafting stage.
- `training_free`: if set, runs training-free CoDe; if omitted, runs inference with the specialized fine-tuned CoDe models.
To sample images for evaluation:
```bash
CUDA_VISIBLE_DEVICES=0 python sample_CoDe.py --drafter_depth 30 --refiner_depth 16 --draft_steps 8 --output_path <img_save_path>
```
The generated images are saved as both `.PNG` and `.npz` files. Then use OpenAI's FID evaluation toolkit with the 256x256 reference ground-truth `.npz` file to evaluate FID, IS, precision, and recall.
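For reference, a sketch of invoking OpenAI's evaluator (from the guided-diffusion repository's `evaluations` folder) on the saved samples. The filenames below are assumptions following OpenAI's naming; adjust them to your local paths:

```python
import subprocess

# Assumes evaluator.py from openai/guided-diffusion/evaluations is available;
# the first argument is the 256x256 reference batch, the second the samples
# produced by sample_CoDe.py.
subprocess.run(
    ["python", "evaluator.py",
     "VIRTUAL_imagenet256_labeled.npz",  # reference ground-truth batch
     "samples.npz"],                     # generated samples (.npz)
    check=True,
)
```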
Thanks to VAR for their wonderful work and codebase!
If our research assists your work, please give us a star ⭐ or cite us using:
```bibtex
@misc{2411.17787,
  Author = {Zigeng Chen and Xinyin Ma and Gongfan Fang and Xinchao Wang},
  Title  = {Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient},
  Year   = {2024},
  Eprint = {arXiv:2411.17787},
}
```