Everlyn-Labs/Wasserstein-VQ

Vector Quantization by Distribution Matching

Overview

Vector quantization (VQ) is a key component of effective autoregressive models, especially in visual generative tasks. In practice, however, training instability and codebook collapse often limit its potential. Both issues arise from a mismatch between the distribution of feature vectors and the distribution of code vectors, which leaves part of the codebook unused and inflates quantization error. In this work, we propose Vector Quantization by Distribution Matching, which aligns the feature and code vector distributions using the Wasserstein distance, achieving near-100% codebook utilization and significantly reducing quantization error.
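To make the idea of distribution matching concrete, here is a minimal sketch of a Wasserstein-style mismatch measure between a set of feature vectors and a set of code vectors. It assumes both sets are approximated by Gaussians with diagonal covariance, in which case the squared 2-Wasserstein distance has a simple closed form. That assumption, and the helper names `mean_and_std` and `gaussian_w2_squared`, are illustrative only and are not taken from this repository's code.

```python
import math
import random


def mean_and_std(vectors):
    """Per-dimension mean and standard deviation of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dim)
    ]
    return means, stds


def gaussian_w2_squared(features, codes):
    """Squared 2-Wasserstein distance between diagonal Gaussians fitted to each set.

    Assumption (not from the repository): each set is summarized as
    N(m, diag(s^2)). For two such Gaussians the closed form is
    ||m1 - m2||^2 + ||s1 - s2||^2.
    """
    m1, s1 = mean_and_std(features)
    m2, s2 = mean_and_std(codes)
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    std_term = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return mean_term + std_term


# Synthetic data: one codebook that matches the features, one that does not.
rng = random.Random(0)
features = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
matched_codes = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
shifted_codes = [[rng.gauss(3.0, 0.2) for _ in range(4)] for _ in range(2000)]

print(gaussian_w2_squared(features, matched_codes))  # small: distributions agree
print(gaussian_w2_squared(features, shifted_codes))  # large: distributions differ
```

A quantity of this kind could be added to a training loss so that the codebook is pulled toward the feature distribution. How the actual training scripts do this should be checked in the source.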

Our method introduces a distributional perspective on VQ, analyzing how better alignment of feature and codebook distributions leads to improved stability and performance. Extensive experiments demonstrate that this approach mitigates training instability and codebook collapse, enhancing downstream tasks like image reconstruction. Our overall framework is as follows:

Setup

To install and run the code, set up the environment as follows:

conda env create -f environment.yml
conda activate vq_distribution_matching
python -m pip install -e .

Implementation

Once the environment is set up, you can run the training process using the provided shell scripts. Here’s an example of how to run the training for the Wasserstein quantizer:

bash train_wasserstein_quantizer_part1.sh
bash train_wasserstein_quantizer_part2.sh

These scripts will train the model using the Wasserstein distance for vector quantization on your selected dataset. The training outputs, including checkpoints, will be stored in the specified directories.

Evaluation

For evaluating the model, use the provided shell scripts. Specifically, you can run the reconstruction evaluation by executing:

bash eval_reconstruction_part1.sh
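
As a rough illustration of what a reconstruction metric measures, the sketch below computes PSNR, one of the metrics reported in the results, between an original and a reconstructed image given as flat lists of pixel values. The function `psnr` and the 8-bit `max_value` default are assumptions made for this example. The evaluation script may compute the metric differently.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 2, so the MSE is 4.
original = [0, 64, 128, 255]
reconstructed = [2, 62, 130, 253]
print(round(psnr(original, reconstructed), 2))
```

Higher PSNR means the reconstruction is closer to the original in a pixel-wise sense.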

Results

| Method | Codebook Size | Utilization (%) | rFID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ |
|---|---|---|---|---|---|---|
| VQGAN (Baseline) | 16,384 | 83.2 | 3.41 | 0.14 | 23.5 | 56.6 |
| Wasserstein VQ | 16,384 | 100.0 | 2.28 | 0.12 | 24.43 | 63.5 |
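
The utilization column can be read as the percentage of code vectors that are selected by at least one feature under nearest-neighbor assignment. The sketch below shows that computation on a toy codebook. The helper names `nearest_code` and `codebook_utilization` are hypothetical, and the exact definition used to produce the numbers above may differ.

```python
def nearest_code(feature, codebook):
    """Index of the code vector closest to `feature` in squared Euclidean distance."""
    distances = [
        sum((f - c) ** 2 for f, c in zip(feature, code)) for code in codebook
    ]
    return distances.index(min(distances))


def codebook_utilization(features, codebook):
    """Percentage of code vectors selected by at least one feature."""
    used = {nearest_code(f, codebook) for f in features}
    return 100.0 * len(used) / len(codebook)


# Two codes sit near the features; the other two are far away and never chosen.
codebook = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [-10.0, -10.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1], [1.1, 0.8]]
print(codebook_utilization(features, codebook))  # 50.0
```

In this toy case the far-away codes illustrate the distribution mismatch described in the overview: codes that lie outside the region occupied by the features are never used.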

Acknowledgements

This work builds upon prior research on vector quantization and autoregressive modeling. We gratefully acknowledge the resources and inspiration from the following repositories:
