Homepage: https://cvpr.thecvf.com/Conferences/2024
Paper list: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
- Cache Me if You Can: Accelerating Diffusion Models through Block Caching [Paper] [Homepage]
- Meta & TUM & MCML & Oxford
- Block caching
- Reuse outputs from layer blocks of previous steps to speed up inference.
  - Automatically determine caching schedules from how much each block's output changes across timesteps (see the sketch below).
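
  A minimal sketch of the idea, assuming a per-block recompute schedule derived offline from measured output changes (all names are hypothetical, not the paper's code):

  ```python
  import torch
  import torch.nn as nn

  def make_schedule(change_per_step: list[float], threshold: float) -> set[int]:
      """Recompute a block only at steps where its measured change is large."""
      return {t for t, c in enumerate(change_per_step) if c > threshold}

  class CachedBlock(nn.Module):
      """Wraps a layer block and reuses its cached output on skipped steps."""
      def __init__(self, block: nn.Module, recompute_steps: set[int]):
          super().__init__()
          self.block = block
          self.recompute_steps = recompute_steps
          self.cache = None

      def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
          if self.cache is None or t in self.recompute_steps:
              self.cache = self.block(x)  # refresh on scheduled steps
          return self.cache               # otherwise reuse the previous output
  ```
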
- CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model [Paper] [Code]
- TJU & Tencent
  - Initiate the reverse denoising process from an implicit distribution produced by a pre-trained GAN-based model → reduces the number of sampling steps (truncation sketch below)
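
  A hedged sketch of the truncation idea, assuming a diffusers-style scheduler (`add_noise`, `step`, `timesteps`); the function and argument names are illustrative, not CAT-DM's actual code:

  ```python
  import torch

  def truncated_reverse_process(gan_image, denoiser, scheduler, num_steps):
      """Diffuse a GAN sample to an intermediate timestep, then run only the
      remaining `num_steps` denoising steps instead of the full trajectory."""
      t_start = scheduler.timesteps[-num_steps]           # intermediate timestep
      noise = torch.randn_like(gan_image)
      x = scheduler.add_noise(gan_image, noise, t_start)  # x_t from the GAN prior
      for t in scheduler.timesteps[-num_steps:]:          # short reverse trajectory
          eps = denoiser(x, t)
          x = scheduler.step(eps, t, x).prev_sample
      return x
  ```
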
- DeepCache: Accelerating Diffusion Models for Free [Paper] [Code]
- NUS
- Utilize the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
  - Cache and retrieve features across adjacent denoising steps, thereby reducing redundant computation (usage sketch below).
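
  The official repo ships a helper that plugs into diffusers pipelines; this usage sketch follows the interface published in its README (which may change across versions):

  ```python
  import torch
  from diffusers import StableDiffusionPipeline
  from DeepCache import DeepCacheSDHelper

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  helper = DeepCacheSDHelper(pipe=pipe)
  helper.set_params(cache_interval=3, cache_branch_id=0)  # reuse deep features for 3 steps
  helper.enable()

  image = pipe("a photo of an astronaut riding a horse").images[0]
  helper.disable()
  ```
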
- DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [Paper] [Homepage] [Code]
- MIT & Princeton & Lepton AI & NVIDIA
- Displaced patch parallelism
- Split the model input into multiple patches and assign each patch to a GPU.
- Reuse the pre-computed feature maps from the previous timestep to provide context for the current step.
  - DistriFusion → enables running diffusion models across multiple GPUs in parallel (conceptual sketch below)
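
  A heavily simplified sketch of one step, assuming a hypothetical `unet` signature that also returns the activations to be shared; the real system overlaps communication with computation far more carefully:

  ```python
  import torch
  import torch.distributed as dist

  def displaced_patch_step(unet, x_patch, t, stale_activations):
      """Each rank denoises only its own patch. Cross-patch context comes from
      the previous timestep's activations, so fresh activations can be
      exchanged asynchronously while the current step computes."""
      eps, fresh = unet(x_patch, t, context=stale_activations)  # hypothetical API
      gathered = [torch.empty_like(fresh) for _ in range(dist.get_world_size())]
      handle = dist.all_gather(gathered, fresh, async_op=True)  # overlap comm/compute
      return eps, gathered, handle  # wait on handle before the next step reads gathered
  ```
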
- SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation [Paper]
- VinAI Research, Vietnam
  - Knowledge distillation: distill a pre-trained multi-step text-to-image model into a student network that generates images in a single inference step (training-loop sketch below).
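
  A rough sketch of the general variational-score-distillation objective under assumed interfaces (`student`, `teacher`, `lora_teacher` are hypothetical stand-ins, not SwiftBrush's code):

  ```python
  import torch

  def vsd_loss(student, teacher, lora_teacher, prompt_emb, scheduler):
      """The student maps pure noise to an image in one forward pass. Its
      gradient is the gap between the frozen teacher's score and an auxiliary
      (LoRA) score model tracking the student's own output distribution."""
      z = torch.randn(1, 4, 64, 64)                   # latent noise
      x0 = student(z, prompt_emb)                     # one-step generation
      t = torch.randint(20, 980, (1,))                # random diffusion timestep
      noise = torch.randn_like(x0)
      xt = scheduler.add_noise(x0, noise, t)
      with torch.no_grad():
          eps_teacher = teacher(xt, t, prompt_emb)    # pretrained model's score
          eps_fake = lora_teacher(xt, t, prompt_emb)  # score of the student's outputs
      grad = eps_teacher - eps_fake                   # VSD update direction
      return (grad * x0).mean()                       # surrogate loss: dL/dx0 ∝ grad
  ```
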
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model [Paper] [Homepage] [Code]
- NUS & Tencent & FDU
  - Enable pre-trained add-on modules (e.g., ControlNet, LoRA) to work with an upgraded diffusion model (e.g., SDXL) without retraining them.
- ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation [Paper] [Homepage] [Code]
- Rice University
  - Enable pre-trained text-to-image diffusion models to generate images at arbitrary sizes, without training.
- Decouple the generation trajectory of a pre-trained model into local and global signals.
- The local signal controls low-level pixel information and can be estimated on local patches.
    - The global signal maintains overall structural consistency and is estimated from a reference image (conceptual sketch below).
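
  A conceptual sketch of one denoising step under this split, assuming classifier-free guidance and a hypothetical `crop_patch` helper; the actual method fuses patch estimates far more carefully:

  ```python
  import torch
  import torch.nn.functional as F

  def elastic_step(unet, x, t, cond, uncond, crop_patch, base_res=64, scale=7.5):
      """One step for a single patch. Local signal: unconditional score on a
      native-resolution crop. Global signal: the guidance direction, estimated
      once on a downsampled reference and resized to the patch."""
      eps_local = unet(crop_patch(x), t, uncond)               # low-level detail
      x_ref = F.interpolate(x, size=(base_res, base_res), mode="bilinear")
      eps_dir = unet(x_ref, t, cond) - unet(x_ref, t, uncond)  # global structure
      eps_global = F.interpolate(eps_dir, size=eps_local.shape[-2:], mode="bilinear")
      return eps_local + scale * eps_global                    # CFG-style combination
  ```
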
- FreeU: Free Lunch in Diffusion U-Net [Paper] [Homepage] [Code]
- NTU
- Key insight
- Use two modulation factors to re-weight the feature contributions from the U-Net’s skip connections and backbone.
- Increasing the backbone scaling factor b significantly enhances image quality.
    - Directly scaling the skip-feature factor s has only a limited influence on synthesis quality.
- FreeU
- Improve the generation quality with only a few lines of code.
    - Only requires adjusting two scaling factors at inference time (usage example below).
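
  FreeU is integrated into diffusers as a pipeline method; a minimal usage example with the authors' suggested values for SD 1.5 (tune per model):

  ```python
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  # b1/b2 re-weight backbone features, s1/s2 re-weight skip features.
  pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
  image = pipe("a cozy cabin in a snowy forest").images[0]
  pipe.disable_freeu()  # revert to the vanilla U-Net
  ```
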
- On the Scalability of Diffusion-based Text-to-Image Generation [Paper]
- AWS AI Labs & Amazon AGI
- An empirical study of the scaling properties of diffusion-based text-to-image models.
  - Perform ablations on scaling both the denoising backbone and the training set, training scaled U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
- Specifically
- Model scaling
      - The location and amount of cross-attention distinguish the performance of different backbone designs.
      - For better text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing the channel count.
      - Identify an efficient U-Net variant.
- Data scaling
      - The quality and diversity of the training set matter more than raw dataset size.
  - Provide scaling functions that predict text-image alignment performance as a function of model size, compute, and dataset size (illustrative fit below).
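
  For intuition only: such scaling functions are typically power-law fits. The numbers below are made up purely to show the fitting procedure; the actual fitted curves are in the paper:

  ```python
  import numpy as np

  # Made-up example points, purely to illustrate the fitting procedure.
  compute = np.array([1e20, 3e20, 1e21, 3e21])  # training compute (FLOPs)
  score = np.array([0.62, 0.67, 0.72, 0.76])    # text-image alignment metric

  # Fit log(score) = log(a) + b * log(compute), i.e. score = a * compute**b.
  b, log_a = np.polyfit(np.log(compute), np.log(score), 1)
  predict = lambda c: np.exp(log_a) * c**b
  print(predict(1e22))  # extrapolated alignment at larger compute
  ```
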