CVPR 2024

Meta Info

Homepage: https://cvpr.thecvf.com/Conferences/2024

Paper list: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers

Papers

Diffusion Models

Acceleration

  • Cache Me if You Can: Accelerating Diffusion Models through Block Caching [Paper] [Homepage]
    • Meta & TUM & MCML & Oxford
    • Block caching
      • Reuse outputs from layer blocks of previous steps to speed up inference.
      • Automatically determines caching schedules based on each block's changes over timesteps.
  • CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model [Paper] [Code]
    • TJU & Tencent
    • CAT-DM: Controllable Accelerated virtual Try-on with Diffusion Model
    • Initiate the reverse denoising process from an implicit distribution generated by a pre-trained GAN-based model → Fewer sampling steps are needed
  • DeepCache: Accelerating Diffusion Models for Free [Paper] [Code]
    • NUS
    • Utilize the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
    • Cache and retrieve features across adjacent denoising stages, thereby reducing redundant computations.
  • DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [Paper] [Homepage] [Code]
    • MIT & Princeton & Lepton AI & NVIDIA
    • Displaced patch parallelism
      • Split the model input into multiple patches and assign each patch to a GPU.
      • Reuse the pre-computed feature maps from the previous timestep to provide context for the current step.
    • DistriFusion → Enable running diffusion models across multiple GPUs in parallel
  • SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation [Paper]
    • VinAI Research, Vietnam
    • Knowledge distillation: Distill a pre-trained multi-step text-to-image model to a student network that can generate images with just a single inference step.
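
The caching idea shared by Block Caching and DeepCache (reuse a block's output from an earlier step while it changes little) can be sketched in a few lines. This is a toy illustration on scalar "features", not either paper's implementation: the names `relative_change`, `build_schedule`, and `run_with_cache` are hypothetical, and the rule "recompute once the accumulated change exceeds a threshold" is one plausible way to derive a schedule from per-step changes.

```python
from typing import Callable, List, Optional, Tuple


def relative_change(prev: float, curr: float) -> float:
    """Relative change of a block's output between two consecutive steps."""
    return abs(curr - prev) / (abs(prev) + 1e-8)


def build_schedule(changes: List[float], threshold: float) -> List[bool]:
    """Return, per step, True to recompute the block or False to reuse the cache.

    changes[t] is the measured change between step t-1 and step t
    (changes[0] is ignored). Assumed rule: refresh the cache once the
    change accumulated since the last recompute exceeds `threshold`.
    """
    if not changes:
        return []
    schedule = [True]  # the first step always has to be computed
    accumulated = 0.0
    for t in range(1, len(changes)):
        accumulated += changes[t]
        if accumulated > threshold:
            schedule.append(True)
            accumulated = 0.0
        else:
            schedule.append(False)
    return schedule


def run_with_cache(
    block: Callable[[float], float],
    inputs: List[float],
    schedule: List[bool],
) -> Tuple[List[float], int]:
    """Run `block` over the steps, reusing the cached output where allowed.

    Returns the per-step outputs and the number of real block evaluations.
    """
    cached: Optional[float] = None
    outputs: List[float] = []
    calls = 0
    for x, recompute in zip(inputs, schedule):
        if recompute or cached is None:
            cached = block(x)
            calls += 1
        outputs.append(cached)
    return outputs, calls
```

With per-step changes `[0.0, 0.05, 0.05, 0.3, 0.01, 0.02]` and a threshold of `0.15`, the block is evaluated only at steps 0 and 3; every other step reuses the cached output. A real implementation would cache tensors per block and measure the changes on a calibration run.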

Support compatibility of add-on modules (ControlNets and LoRAs)

  • X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model [Paper] [Homepage] [Code]
    • NUS & Tencent & FDU
    • Enable pre-trained add-on modules (ControlNet, LoRA) to work with the upgraded diffusion model (SDXL) without further retraining.

Support arbitrary image size

  • ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation [Paper] [Homepage] [Code]
    • Rice University
    • Enable pre-trained text-to-image diffusion models to generate images of various sizes.
    • Decouple the generation trajectory of a pre-trained model into local and global signals.
      • The local signal controls low-level pixel information and can be estimated on local patches.
      • The global signal is used to maintain overall structural consistency and is estimated with a reference image.

Improve image quality

  • FreeU: Free Lunch in Diffusion U-Net [Paper] [Homepage] [Code]
    • NTU
    • Key insight
      • Use two modulation factors to re-weight the feature contributions from the U-Net’s skip connections and backbone.
      • Increasing the backbone scaling factor b significantly enhances image quality.
      • Directly scaling the skip features by the factor s has only a limited influence on image synthesis quality.
    • FreeU
      • Improve the generation quality with only a few lines of code.
      • Only two scaling factors need to be adjusted during inference.
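
The re-weighting described above can be illustrated with a deliberately simplified sketch. It scales all backbone features by b and all skip features by s before the decoder concatenates them; the published method may be more selective about which features are scaled, so treat this as an approximation of the idea only. The function name `freeu_merge` and the default values of `b` and `s` are hypothetical.

```python
from typing import List


def freeu_merge(
    backbone: List[float],
    skip: List[float],
    b: float = 1.2,  # hypothetical default: amplify backbone features
    s: float = 0.9,  # hypothetical default: slightly damp skip features
) -> List[float]:
    """Re-weight backbone and skip features, then concatenate them.

    Stands in for the point in a U-Net decoder stage where the upsampled
    backbone features are concatenated with the matching skip connection.
    """
    scaled_backbone = [b * v for v in backbone]
    scaled_skip = [s * v for v in skip]
    return scaled_backbone + scaled_skip
```

Setting `b = 1.0` and `s = 1.0` recovers the plain concatenation of an unmodified U-Net, which is why the change needs no retraining: only the two factors differ at inference time.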

Scalability

  • On the Scalability of Diffusion-based Text-to-Image Generation [Paper]
    • AWS AI Labs & Amazon AGI
    • An empirical study of the scaling properties of diffusion-based text-to-image models.
    • Perform ablations on scaling both the denoising backbone and the training set, including training scaled U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
    • Specifically
      • Model scaling
        • The location and amount of cross-attention account for the performance differences between existing U-Net designs.
        • To improve text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing the number of channels.
        • Identify an efficient U-Net variant.
      • Data scaling
        • The quality and diversity of the training set matter more than dataset size alone.
      • Provide scaling functions to predict the text-image alignment performance as functions of the scale of model size, compute, and dataset size.
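
To make the idea of a scaling function concrete, the sketch below fits a power law y = a * x^k to (scale, score) pairs by least squares in log-log space. The power-law form is an assumption made here for illustration, the helper names `fit_power_law` and `predict` are hypothetical, and any data passed in would have to come from real measurements; nothing below reproduces the paper's fitted functions.

```python
import math
from typing import List, Tuple


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = a * x**k by ordinary least squares on (log x, log y).

    Requires positive xs and ys and at least two distinct x values.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    variance = sum((u - mean_x) ** 2 for u in log_x)
    k = covariance / variance          # slope in log-log space = exponent
    a = math.exp(mean_y - k * mean_x)  # intercept in log-log space = log(a)
    return a, k


def predict(a: float, k: float, x: float) -> float:
    """Evaluate the fitted power law at scale x."""
    return a * x ** k
```

Here `xs` could be model size, compute, or dataset size, and `ys` an alignment metric measured at each scale; the fitted pair `(a, k)` then extrapolates to scales that were not trained.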