Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
(Source: Make-A-Video, SimDA, PYoCo, SVD, Video LDM, and Tune-A-Video)
- [News] We are planning to update the survey soon to encompass the latest work. If you have any suggestions, please feel free to contact us.
- [News] The Chinese translation is available on Zhihu. Special thanks to Dai-Wenxun for this.

### Open-Source Toolboxes and Foundation Models

Methods | Task | Github |
---|---|---|
Open-Sora-Plan | T2V Generation | |
Open-Sora | T2V Generation | |
Morph Studio | T2V Generation | - |
Genie | T2V Generation | - |
Sora | T2V Generation & Editing | - |
VideoPoet | T2V Generation & Editing | - |
Stable Video Diffusion | T2V Generation | |
NeverEnds | T2V Generation | - |
Pika | T2V Generation | - |
EMU-Video | T2V Generation | - |
GEN-2 | T2V Generation & Editing | - |
ModelScope | T2V Generation | |
ZeroScope | T2V Generation | - |
T2V Synthesis Colab | T2V Generation | |
VideoCrafter | T2V Generation & Editing | |
Diffusers (T2V synthesis) | T2V Generation (see the example below) | - |
AnimateDiff | Personalized T2V Generation | |
Text2Video-Zero | T2V Generation | |
HotShot-XL | T2V Generation | |
Genmo | T2V Generation | - |
Fliki | T2V Generation | - |
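
For the Diffusers entry above, a minimal text-to-video sketch with Hugging Face `diffusers` is shown below. It assumes the openly released ModelScope checkpoint `damo-vilab/text-to-video-ms-1.7b` from the Hub and a CUDA GPU; the exact shape of the returned frames differs across `diffusers` versions, so treat this as a starting point rather than a pinned recipe.

```python
# Minimal text-to-video sketch with Hugging Face diffusers.
# Assumes a CUDA GPU and the ModelScope T2V checkpoint from the Hub.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # trade some speed for lower VRAM usage

result = pipe("An astronaut riding a horse on Mars", num_inference_steps=25)
frames = result.frames[0]  # recent versions return a batch of frame lists
export_to_video(frames, "astronaut.mp4")
```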

### Datasets

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | - | - | - | Dec., 2012 |
First Order Motion Model for Image Animation | - | - | - | NeurIPS, 2019 |
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks | - | - | - | CVPR, 2018 |

### Evaluation Benchmarks and Metrics

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | - | - | - | Jun., 2024 |
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models | - | - | - | ICLR, 2024 |
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | - | - | - | Mar., 2024 |
Towards A Better Metric for Text-to-Video Generation | - | - | - | Jan., 2024 |
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | - | - | - | Jan., 2024 |
VBench: Comprehensive Benchmark Suite for Video Generative Models | - | - | - | Nov., 2023 |
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | - | - | - | NeurIPS, 2023 |
CVPR 2023 Text Guided Video Editing Competition | - | - | - | Oct., 2023 |
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | - | - | - | Oct., 2023 |
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset | - | - | - | Sep., 2023 |
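
Several of the benchmarks above build on frame-level text-video alignment scores. As an illustration only — a generic sketch, not the official protocol of any benchmark listed here — a frame-averaged CLIP similarity can be computed with the `transformers` CLIP API:

```python
# Frame-averaged CLIP similarity between a prompt and decoded video frames.
# A generic illustration of text-video alignment scoring, not the official
# metric of any paper above.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_text_video_score(prompt: str, frames: list[Image.Image]) -> float:
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    # Cosine similarity of every frame to the prompt, averaged over frames.
    return (image_emb @ text_emb.T).mean().item()
```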

### Audio- and Sound-Guided Video Generation

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | - | - | - | Jun., 2024 |
Context-aware Talking Face Video Generation | - | - | - | Feb., 2024 |
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | - | - | - | Feb., 2024 |
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion | - | - | - | ICCV, 2023 |
Generative Disco: Text-to-Video Generation for Music Visualization | - | - | - | Apr., 2023 |
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion | - | - | - | CVPRW, 2023 |

### Brain-Guided Video Generation

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities | - | - | - | Feb., 2024 |
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity | - | - | - | NeurIPS, 2023 |

### Depth-Guided Video Generation

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | - | - | - | Jul., 2023 |
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | - | - | - | Jun., 2023 |

### Unconditional Video Generation: U-Net-Based

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation | - | - | - | Feb., 2024 |
Video Probabilistic Diffusion Models in Projected Latent Space | - | - | - | CVPR, 2023 |
VIDM: Video Implicit Diffusion Models | - | - | - | AAAI, 2023 |
GD-VDM: Generated Depth for better Diffusion-based Video Generation | - | - | - | Jun., 2023 |
LEO: Generative Latent Image Animator for Human Video Synthesis | - | - | - | May, 2023 |

### Unconditional Video Generation: Transformer-Based

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Latte: Latent Diffusion Transformer for Video Generation | - | - | - | Jan., 2024 |
VDT: An Empirical Study on Video Diffusion with Transformers | - | - | - | May, 2023 |
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | - | - | - | ECCV, 2022 |

### Video Completion, Enhancement, and Restoration

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Towards Language-Driven Video Inpainting via Multimodal Large Language Models | - | - | - | Jan., 2024 |
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution | - | - | - | WACVW, 2024 |
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution | - | - | - | Dec., 2023 |
AVID: Any-Length Video Inpainting with Diffusion Model | - | - | - | Dec., 2023 |
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution | - | - | - | CVPR, 2023 |
LDMVFI: Video Frame Interpolation with Latent Diffusion Models | - | - | - | Mar., 2023 |
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming | - | - | - | Nov., 2022 |
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos | - | - | - | May, 2023 |

### Video Prediction

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | - | - | - | Jun., 2024 |
STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction | - | - | - | Dec., 2023 |
Video Diffusion Models with Local-Global Context Guidance | - | - | - | IJCAI, 2023 |
Seer: Language Instructed Video Prediction with Latent Diffusion Models | - | - | - | Mar., 2023 |
MaskViT: Masked Visual Pre-Training for Video Prediction | - | - | - | Jun., 2022 |
Diffusion Models for Video Prediction and Infilling | - | - | - | TMLR, 2022 |
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | - | - | - | NeurIPS, 2022 |
Diffusion Probabilistic Modeling for Video Generation | - | - | - | Mar., 2022 |
Flexible Diffusion Modeling of Long Videos | - | - | - | May, 2022 |
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models | - | - | - | May, 2023 |

### Instruction-Guided Video Editing

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | - | - | - | Jun., 2024 |
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models | - | - | - | Mar., 2024 |
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis | - | - | - | Dec., 2023 |
Neural Video Fields Editing | - | - | - | Dec., 2023 |
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models | - | - | - | Nov., 2023 |
Consistent Video-to-Video Transfer Using Synthetic Dataset | - | - | - | Nov., 2023 |
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions | - | - | - | May, 2023 |
Collaborative Score Distillation for Consistent Visual Synthesis | - | - | - | Jul., 2023 |

### Motion-Guided Video Editing

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation | - | - | - | Nov., 2023 |
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction | - | - | - | Nov., 2023 |
DragVideo: Interactive Drag-style Video Editing | - | - | - | Nov., 2023 |
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet | - | - | - | Jul., 2023 |

### Sound-Guided Video Editing

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | - | - | - | May, 2023 |
Soundini: Sound-Guided Diffusion for Natural Video Editing | - | - | - | Apr., 2023 |

### Video Editing with Neural Video Representations

Title | arXiv | Github | Website | Pub. & Date |
---|---|---|---|---|
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing | - | - | - | Oct., 2023 |
INVE: Interactive Neural Video Editing | - | - | - | Jul., 2023 |
Shape-Aware Text-Driven Layered Video Editing | - | - | - | Jan., 2023 |

If you have any suggestions or find our work helpful, feel free to contact us:
Homepage: Zhen Xing
Email: [email protected]
If you find our survey useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry:
@article{vdmsurvey,
  title={A Survey on Video Diffusion Models},
  author={Zhen Xing and Qijun Feng and Haoran Chen and Qi Dai and Han Hu and Hang Xu and Zuxuan Wu and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2310.10647},
  year={2023}
}