# AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/AVLink/)
[![arXiv](https://img.shields.io/badge/arXiv-2311.18822-b31b1b)](#TODO)
**Abstract:** We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal conditioning. The key to our framework is a Fusion Block that enables bidirectional information exchange between our backbone video and audio diffusion models through a temporally-aligned self-attention operation. Unlike prior work that relies on feature extractors pretrained for other tasks to provide the conditioning signal, AV-Link directly leverages features from the complementary modality within a single framework, i.e., video features to generate audio, or audio features to generate video. We extensively evaluate our design choices and demonstrate that our method achieves synchronized, high-quality audiovisual content, showcasing its potential for applications in immersive media generation. For more details, please visit our project webpage or read our paper.
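The paper's actual Fusion Block architecture is not reproduced in this repository snippet; the following is only a rough, hypothetical NumPy sketch of the core idea described above — concatenating temporally-positioned video and audio tokens and running a joint self-attention so each modality attends to the other. All names, shapes, and the identity Q/K/V projections are illustrative assumptions, not the real implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fusion_block(video_feats, audio_feats):
    """Joint self-attention over concatenated video and audio tokens (sketch).

    video_feats: (Tv, d) activations from the frozen video diffusion model
    audio_feats: (Ta, d) activations from the frozen audio diffusion model
    Both sequences are assumed to already be projected to a shared width d
    and to carry temporal position information, so attention can align them
    in time across modalities.
    """
    d = video_feats.shape[1]
    # Concatenate along the token axis so attention spans both modalities.
    tokens = np.concatenate([video_feats, audio_feats], axis=0)   # (Tv+Ta, d)
    # Single-head attention with identity Q/K/V projections for illustration.
    scores = tokens @ tokens.T / np.sqrt(d)                       # (Tv+Ta, Tv+Ta)
    attended = softmax(scores, axis=-1) @ tokens                  # tokens mix both modalities
    Tv = video_feats.shape[0]
    # Split the fused tokens back into per-modality conditioning features.
    return attended[:Tv], attended[Tv:]
```

In the full framework these fused features would condition the complementary generator (video features for audio generation and vice versa), rather than being returned directly.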
# Issues
If you have any questions about AV-Link, please open an issue on this GitHub repository or send your questions to `mh155@rice.edu`.

# Project Page Template
A template of our project page can be found under the `docs` directory.

## Citation
If you find this paper useful in your research, please consider citing:
```
TODO
```