# AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/AVLink/) | ||
[![arXiv](https://img.shields.io/badge/arXiv-2311.18822-b31b1b)](#TODO)

<video src="assets/teaser.mp4" controls>
Your browser does not support the video tag. | ||
</video>

<div align="justify">
<b>Abstract</b>: We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal conditioning. The key to our framework is a Fusion Block that enables bidirectional
information exchange between our backbone video and audio diffusion models through a temporally-aligned self-attention operation. Unlike prior work that derives the conditioning signal from feature extractors pretrained for other tasks, AV-Link can directly leverage features obtained from the
complementary modality within a single framework, i.e., video features to generate audio, or audio features to generate
video. We extensively evaluate our design choices and demonstrate that our method generates synchronized, high-quality audiovisual content, showcasing its potential for applications in immersive media generation. For more details, please visit our <a href='https://snap-research.github.io/AVLink/'>project webpage</a> or read our
<a href='#TODO'>paper</a>.
</div> | ||
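
The abstract describes the Fusion Block only at a high level. The pure-Python toy below sketches one way a bidirectional, time-aligned attention step between a video token stream and an audio token stream *could* work; it is not the AV-Link implementation. Everything here is a hypothetical assumption for illustration: both modalities are taken to be already projected to a common width `d`, temporal alignment is modeled by adding a sinusoidal encoding of each token's timestamp in seconds (so a video frame and an audio token at the same instant receive the same positional signal), and attention uses identity projections instead of learned weights. The names `time_encoding`, `self_attention`, and `fuse` are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def time_encoding(t: float, dim: int) -> Vector:
    """Sinusoidal encoding of a timestamp t (in seconds); dim must be even.

    Assumption: a shared time axis stands in for AV-Link's temporal alignment.
    """
    enc: Vector = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc


def self_attention(x: List[Vector]) -> List[Vector]:
    """Single-head softmax attention with identity Q/K/V projections (toy only)."""
    d = len(x[0])
    out: List[Vector] = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output token is a convex combination of all tokens (both modalities).
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def fuse(video: List[Vector], video_rate: float,
         audio: List[Vector], audio_rate: float) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical fusion step: tag tokens with a shared time axis, attend jointly.

    video_rate / audio_rate are tokens per second; both streams must share width d.
    """
    d = len(video[0])
    tagged: List[Vector] = []
    for tokens, rate in ((video, video_rate), (audio, audio_rate)):
        for i, tok in enumerate(tokens):
            enc = time_encoding(i / rate, d)  # token i occurs at i / rate seconds
            tagged.append([a + b for a, b in zip(tok, enc)])
    fused = self_attention(tagged)
    # Split the joint sequence back into per-modality streams.
    return fused[:len(video)], fused[len(video):]
```

Because both streams pass through one joint attention operation, each fused video token mixes in information from audio tokens and vice versa, which is the bidirectional exchange the abstract describes. A practical block would presumably add learned projections, multiple heads, and a residual connection; those are omitted here to keep the sketch small.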
<br>

# Issues
If you have any questions about AV-Link, please open an issue on this GitHub repository or send them to `[email protected]`.

# Project Page Template
A template of our project page can be found in the `docs` directory.

## Citation
If you find this paper useful in your research, please consider citing:
```
TODO
```