MSRVTT-Personalization

Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov


In this paper, we introduce MSRVTT-Personalization, a new benchmark for the task of personalization. It aims at accurate subject fidelity assessment and supports various conditioning modes, including conditioning on face crops, single or multiple arbitrary subjects, and the combination of foreground objects and background.

We include the testing dataset and evaluation protocol in this repository. We show a test sample of MSRVTT-Personalization below:

(Sample figure: ground truth video alongside its personalization annotations.)
Note: We will remove video samples from GitHub, the project webpage, or technical presentations upon request. Please contact tsaishienchen at gmail dot com to make a request.

Leaderboard

  • MSRVTT-Personalization evaluates a model across five metrics:

    • Text similarity (Text-S)
    • Video similarity (Vid-S)
    • Subject similarity (Subj-S)
    • Face similarity (Face-S)
    • Dynamic degree (Dync-D)
  • Quantitative evaluation:

    • Subject mode of MSRVTT-Personalization (condition on an entire subject image)

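      | Method          | Text-S | Vid-S | Subj-S | Dync-D |
      |-----------------|--------|-------|--------|--------|
      | ELITE           | 0.245  | 0.620 | 0.359  | -      |
      | VideoBooth      | 0.222  | 0.612 | 0.395  | 0.448  |
      | DreamVideo      | 0.261  | 0.611 | 0.310  | 0.311  |
      | Video Alchemist | 0.269  | 0.732 | 0.617  | 0.466  |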
    • Face mode of MSRVTT-Personalization (condition on a face crop image)

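      | Method          | Text-S | Vid-S | Face-S | Dync-D |
      |-----------------|--------|-------|--------|--------|
      | IP-Adapter      | 0.251  | 0.648 | 0.269  | -      |
      | PhotoMaker      | 0.278  | 0.569 | 0.189  | -      |
      | Magic-Me        | 0.251  | 0.602 | 0.135  | 0.418  |
      | Video Alchemist | 0.273  | 0.687 | 0.382  | 0.424  |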
  • Qualitative evaluation: (comparison figure not reproduced in this text version)
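As a small illustration of how the quantitative leaderboard numbers above can be compared, the sketch below copies the reported scores into plain Python dictionaries and picks the top method for each metric. It is not part of the official evaluation protocol. The names `SUBJECT_MODE`, `FACE_MODE`, and `best_per_metric` are hypothetical, and the code assumes that a higher score is better for every metric, which this README does not state explicitly. Entries reported as "-" are stored as `None` and skipped.

```python
# Scores copied from the leaderboard tables; None marks "-" (not reported).
SUBJECT_MODE = {
    "ELITE":           {"Text-S": 0.245, "Vid-S": 0.620, "Subj-S": 0.359, "Dync-D": None},
    "VideoBooth":      {"Text-S": 0.222, "Vid-S": 0.612, "Subj-S": 0.395, "Dync-D": 0.448},
    "DreamVideo":      {"Text-S": 0.261, "Vid-S": 0.611, "Subj-S": 0.310, "Dync-D": 0.311},
    "Video Alchemist": {"Text-S": 0.269, "Vid-S": 0.732, "Subj-S": 0.617, "Dync-D": 0.466},
}

FACE_MODE = {
    "IP-Adapter":      {"Text-S": 0.251, "Vid-S": 0.648, "Face-S": 0.269, "Dync-D": None},
    "PhotoMaker":      {"Text-S": 0.278, "Vid-S": 0.569, "Face-S": 0.189, "Dync-D": None},
    "Magic-Me":        {"Text-S": 0.251, "Vid-S": 0.602, "Face-S": 0.135, "Dync-D": 0.418},
    "Video Alchemist": {"Text-S": 0.273, "Vid-S": 0.687, "Face-S": 0.382, "Dync-D": 0.424},
}


def best_per_metric(table):
    """Return {metric: (method, score)} with the highest reported score.

    Assumption: higher is better for every metric. Missing scores (None)
    are ignored rather than treated as zero.
    """
    best = {}
    for method, scores in table.items():
        for metric, value in scores.items():
            if value is None:
                continue
            if metric not in best or value > best[metric][1]:
                best[metric] = (method, value)
    return best


if __name__ == "__main__":
    for mode_name, table in (("Subject mode", SUBJECT_MODE), ("Face mode", FACE_MODE)):
        print(mode_name)
        for metric, (method, score) in best_per_metric(table).items():
            print(f"  {metric}: {method} ({score:.3f})")
```

Skipping missing entries instead of treating them as zero avoids penalizing image-only methods that do not report Dync-D.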

Evaluation Protocol

To be added.

Citation

If you find this project useful for your research, please cite our paper. 😊

@inproceedings{chen2025videoalchemist,
  title   = {Multi-subject Open-set Personalization in Video Generation},
  author  = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Fang, Yuwei and Lee, Kwot Sin and Skorokhodov, Ivan and Aberman, Kfir and Zhu, Jun-Yan and Yang, Ming-Hsuan and Tulyakov, Sergey},
  journal = {arXiv preprint arXiv:2501.06187},
  year    = {2025}
}

Contact Information

Tsai-Shien Chen: tsaishienchen at gmail dot com