<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CVPR'24 Tutorial on 3D/4D Generation and Modeling with Generative Priors</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<link href="css/materialize.css" type="text/css" rel="stylesheet" media="screen,projection"/>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700' rel='stylesheet' type='text/css'>
<link href="css/style.css" rel="stylesheet" type="text/css" />
</head>
<body id="page-top">
<div class="navbar-fixed" >
<nav class="teal lighten-2" role="navigation">
<div class="nav-wrapper" >
<ul class="center hide-on-med-and-down nav navbar-nav navbar-center">
<li><a class="page-scroll" href="#page-top" style="color:#CD853F;font-size:20px">Home</a></li>
<li><a class="page-scroll" href="#overview" style="color:#CD853F;font-size:20px">Overview</a></li>
<li><a class="page-scroll" href="#organizer" style="color:#CD853F;font-size:20px">Organizer</a></li>
<li><a class="page-scroll" href="#schedule" style="color:#CD853F;font-size:20px">Program</a></li>
<li><a class="page-scroll" href="#speaker" style="color:#CD853F;font-size:20px">Speaker</a></li>
</ul>
</div>
</nav>
</div>
<div class="container">
<table border="0" align="center">
<tr>
<td width="700" align="center" valign="middle"><h3>CVPR 2024 Tutorial on</h3>
<span class="title">3D/4D Generation and Modeling with Generative Priors</span></td>
</tr>
</table>
<h3 align="center"> <b>Date:</b> Tuesday, June 18th 8:30 a.m. PDT - noon PDT. </h3>
<h3 align="center"> <b>Location:</b> Summit 440-441</h3>
<!--h3 colspan="3" align="center"><br> Slides and recorded videos will be provided on this webpage.</h3-->
<!--h3 colspan="3" align="center"><br>The tutorial can be accessed at: <a a href=https://ohyay.co/s/cvpr-tutorial-on-unlocking-creativity> this URL </a>.
<br>
Anyone can join! </h3-->
<!-- <p><img src="figures/teaser.jpg" width="1000" align="middle" /></p> -->
</div>
<br />
<div class="container" id="recorded-video">
<h2>Recorded Video</h2>
<div>
<div class="text-center">
<div style="position:relative;padding-top:56.25%;">
<iframe style="position:absolute;top:0;left:0;width:100%;height:100%;"
src="https://www.youtube.com/embed/QA5vxU5KxUc?si=B3WwuzrulGemSVD4" title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen></iframe>
</div>
</div>
<p></p>
</div>
</div>
<br />
<div class="container" id="overview">
<h2>Overview</h2>
<div class="overview">
<br />
<div class="media-container">
<img src="media/GTR.gif" class="media-item" alt="GTR">
<img src="media/scenetex.gif" class="media-item" alt="scenetex">
<img src="media/4real.gif" class="media-item" alt="4real">
</div>
<p>In the ever-expanding metaverse, where the physical and digital worlds seamlessly merge, the ability to capture, represent, and analyze three-dimensional structures is crucial. Advancements in 3D and 4D generation technologies have transformed gaming, augmented reality (AR), and virtual reality (VR), offering unprecedented immersion and interaction. Bridging the gap between reality and virtuality, 3D modeling enables realistic simulations, immersive gaming experiences, and AR overlays. Adding the temporal dimension enhances these experiences further, enabling lifelike animations, object tracking, and understanding of complex spatiotemporal relationships, reshaping digital interactions in entertainment, education, and beyond.</p>
<p>Traditionally, 3D generation involved directly manipulating 3D data and attempting to recover 3D details from 2D data.
Recent breakthroughs in 2D diffusion models have significantly improved 3D generation.
Methods using 2D priors from diffusion models have emerged, enhancing the quality and diversity of 3D asset generation.
These methods range from inpainting-based approaches and optimization-based techniques such as Score Distillation Sampling (SDS) to recent feed-forward generation using multi-view images as an auxiliary medium.</p>
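<p>As a rough illustration of the optimization-based route, a single SDS update can be sketched as follows. This is a minimal PyTorch-style sketch, not the tutorial's reference code; <code>render</code>, <code>unet</code>, <code>text_emb</code>, and <code>alphas_cumprod</code> are hypothetical placeholders for a differentiable renderer, a frozen noise-prediction diffusion model, a text embedding, and a noise schedule.</p>
<pre><code>import torch

def sds_step(params, render, unet, text_emb, camera, alphas_cumprod, optimizer):
    # Render a view of the 3D representation; gradients flow through the renderer.
    x = render(params, camera)
    # Sample a diffusion timestep and noise, then forward-diffuse the render.
    t = torch.randint(20, 980, (1,), device=x.device)
    eps = torch.randn_like(x)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x + (1 - a_t).sqrt() * eps
    # Query the frozen 2D diffusion prior for its noise prediction.
    with torch.no_grad():
        eps_hat = unet(x_t, t, text_emb)
    # SDS gradient w.r.t. the rendered image: weighted noise residual,
    # with no backpropagation through the diffusion U-Net itself.
    grad = (1 - a_t) * (eps_hat - eps)
    # Surrogate loss whose gradient w.r.t. x equals grad; backward() then
    # chains through the renderer into the 3D parameters.
    loss = (grad.detach() * x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
</code></pre>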
<p>On the other hand, challenges persist in extending 3D asset generation to scenes and in mitigating biases in 2D priors for realistic synthesis in real-world settings. Addressing these issues, our tutorial delves into 3D scene generation, exploring techniques for diverse scene scales, compositionality, and realism.
Finally, we also cover recent advancements in 4D generation using image and video models as priors, crucial for applications such as augmented reality.
Attendees will gain insights into various paradigms of 3D/4D generation, from training on 3D data to leveraging 2D diffusion model knowledge, resulting in a comprehensive understanding of contemporary 3D modeling approaches.</p>
<p>In conclusion, our tutorial provides a comprehensive exploration of 3D/4D generation and modeling, covering everything from fundamental techniques to cutting-edge advancements. By navigating the intricacies of scene-level generation and leveraging 2D priors for enhanced realism, attendees will emerge equipped with a nuanced understanding of the evolving landscape of 3D modeling in the metaverse era.</p>
</div>
</div>
<br />
<div class="container" id="organizer">
<h2>Organizers</h2>
<div>
<div class="instructor">
<a href="http://hsinyinglee.com/">
<div class="instructorphoto"><img src="figures/hsin.png"></div>
<div>Hsin-Ying Lee<br>Creative Vision, Snap Research</div>
</a>
</div>
<div class="instructor">
<a href="https://payeah.net/">
<div class="instructorphoto"><img src="figures/peiye.png"></div>
<div>Peiye Zhuang<br>Creative Vision, Snap Research</div>
</a>
</div>
<div class="instructor">
<a href="https://mightychaos.github.io/">
<div class="instructorphoto"><img src="figures/chaoyang.png"></div>
<div> Chaoyang Wang <br>Creative Vision, Snap Research</div>
</a>
</div>
</div>
<p></p>
</div>
<br />
<div class="container" id="schedule">
<h2>Program</h2>
<table class="program">
<tr>
<td width="70%">
<p style="font-size:20px"> <b>Introduction</b> </a> </p>
</td>
<td width="20%"><em>Hsin-Ying Lee</em></td>
<td width="10%"><b>08:30 - <br /> 08:40</b></td>
<td width="10%">
<a href="https://drive.google.com/file/d/1ba86ESYabbklphfIkkF5Izh8iz2Cwxrm/view?usp=sharing">PDF</a>
</td>
</tr>
<tr>
<td width="70%">
<p style="font-size:20px"> <b>3D Generation w/o Large-Scale 2D Priors</b> </a> </p>
Introducing conventional ways of
training 3D generation models using 2D and 3D data without large-scale image and video diffusion models.
</td>
<td width="20%"><em>Hsin-Ying Lee</em></td>
<td width="10%"><b>08:40 - <br /> 09:00</b></td>
<td width="10%">
<a href="https://drive.google.com/file/d/1ItJvdwAX6gryPAohSNc6cguUrrBbuo-o/view?usp=sharing">PDF</a>
</td>
</tr>
<tr>
<td>
<p style="font-size:20px"> <b>Bridging 2D and 3D: From Optimization to Feedforward </b> </p>
Introducing two ways of performing 3D generation with the help of large-scale 2D diffusion models:
optimization-based methods that distill knowledge via Score Distillation Sampling (SDS) and its variants, and feedforward methods that leverage multi-view image generation.
</td>
<td><em>Peiye Zhuang</em></td>
<td><b>09:10 - <br /> 10:00</b></td>
<td width="10%">
<a href="https://drive.google.com/file/d/10Y9Swd1muMocNgNX64ApH-ntFblj0XCc/view?usp=sharing">PDF</a>
</td>
</tr>
<tr>
<td>
<p style="font-size:20px"> <b>3D Scene Generation</b> </p>
Introducing the recent advances and challenges in 3D scene generation.
</td>
<td><em>Hsin-Ying Lee</em></td>
<td><b>10:10 - <br /> 10:40</b></td>
<td width="10%">
<a href="https://drive.google.com/file/d/1ZeR5yvU5s9HnYr_cDfx-CMiLZ4D3JLVO/view?usp=sharing">PDF</a>
</td>
</tr>
<tr>
<td>
<p style="font-size:20px"> <b>4D Generation and Reconstruction </b>
</p> Introducing recent advancements in 4D generation as well as generation via reconstruction.
</td>
<td><em>Chaoyang Wang</em></td>
<td><b>10:50 - <br /> 11:35</b></td>
<td width="10%">
<a href="https://drive.google.com/file/d/1vdr4fGamoQq-l6wByUV14cl5a3s-sNmP/view?usp=sharing">PDF</a>
</td>
</tr>
<tr>
<td>
<p style="font-size:20px"> <b>Closing Remarks</b></p>
</td>
<td><em>Hsin-Ying Lee</em></td>
<td><b>11:35 - <br /> 11:45</b></td>
</tr>
</table>
</div>
<br />
<div class="container" id='speaker'>
<h2>About the Speakers</h2>
<div class="schedule">
<p><b>Hsin-Ying Lee</b> is a Senior Research Scientist in the Creative Vision team at Snap Research.
His research focuses on content generation, specifically image/video/3D/4D generation and manipulation.
He has published 50+ papers in top conferences and journals.
Hsin-Ying received his Ph.D. from the University of California, Merced.
Before joining Snap Inc., Hsin-Ying interned at Google and Nvidia. </p>
<p><b>Peiye Zhuang </b> is a Research Scientist in the Creative Vision group at Snap Research.
Her research focuses on foundation generative models and various content creation applications,
including 2D/3D/video generation and editing. Before joining Snap, Peiye received her Ph.D. in Computer Science
from the University of Illinois at Urbana-Champaign (UIUC) in 2023. She also spent time at Stanford University and interned
at Apple, Google Brain, Facebook (now Meta), and Adobe.
</p>
<p><b>Chaoyang Wang </b> is a Research Scientist in the Creative Vision group at Snap Research.
His research focuses on 3D/4D reconstruction and its application for photo-realistic novel view synthesis and
content generation. He received his Ph.D. from the Robotics Institute at Carnegie Mellon University.
Before joining Snap Inc., Chaoyang interned at Nvidia, Adobe, Microsoft, and Argo AI.
</p>
</div>
</div>
<br />
<div class="containersmall">
<p>Please contact <a href="mailto:[email protected]">Hsin-Ying Lee</a> if you have any questions. The webpage template is courtesy of the awesome <a href="https://gkioxari.github.io/">Georgia</a>.</p>
</div>
<!--<p align="center" class="acknowledgement">Last updated: Jan. 6, 2017</p>-->
</body>
</html>