Stable, multi-view Point·E

In this repo we introduce multi-view conditioning for point-cloud diffusion and test it in two pipelines: multiple synthetic views generated from text, and multiple views from photos in the wild. We develop an evaluation dataset based on ShapeNet and ModelNet and propose a new metric to assess, visually and analytically, the overlap between two point clouds. This repo is based on the official implementation of Point-E.

Point-E is a diffusion model: a generative model that approximates a data distribution through noising (forward process) and denoising (backward process). The backward process is also called "sampling": you start from a noisy sample and convert it back to signal, guided by some conditioning information. In Point-E, we start from a random point cloud of 1024 points and denoise it with an image (e.g. a photo of an object) as the conditioning signal.
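
To make the forward process concrete, the sketch below noises a toy point cloud in closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which is the standard DDPM formulation. The linear beta schedule, the number of steps and the helper names are illustrative assumptions, not the values or code used by Point-E.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_cloud(cloud, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for a cloud given as a list of (x, y, z) tuples."""
    signal, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in point)
            for point in cloud]

rng = random.Random(0)
# Toy "clean" cloud: 1024 points inside the cube [-1, 1]^3.
clean = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(1024)]
alpha_bars = linear_alpha_bars(num_steps=1000)

slightly_noisy = noise_cloud(clean, alpha_bars[10], rng)      # still close to the data
almost_pure_noise = noise_cloud(clean, alpha_bars[-1], rng)   # close to a standard Gaussian
```

Sampling runs this chain in reverse: a learned model removes a little noise at each step, which is where the conditioning image comes in.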

Compared to other techniques in the literature, such as Neural Radiance Fields, Point-E can sample a point cloud on a single GPU in 1-2 minutes. Sample quality is the price to pay, making this technique ideal for tasks where point clouds are best suited.

Table of contents

  1. Contributions
  2. Setup
  3. Experiments
  4. Evaluation
  5. Credits

Contributions

We extend conditioning for point cloud diffusion to multiple views. This tackles the problems of generated objects with duplicated faces, blurring in occluded parts, and poor 3D consistency.

Multi-view with patch concatenation

Each conditioning image is encoded with the pre-trained OpenAI CLIP; all the resulting embeddings are concatenated and fed as tokens into the denoising transformer.
See: mv_point_e/models/transformer.py
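
A minimal sketch of the concatenation step, assuming each view has already been encoded into a sequence of token embeddings. The `concat_view_tokens` helper and the toy shapes are hypothetical and only illustrate the idea; the actual implementation is in the file referenced above.

```python
def concat_view_tokens(view_tokens):
    """Concatenate per-view token sequences along the token axis.

    view_tokens: list of V views, each a list of T tokens, each token a list of D floats.
    Returns one sequence of V * T tokens, in view order.
    """
    if not view_tokens:
        raise ValueError("at least one view is required")
    dim = len(view_tokens[0][0])
    sequence = []
    for tokens in view_tokens:
        for token in tokens:
            if len(token) != dim:
                raise ValueError("all tokens must share the same embedding size")
            sequence.append(token)
    return sequence

# Toy example: 4 views, 3 tokens per view, embedding size 2.
views = [[[float(v), float(t)] for t in range(3)] for v in range(4)]
conditioning = concat_view_tokens(views)
print(len(conditioning), len(conditioning[0]))  # 12 2
```

The conditioning sequence grows linearly with the number of views, so every view is visible to the transformer at every denoising step.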

Pipeline for point-e on top of stable diffusion 2
Original image: Nichol et al. 2022

Multi-view with stochastic conditioning

With inspiration from Watson et al. 2022, a random conditioning image (from a given multi-view set) is fed to the denoising transformer at each diffusion denoising step.
See: sc_point_e/models/transformer.py
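
The sampling loop can be sketched as follows. This is a toy illustration under stated assumptions: `denoise_step` stands in for one step of the real denoising transformer, and the scalar "views" and "cloud" are placeholders chosen so the example runs without any model.

```python
import random

def sample_with_stochastic_conditioning(x_T, views, num_steps, denoise_step, rng=None):
    """Run the reverse process, conditioning each step on one randomly chosen view."""
    rng = rng or random.Random()
    x, chosen = x_T, []
    for t in reversed(range(num_steps)):
        view_index = rng.randrange(len(views))   # a new random view at every step
        x = denoise_step(x, t, views[view_index])
        chosen.append(view_index)
    return x, chosen

def toy_denoise_step(x, t, view):
    """Placeholder denoiser (hypothetical): pulls the sample towards the view."""
    return 0.9 * x + 0.1 * view

final, chosen = sample_with_stochastic_conditioning(
    x_T=5.0, views=[0.0, 1.0, 2.0], num_steps=50,
    denoise_step=toy_denoise_step, rng=random.Random(0),
)
```

Unlike patch concatenation, the model only ever sees one view per step, so the input size is the same as in single-view Point-E.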

Pipeline for point-e on top of stable diffusion 2
Original image: Watson et al. 2022

Multiple synthetic views with 3D-Diffusion

We use 3D-Diffusion from Watson et al. 2022 to generate 3D-consistent multiple views from a single, text-generated image (with Stable Diffusion 2). The current model is pre-trained on SRNCars; a ShapeNet version will be released soon (contribute here).

Pipeline for point-e on top of stable diffusion 2

Setup

There are two variants for multi-view:

  • Patch concatenation: mv_point_e
  • Stochastic conditioning: sc_point_e

You can either:

  1. Rename the folder of the version you choose to point_e and run pip install -e .
  2. Without installing a global package, import from the specific variant in your code, e.g. for sc_point_e:
from sc_point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from sc_point_e.diffusion.sampler import PointCloudSampler
from sc_point_e.models.download import load_checkpoint
from sc_point_e.models.configs import MODEL_CONFIGS, model_from_config

from sc_point_e.evals.feature_extractor import PointNetClassifier, get_torch_devices
from sc_point_e.evals.fid_is import compute_statistics, compute_inception_score
from sc_point_e.util.plotting import plot_point_cloud

Experiments

Preprocessing

  • [1] Generating the textureless objects dataset (views, ground shapes).
  • [2] Generating the complete textured objects dataset (views, ground shapes).

3D Reconstruction

  • [1] Text-to-3d with Stable Diffusion 2 + Inpainting (single view)
  • [2] Text-to-3d with multiple rendered views from the SRNCars Dataset (multi-view)
  • [3] Text-to-3d with multiple synthetic views from Stable Diffusion + 3D-Diffusion (Watson et al. 2022)
  • [4] Text-to-3d from multiple photos "in the wild"

Evaluation, metrics

  • [1] Dataset pre-processing and scores computation
  • [2] A digression on the chosen metrics with experiments
  • [3] Evaluating text-to-3D from multi view (patch concat.)
  • [4] Comparing the chosen multi-view, text-to-3D methodologies
  • [5, 6] Evaluating results on occluded object parts
  • [7] Scores visualization and plotting

Evaluation

This dataset was developed to assess the quality of the reconstructions from our multi-view models with respect to single-view Point-E. Through experimentation, we generated several datasets from the available sources: ModelNet40, ShapeNetV2 and ShapeNetV0. The datasets generated from ModelNet40 and ShapeNetV0 are textureless, so we generated synthetic colouring for them using RGB/grayscale values and sine functions.

Getting Started & Installing

The complete set of data can be found at this link.

Name                                   Samples   Source
ModelNet40, textureless                40        Google Drive
ShapeNetv2, textureless                55        Google Drive
Mixed, textureless                     190       Google Drive
Shapenet with textures                 650       Google Drive
OpenAI seed imgs/clouds                /         Google Drive
OpenAI, COCO CLIP R-Precision evals    /         Google Drive

Here you can find the clouds generated from the textureless ShapeNetv2 and ModelNet40 datasets, together with the ground truth data, the scores and the plots of the pairwise divergence distribution. More details are provided in the description.

Description

Each sample in the dataset consists of a set of V RGB views of size 256x256 and a cloud of K points sampled with PyTorch3D.

    view:   (N, V, 256, 256, 3)
    cloud:  (N, K, 3)

Further details on rendering:

  • The light of the scene is fixed
  • No reflections
  • Two versions of the dataset:
    • Fixed elevation and distance of the camera from the object: we took 6 pictures rotating around the object
    • Fixed distance of the camera from the object: we took 6 pictures rotating around the object while stochastically changing the camera elevation
  • We iterate this procedure on 25 different objects for each class in ShapeNet
  • Each view is 256x256

You can see the pipeline for the generation of the ShapeNet dataset with textures here.

Concerning the set of views in the dataset produced from ShapeNetv2 and ModelNet40 textureless:

  • The light of the scene is fixed
  • No reflections
  • We fixed the elevation and the distance of the camera from the object and we took 4 pictures rotating around the object
  • We iterate this procedure on one object for each class in ShapeNetv2 and ModelNet40
  • Each view is 512x512

You can check the pipeline for the generation of the ShapeNetv2 and ModelNet40 textureless dataset here with all the steps.

The directory structure is as follows:

<directories>
    > shapenet_withTextures
        >> eval_clouds.pickle
        >> eval_views_fixed_elevation.pickle
        >> eval_views_stochastic_elevation.pickle       
    > modelnet40_texrand_texsin
        >> modelnet_csinrandn
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
        >> modelnet_texsin
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
    > shapenetv2_texrand_texsin
        >> shapenetv2_csinrandn
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
        >> shapenetv2_texsin
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
    > shapenetv2_modelnet40_texrand_texsin
        >> shapenet_modelnet_singleobject
            >>> modelnet_csinrandn
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> modelnet_texsin
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> shapenet_csinrandn
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> shapenet_texsin
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
    > dataset_shapenet_modelnet_texsin_withgeneratedcloud
        >> modelnet_texsin
            >>> CLASS_MAP.pt
            >>> eval_clouds_modelnet_300M.pickle
            >>> images_obj.pt
            >>> labels.pt
            >>> modelnet_gencloud_300M
            >>> points.pt
        >> shapenet_texsin
            >>> CLASS_MAP.pt
            >>> eval_clouds_shapenet_300M.pickle
            >>> images_obj.pt
            >>> labels.pt
            >>> shapenet_gencloud_300M
            >>> points.pt


File specifications

shapenet_withTextures

- list of the sampled clouds: eval_clouds.pickle # (n_img, ch, n_points) ch: 6, n_points: 4096

- list of generated views with fixed elevation: eval_views_fixed_elevation.pickle # (n_img, n_view, 256, 256, 3)

- list of generated views with stochastic elevation: eval_views_stochastic_elevation.pickle # (n_img, n_view, 256, 256, 3)

shapenetv2_modelnet40_texrand_texsin

- dictionary with {index: 'typeOfObject'}: CLASS_MAP.pt 

- multiple views for each object: images_obj.pt # (n_img, n_view, 512, 512, 3)

- label for each object: labels.pt # (n_img,)

- ground truth point cloud: points.pt # (n_img,)

- tensors with the point clouds generated with Point-E 300M:
    modelnet_gencloud_300M # (n_img, ch, n_points)
    shapenet_gencloud_300M # (n_img, ch, n_points)
  ch: 6 (the first 3 channels are the coordinates, the other 3 are the RGB colors of each point)
  n_points: 4096 (generated points)
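
As a small illustration of the (ch, n_points) layout, the sketch below splits one generated cloud into coordinates and colours. It uses nested lists so it runs without the dataset; `split_cloud` is a hypothetical helper, and the toy values are made up.

```python
def split_cloud(cloud):
    """Split a (6, n_points) cloud into per-point coordinates and colours."""
    if len(cloud) != 6:
        raise ValueError("expected 6 channels: x, y, z, r, g, b")
    coords = list(zip(*cloud[:3]))   # n_points tuples (x, y, z)
    colors = list(zip(*cloud[3:]))   # n_points tuples (r, g, b)
    return coords, colors

# Toy cloud with 2 points, channel-first like the generated clouds.
toy = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2], [255, 0], [128, 0], [0, 64]]
coords, colors = split_cloud(toy)
print(coords[0], colors[1])  # (0.0, 0.1, 0.2) (0, 0, 64)
```

With a torch tensor the same split is simply `cloud[:3]` and `cloud[3:]`.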

dataset_shapenet_modelnet_texsin_withgeneratedcloud

- dictionaries: 
            eval_clouds_modelnet_300M.pickle
            eval_clouds_shapenet_300M.pickle

    dictionary['nameOfTheObject'][index]

                                  index 0: divergence_ground_single
                                  index 1: divergence_ground_single_distribution_plot
                                  index 2: divergence_ground_multi
                                  index 3: divergence_ground_multi_distribution_plot
                                  index 4: divergence_single_multi
                                  index 5: divergence_single_multi_distribution_plot
                                  index 6: ground_truth_pis
                                  index 7: single_view_pis
                                  index 8: multi_view_pis
                                  index 9: ground_truth_point_cloud
                                  index 10: single_view_point_cloud
                                  index 11: multi_view_point_cloud
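
Positional indices are easy to mix up, so one option is to map them to names once. The sketch below assumes each `dictionary['nameOfTheObject']` entry is a sequence with exactly the 12 positions listed above; `FIELDS` and `unpack_entry` are hypothetical helpers, not part of the repo.

```python
# Field names in the same order as the index listing above.
FIELDS = (
    "divergence_ground_single",
    "divergence_ground_single_distribution_plot",
    "divergence_ground_multi",
    "divergence_ground_multi_distribution_plot",
    "divergence_single_multi",
    "divergence_single_multi_distribution_plot",
    "ground_truth_pis",
    "single_view_pis",
    "multi_view_pis",
    "ground_truth_point_cloud",
    "single_view_point_cloud",
    "multi_view_point_cloud",
)

def unpack_entry(entry):
    """Turn one positional entry into a {field_name: value} dictionary."""
    if len(entry) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} items, got {len(entry)}")
    return dict(zip(FIELDS, entry))

# Placeholder entry: in practice this would be data['nameOfTheObject'].
placeholder = list(range(12))
named = unpack_entry(placeholder)
```

After loading the pickle, `unpack_entry(data['nameOfTheObject'])['divergence_ground_multi']` would then replace `data['nameOfTheObject'][2]`.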

Dependencies

  • Import the .pt files with torch:
import os
import pickle
import torch

dataset = 'shapenet'
base_path = dataset + "_texsin"
images_obj_views = torch.load(os.path.join(base_path, 'images_obj.pt'))
  • Import the pickle file with the metrics (or the shapenet_withTextures files) with pickle:
with open(os.path.join(base_path, 'eval_clouds_' + dataset + '_300M.pickle'), 'rb') as handle:
    data = pickle.load(handle)
  • More info in notebook1 or notebook2.

Possible improvements

  • Extending the dataset with ShapeNet PSR
  • Increasing the view resolution to 512x512 or 1024x1024

Authors

Version History

  • 0.1
    • Initial Release

License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgments

Credits

Cite this work

@misc{CalanzoneTedoldi2022,
    title   = {Generating point clouds from multiple views with Point-E},
    author  = {Diego Calanzone and Riccardo Tedoldi and Zeno Sambugaro},
    year    = {2023},
    url     = {http://github.com/halixness/point-e}
}