This is the official PyTorch implementation of the publication:
A. Simoni, S. Pini, R. Vezzani, R. Cucchiara
Multi-Category Mesh Reconstruction From Image Collections
In International Conference on 3D Vision (3DV) 2021
[Paper]
Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models and a set of instance-specific deformation, pose, and texture. Unlike previous works, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can benefit from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released.
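The abstract describes a predicted shape as a combination of learned category-level models plus instance-specific deformations predicted per vertex. One way to picture that composition, as a toy sketch rather than the repository's code, is a weighted blend of meanshapes followed by a per-vertex offset. The function names, the weighted-sum blend, and the plain-list mesh representation are all assumptions made for illustration.

```python
def blend_meanshapes(meanshapes, weights):
    """Blend K meanshapes into one mesh (illustrative assumption, not the paper's code).

    meanshapes: list of K meshes, each a list of (x, y, z) vertex tuples
                that share the same vertex ordering.
    weights:    list of K scalars, one per meanshape.
    """
    num_vertices = len(meanshapes[0])
    blended = []
    for v in range(num_vertices):
        blended.append(tuple(
            sum(w * shape[v][axis] for w, shape in zip(weights, meanshapes))
            for axis in range(3)
        ))
    return blended


def apply_offsets(vertices, offsets):
    """Add one instance-specific (dx, dy, dz) offset to each vertex."""
    return [
        tuple(coord + delta for coord, delta in zip(vertex, offset))
        for vertex, offset in zip(vertices, offsets)
    ]
```

Because each vertex carries its own offset, the same two steps apply unchanged after a mesh has been subdivided into more vertices.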
Create your Python environment and install the required packages with the following commands:
conda create -n <env_name> python=3.6
conda activate <env_name>
pip install -r requirements.txt
Install PyTorch 1.5.1 and TorchVision 0.6.1 with CudaToolkit 10.2.
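Since the versions above are pinned, it can help to confirm what actually ended up in the environment before building the extensions. The sketch below is not part of the repository. It compares the versions found in the active environment with the ones this README names, and `find_mismatches` and `installed_versions` are hypothetical helper names.

```python
# Pinned versions taken from the installation notes in this README.
REQUIRED = {"torch": "1.5.1", "torchvision": "0.6.1", "cuda": "10.2"}


def strip_local(version):
    """Drop a local build suffix such as '+cu102' from a version string."""
    return version.split("+")[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {name: (found, expected)} for every version that differs."""
    mismatches = {}
    for name, expected in required.items():
        found = installed.get(name)
        if found is None or strip_local(found) != expected:
            mismatches[name] = (found, expected)
    return mismatches


def installed_versions():
    """Collect versions from the active environment (None when a package is missing)."""
    found = {"torch": None, "torchvision": None, "cuda": None}
    try:
        import torch
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    try:
        import torchvision
        found["torchvision"] = torchvision.__version__
    except ImportError:
        pass
    return found
```

Calling `find_mismatches(installed_versions())` inside the activated environment returns an empty dictionary when everything matches.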
Install the SoftRas renderer extension:
export CUDA_HOME=/path/to/cuda
cd mcmr/
python setup_softras.py build develop # install to workspace
The renderer was compiled using CUDA 10.2.
Install PyTorch3D v0.4.0 (https://pytorch3d.org/ -- GitHub repository)
conda install -c bottler nvidiacub
# OR
curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_HOME=$PWD/cub-1.10.0
pip install fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.4.0"
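Both source builds above rely on exported paths (`CUDA_HOME` for the SoftRas extension, `CUB_HOME` when CUB is downloaded manually). A small check such as the following sketch can catch a missing or mistyped path before a long compile. The helper name `unresolved_build_paths` is hypothetical and not part of the repository.

```python
import os


def unresolved_build_paths(names, env=None):
    """Return the variables in `names` that are unset or do not point to a directory."""
    env = os.environ if env is None else env
    problems = []
    for name in names:
        path = env.get(name)
        if not path or not os.path.isdir(path):
            problems.append(name)
    return problems
```

For example, `unresolved_build_paths(["CUDA_HOME", "CUB_HOME"])` returns an empty list when both variables point to existing directories. `CUB_HOME` only matters if the `curl` route was used instead of the conda package.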
To prepare the datasets, please follow the instructions in mcmr Data Preparation.
To train on PASCAL3D+, run the following script:
python main.py --dataset_name pascal \
--dataset_dir your-folder/datasets/PASCAL_final \
--classes <class-names> \
--sub_classes \ # train on PASCAL3D+ 3D models sub-classes
--single_mean_shape \ # use single meanshape training
--cmr_mode \ # use GT+MaskRCNN instead of GT+PointRend masks
--subdivide 4 \ # starting mesh subdivision level
--sdf_subdivide_steps 351 \ # mesh subdivision epoch steps
--use_learned_class \ # activate unsupervised shape selection module
--num_learned_meanshapes <num-of-meanshapes> \ # set number of meanshapes
--checkpoint_dir <checkpoint-directory> \
--log_dir <log-directory> \
--pretrained_weights <weights-file> \ # resume from pre-trained weights
--cam_loss_wt 20.0 \
--cam_reg_wt 0.1 \
--mask_loss_wt 100.0 \
--deform_reg_wt 0.05 \
--laplacian_wt 6.0 \
--laplacian_delta_wt 1.8 \
--graph_laplacian_wt 0. \
--tex_percept_loss_wt 0.8 \
--tex_color_loss_wt 0.03 \
--tex_pixel_loss_wt 0.005 \
--is_training \ # activate training mode
(--faster) # disable deterministic mode
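In bash, a backslash continues a line only when it is the last character, so the `\ # comment` endings in the annotated listing above cut the command short if it is pasted verbatim. One way to keep the annotations and still get a runnable command is to assemble the arguments in Python, where comments are allowed between list items. This is a sketch: `build_pascal_train_command` and `launch` are hypothetical helpers, and treating `--pretrained_weights` as optional is an assumption based on its "resume" comment.

```python
import subprocess
import sys

# Loss weights copied from the PASCAL3D+ training command in this README.
PASCAL_LOSS_WEIGHTS = {
    "cam_loss_wt": 20.0,
    "cam_reg_wt": 0.1,
    "mask_loss_wt": 100.0,
    "deform_reg_wt": 0.05,
    "laplacian_wt": 6.0,
    "laplacian_delta_wt": 1.8,
    "graph_laplacian_wt": 0.0,
    "tex_percept_loss_wt": 0.8,
    "tex_color_loss_wt": 0.03,
    "tex_pixel_loss_wt": 0.005,
}


def build_pascal_train_command(dataset_dir, classes, num_meanshapes,
                               checkpoint_dir, log_dir,
                               pretrained_weights=None, faster=False):
    """Assemble the argv list for the PASCAL3D+ training run (hypothetical helper)."""
    cmd = [sys.executable, "main.py", "--dataset_name", "pascal",
           "--dataset_dir", dataset_dir]
    # The README does not say how <class-names> is delimited, so the tokens
    # in `classes` are forwarded unchanged.
    cmd += ["--classes", *classes]
    cmd += [
        "--sub_classes",                 # train on PASCAL3D+ 3D models sub-classes
        "--single_mean_shape",           # use single meanshape training
        "--cmr_mode",                    # use GT+MaskRCNN instead of GT+PointRend masks
        "--subdivide", "4",              # starting mesh subdivision level
        "--sdf_subdivide_steps", "351",  # mesh subdivision epoch steps
        "--use_learned_class",           # activate unsupervised shape selection module
        "--num_learned_meanshapes", str(num_meanshapes),
        "--checkpoint_dir", checkpoint_dir,
        "--log_dir", log_dir,
    ]
    if pretrained_weights is not None:   # resume from pre-trained weights
        cmd += ["--pretrained_weights", pretrained_weights]
    for name, value in PASCAL_LOSS_WEIGHTS.items():
        cmd += ["--" + name, str(value)]
    cmd.append("--is_training")          # activate training mode
    if faster:
        cmd.append("--faster")           # disable deterministic mode
    return cmd


def launch(cmd):
    """Run the command from the directory that contains main.py; raises on failure."""
    return subprocess.run(cmd, check=True)
```

Printing `" ".join(cmd)` before calling `launch(cmd)` gives a single-line command that can also be pasted into a shell.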
To train on CUB, run the following script:
python main.py --dataset_name cub \
--dataset_dir your-folder/datasets/UCMR_CUB_data/cub/ \
--classes all \
--single_mean_shape \ # use single meanshape training
--subdivide 4 \ # starting mesh subdivision level
--sdf_subdivide_steps 351 \ # mesh subdivision epoch steps
--use_learned_class \ # activate unsupervised shape selection module
--num_learned_meanshapes <num-of-meanshapes> \ # set number of meanshapes
--checkpoint_dir <checkpoint-directory> \
--log_dir <log-directory> \
--pretrained_weights <weights-file> \ # resume from pre-trained weights
--cam_loss_wt 2.0 \
--cam_reg_wt 0.1 \
--mask_loss_wt 20.0 \
--deform_reg_wt 0.005 \
--laplacian_wt 1.2 \
--laplacian_delta_wt 0.18 \
--graph_laplacian_wt 0. \
--tex_percept_loss_wt 3.2 \
--tex_color_loss_wt 0.12 \
--tex_pixel_loss_wt 0.02 \
--is_training \ # activate training mode
(--faster) # disable deterministic mode
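Both training commands set the same ten `*_wt` flags with different values. A natural reading, which this README does not state explicitly, is that each flag scales one term of a single training objective. Under that assumption the objective would take the form below. The symbols $\lambda$ and $\mathcal{L}$ are introduced here for illustration only, and the definition of each term is left to the paper.

```latex
\mathcal{L} =
    \lambda_{\text{cam}}        \mathcal{L}_{\text{cam}}
  + \lambda_{\text{cam-reg}}    \mathcal{L}_{\text{cam-reg}}
  + \lambda_{\text{mask}}       \mathcal{L}_{\text{mask}}
  + \lambda_{\text{deform}}     \mathcal{L}_{\text{deform}}
  + \lambda_{\text{lap}}        \mathcal{L}_{\text{lap}}
  + \lambda_{\text{lap-}\delta} \mathcal{L}_{\text{lap-}\delta}
  + \lambda_{\text{graph-lap}}  \mathcal{L}_{\text{graph-lap}}
  + \lambda_{\text{percept}}    \mathcal{L}_{\text{percept}}
  + \lambda_{\text{color}}      \mathcal{L}_{\text{color}}
  + \lambda_{\text{pixel}}      \mathcal{L}_{\text{pixel}}
```

With `--graph_laplacian_wt 0.` in both commands, the graph-Laplacian term would drop out of the sum under this reading.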
Original paper experiments can be reproduced as follows:
- Download pre-trained weights from mcmr Model Zoo.
- Run the testing script for the chosen dataset (PASCAL3D+ first, CUB second) as follows:
python main.py --dataset_name pascal \
--dataset_dir your-folder/datasets/PASCAL_final \
--classes <class-names> \
--sub_classes \ # activate intra-class variation
--cmr_mode \ # use of GT+MaskRCNN masks instead of GT+PointRend
--subdivide 4 \ # starting mesh subdivision level
--sdf_subdivide_steps 351 \ # mesh subdivision epoch steps
--use_learned_class \
--num_learned_meanshapes <num-of-meanshapes> \ # set number of meanshapes
--checkpoint_dir <checkpoint-directory> \
--log_dir <log-directory> \
--pretrained_weights <weights-file> \ # load pre-trained weights for testing
--cam_loss_wt 20.0 \
--cam_reg_wt 0.1 \
--mask_loss_wt 100.0 \
--deform_reg_wt 0.05 \
--laplacian_wt 6.0 \
--laplacian_delta_wt 1.8 \
--graph_laplacian_wt 0. \
--tex_percept_loss_wt 0.8 \
--tex_color_loss_wt 0.03 \
--tex_pixel_loss_wt 0.005 \
--save_dir <save-directory> \ # directory to save qualitative results
--save_results \ # activate qualitative results saving
--qualitative_results \ # activate qualitative results with weighted meanshape saving
(--faster) # disable deterministic mode
python main.py --dataset_name cub \
--dataset_dir your-folder/datasets/UCMR_CUB_data/cub/ \
--classes all \
--single_mean_shape \ # use single meanshape training
--subdivide 4 \ # starting mesh subdivision level
--sdf_subdivide_steps 351 \ # mesh subdivision epoch steps
--use_learned_class \ # activate unsupervised shape selection module
--num_learned_meanshapes <num-of-meanshapes> \ # set number of meanshapes
--checkpoint_dir <checkpoint-directory> \
--log_dir <log-directory> \
--pretrained_weights <weights-file> \ # load pre-trained weights for testing
--cam_loss_wt 2.0 \
--cam_reg_wt 0.1 \
--mask_loss_wt 20.0 \
--deform_reg_wt 0.005 \
--laplacian_wt 1.2 \
--laplacian_delta_wt 0.18 \
--graph_laplacian_wt 0. \
--tex_percept_loss_wt 3.2 \
--tex_color_loss_wt 0.12 \
--tex_pixel_loss_wt 0.02 \
--save_dir <save-directory> \ # directory to save qualitative results
--save_results \ # activate qualitative results saving
--qualitative_results \ # activate qualitative results with weighted meanshape saving
(--faster) # disable deterministic mode
- Add model implementation
- Add dataloader implementation
- Add train/test scripts
- Add preprocessed dataset request instructions
- Add dataset preprocessing instructions and scripts
- Add train/test instructions
- Add pretrained models
- Alessandro Simoni - alexj94
- Stefano Pini - stefanopini
- Roberto Vezzani - robervez
- Rita Cucchiara - Rita Cucchiara
If you find this repository useful for your research, please cite the following paper:
@inproceedings{simoni2021multi,
title={Multi-Category Mesh Reconstruction From Image Collections},
author={Simoni, Alessandro and Pini, Stefano and Vezzani, Roberto and Cucchiara, Rita},
booktitle={2021 International Conference on 3D Vision (3DV)},
pages={1321--1330},
year={2021},
organization={IEEE}
}
This project is licensed under the MIT License - see the LICENSE file for details.