We created an evaluation benchmark using the CustomHumans dataset. Please apply for the dataset directly; you will find the necessary files in the download link. The OpenPose keypoint files can be downloaded here: 60_images and 240_images.
Note that we trained our models with the 526 human scans provided in the THuman2.0 dataset and tested on 60 scans from the CustomHumans dataset, using the default hyperparameters and commands suggested in run.sh. The evaluation scripts can be found here and here. You will need to install two additional packages for evaluation:
pip install torchmetrics[image] mediapipe
Methods | P-to-S (cm) ↓ | S-to-P (cm) ↓ | NC ↑ | f-Score ↑ |
---|---|---|---|---|
PIFu [Saito2019] | 2.209 | 2.582 | 0.805 | 34.881 |
PIFuHD [Saito2020] | 2.107 | 2.228 | 0.804 | 39.076 |
PaMIR [Zheng2021] | 2.181 | 2.507 | 0.813 | 35.847 |
FOF [Feng2022] | 2.079 | 2.644 | 0.808 | 36.013 |
2K2K [Han2023] | 2.488 | 3.292 | 0.796 | 30.186 |
ICON* [Xiu2022] | 2.256 | 2.795 | 0.791 | 30.437 |
ECON* [Xiu2023] | 2.483 | 2.680 | 0.797 | 30.894 |
SiTH* (Ours) | 1.871 | 2.045 | 0.826 | 37.029 |
- \* indicates methods trained on the same THuman2.0 dataset.
Methods | SSIM ↑ | LPIPS ↓ | KID (×10⁻³) ↓ | Joints Err. (pixel) ↓ |
---|---|---|---|---|
Pix2PixHD [Wang2018] | 0.816 | 0.141 | 86.2 | 53.1 |
DreamPose [Karras2023] | 0.844 | 0.132 | 86.7 | 76.7 |
Zero-1-to-3 [Liu2023] | 0.862 | 0.119 | 30.0 | 73.4 |
ControlNet [Zhang2023] | 0.851 | 0.202 | 39.0 | 35.7 |
SiTH (Ours) | 0.950 | 0.063 | 3.2 | 21.5 |
The input to your method is images_60 and the ground truth is gt_meshes_60.
Please note that you cannot use the ground-truth SMPL-X meshes as inputs to your method; your method should predict the SMPL-X meshes from the input images alone.
To avoid scale and depth ambiguity, we use ICP to align the predicted meshes to the ground-truth meshes.
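ICP alternates between finding closest-point correspondences and solving for the best rigid (or similarity) transform given those correspondences; the transform step is the closed-form Umeyama/Procrustes solution. The sketch below shows only that inner step for paired points (a minimal NumPy illustration, not the exact alignment script used by the benchmark):

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping paired points src -> dst (Umeyama's method).

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns (s, R, t) such that dst ≈ s * src @ R.T + t.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d          # center both clouds
    cov = xd.T @ xs / len(src)               # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                         # avoid reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t
```

A full ICP loop would re-estimate nearest-neighbor correspondences between the predicted and ground-truth surfaces and repeat this solve until convergence.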
You can use the evaluation script by running the following command:
python tools/evaluate.py -i /path/to/your/results -g /path/to/ground-truth
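The P-to-S, S-to-P, and f-Score columns in the table above are standard point-to-surface metrics: average distance from sampled prediction points to the scan, the reverse direction, and the harmonic mean of precision/recall under a distance threshold. A simplified point-cloud version (nearest-neighbor distances instead of true point-to-triangle distances; function names and the threshold `tau` are illustrative, not the benchmark's exact implementation):

```python
import numpy as np

def nn_dists(a, b):
    """For each point in a, the distance to its nearest neighbor in b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def chamfer_and_fscore(pred, gt, tau=1.0):
    """Bidirectional nearest-neighbor distances and f-score at threshold tau.

    pred, gt: (N, 3) and (M, 3) point clouds (e.g. sampled from meshes).
    Returns (p2s_mean, s2p_mean, f_score).
    """
    p2s = nn_dists(pred, gt)                 # prediction-to-scan
    s2p = nn_dists(gt, pred)                 # scan-to-prediction
    precision = (p2s < tau).mean()
    recall = (s2p < tau).mean()
    f = 2 * precision * recall / max(precision + recall, 1e-8)
    return p2s.mean(), s2p.mean(), f
```

In practice the distances are computed against the mesh surface (point-to-triangle) with a spatial data structure rather than this O(N·M) broadcast.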
The input to your method is images_60 and the ground truth is gt_back_60.
Note that, to account for the stochastic nature of generative models, we generate 16 samples per input image.
python tools/evaluate_image.py -i /path/to/your/results -g /path/to/ground-truth --mask
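The Joints Err. column reports the mean pixel distance between 2D body keypoints detected on the generated image and on the ground truth (the evaluation uses mediapipe for detection). Given the detected keypoint arrays, the error itself reduces to a mean Euclidean distance; a minimal sketch (argument names hypothetical):

```python
import numpy as np

def joint_error(pred_kpts, gt_kpts, valid=None):
    """Mean Euclidean pixel error over body joints.

    pred_kpts, gt_kpts: (J, 2) arrays of 2D keypoints in pixel coordinates.
    valid: optional (J,) boolean mask selecting joints detected in the
           ground truth; undetected joints are excluded from the mean.
    """
    err = np.linalg.norm(np.asarray(pred_kpts) - np.asarray(gt_kpts), axis=-1)
    if valid is not None:
        err = err[np.asarray(valid, dtype=bool)]
    return float(err.mean())
```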
- Please note that SiTH uses an orthographic coordinate system, as do the other methods in the baseline table. If you want to compute 3D metrics, you need to render images with an orthographic camera.
- The image resolution for 3D reconstruction and SMPL-X fitting is 1024x1024; only the diffusion-based hallucination runs at 512x512. By default, you should prepare an image at 1024x1024 resolution, and the config file will handle the rest for you.
- SiTH is an SMPL-centric pipeline and does not handle special cases such as subjects holding objects or children.
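The orthographic-camera note above matters because an orthographic projection maps x/y coordinates linearly to pixels with no depth-dependent foreshortening. A minimal sketch of projecting mesh vertices orthographically (the scale, centering, and y-flip conventions here are assumptions for illustration, not SiTH's exact camera settings):

```python
import numpy as np

def ortho_project(verts, scale=1.0, res=1024):
    """Project 3D vertices to pixel coordinates with an orthographic camera.

    verts: (N, 3) array, assumed centered with x/y roughly in [-1, 1]
           after multiplying by `scale` (convention assumed).
    Depth (z) only matters for visibility, not for the projected position.
    """
    xy = verts[:, :2] * scale
    px = (xy[:, 0] * 0.5 + 0.5) * (res - 1)      # x -> column
    py = (0.5 - xy[:, 1] * 0.5) * (res - 1)      # y -> row, origin at top-left
    return np.stack([px, py], axis=-1)
```

Under a perspective camera, by contrast, projected positions would depend on z, which is why perspective renders cannot be compared directly against orthographic predictions when computing 3D metrics.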