Stable-Dreamfusion

A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-image model.

NEWS (2023.5.8):

Support DeepFloyd-IF as the guidance model.
Improve Image-to-3D quality and support the Image + Text condition from Make-it-3D.

Update Logs

Colab notebooks:

Instant-NGP backbone (-O):
Vanilla NeRF backbone (-O2):

Important Notice

This project is a work in progress and differs from the paper in many ways. The current generation quality does not match the results of the original paper, and many prompts still fail badly!

Notable differences from the paper

Since the Imagen model is not publicly available, we replace it with Stable Diffusion (implementation from diffusers). Unlike Imagen, Stable Diffusion is a latent diffusion model: it diffuses in a latent space rather than in the original image space. The loss therefore also has to propagate back through the VAE's encoder, which adds training time.
We use the multi-resolution grid encoder to implement the NeRF backbone (implementation from torch-ngp), which enables much faster rendering (~10FPS at 800x800).
We use the Adan optimizer by default.
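
As a rough sketch of why the encoder sits in the gradient path (the notation here is our own assumption, not taken from the code): let x = g(θ) be the rendered image, E the VAE encoder, z = E(x) the latent, and z_t the noised latent at timestep t. The score distillation gradient from the Dreamfusion paper, applied in latent space, then picks up one extra Jacobian:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
= \mathbb{E}_{t,\epsilon}\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\bigr)\,
\frac{\partial z}{\partial x}\,\frac{\partial x}{\partial \theta} \right],
\qquad z = \mathcal{E}(x),\quad x = g(\theta)
$$

The ∂z/∂x factor is the backward pass through the VAE encoder. A pixel-space diffusion model such as Imagen does not need it, which is where the extra training cost comes from.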

Install

git clone https://github.com/ashawkey/stable-dreamfusion.git
cd stable-dreamfusion

Optional: create a python virtual environment

To avoid Python package conflicts, we recommend using a virtual environment such as conda or venv:

python -m venv venv_stable-dreamfusion
source venv_stable-dreamfusion/bin/activate # you need to repeat this step for every new terminal

Install with pip

pip install -r requirements.txt

Download pre-trained models

To use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:

Zero-1-to-3 for the diffusion backend. We use 105000.ckpt by default, and its path is hard-coded in guidance/zero123_utils.py.
```
cd pretrained/zero123
wget https://huggingface.co/cvlab/zero123-weights/resolve/main/105000.ckpt
```

Omnidata for depth and normal prediction. These checkpoint paths are hard-coded in preprocess_image.py.

mkdir -p pretrained/omnidata
cd pretrained/omnidata
# assume gdown is installed
gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt
gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt
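
After downloading, a quick check that the files landed where the code expects them can save a failed run. A minimal sketch, assuming the file names from the commands and comments above and that it is run from the repository root (`missing_checkpoints` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

# Checkpoint locations taken from the download commands above; adjust them if
# you changed the hard-coded paths in the repository.
EXPECTED_CHECKPOINTS = [
    "pretrained/zero123/105000.ckpt",
    "pretrained/omnidata/omnidata_dpt_depth_v2.ckpt",
    "pretrained/omnidata/omnidata_dpt_normal_v2.ckpt",
]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All checkpoints found.")
```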

To use DeepFloyd-IF, you need to accept the usage conditions on Hugging Face and log in with huggingface-cli login on the command line.

For DMTet, we include the pre-generated 32/64/128-resolution tetrahedron grids under tets. The 256-resolution grid can be found here.

Build extension (optional)

By default, we use load (torch.utils.cpp_extension.load) to build the extensions at runtime. We also provide a setup.py for building each extension ahead of time:

cd stable-dreamfusion

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)
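
The choice between a pre-installed extension and a runtime build can be sketched as a simple import fallback. This is an illustration of the pattern only, not the repository's actual loader: `load_backend` and its arguments are hypothetical names.

```python
import importlib


def load_backend(installed_name, build_at_runtime):
    """Return a prebuilt extension module if importable, else build one at runtime.

    installed_name: module name the extension would have after `pip install`
                    (hypothetical; check the extension's setup.py for the real name).
    build_at_runtime: zero-argument callable that compiles and returns the module,
                      e.g. a wrapper around torch.utils.cpp_extension.load.
    """
    try:
        # Fast path: the extension was installed ahead of time.
        return importlib.import_module(installed_name)
    except ImportError:
        # Slow path: compile on first use, which is why the first run is slow.
        return build_at_runtime()
```

With this shape, installing an extension through pip only changes which branch is taken; the calling code stays the same either way.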

Taichi backend (optional)

The Taichi backend can be used for Instant-NGP. It achieves performance comparable to the CUDA implementation and requires no CUDA build. Install Taichi with pip:

pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly

Troubleshooting:

We assume you are working with the latest version of every dependency. If you run into a problem with a specific dependency, try upgrading it first (e.g., pip install -U diffusers). If the problem persists, a bug report will be appreciated!
[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped): this usually indicates a problem with the OpenGL installation. Try reinstalling the NVIDIA driver or, if you are on a headless server, use nvidia-docker as suggested in ashawkey#131.
TypeError: xxx_forward(): incompatible function arguments: this happens when we have updated the CUDA source after you installed the extensions with setup.py. Reinstall the corresponding extension (e.g., pip install ./gridencoder).
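
When reporting a problem, it helps to include the installed versions of the main dependencies. A small sketch using only the standard library; the package list is an assumption covering just the two packages named in this README, so extend it with whatever requirements.txt pulls in:

```python
from importlib import metadata

# Assumed subset of the dependencies; edit to match requirements.txt.
PACKAGES = ["torch", "diffusers"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```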

Tested environments

Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.

Usage

The first run will take some time because the CUDA extensions have to be compiled.