This repository contains the official implementation for the paper "InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture".
The implementation supports two different running modes:
- image mode: "renders" a preexisting image into an arbitrary background. The input image can be arbitrary as long as its background is white; alternatively, the object can be extracted from its original background by providing an object description.
- point mode (not central to our research): creates bike outlines from a point cloud and then inserts the "rendered" bikes into an arbitrary background.
In addition, the implementation offers the option to colorize preexisting or generated images. When starting from uncolored images, this drastically improves realism.
Further, a new environment can either be created from scratch (background replacement mode) or a preexisting image can be used and adapted.
All features are available from the `main.py` CLI and can also be integrated into your own projects. To use the CLI or the code, an environment with torch, Pillow, pandas, transformers, and diffusers is required.
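A minimal environment setup might look like this (package versions are not pinned here; pick a torch build that matches your CUDA setup):

```bash
pip install torch Pillow pandas transformers diffusers
```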
To insert an image into a newly created background, execute:
python main.py --image "<path to your image>" --background_prompt "<your prompt>"
So for example:
python main.py --image "./test_images/bikes/166.png" --background_prompt "a bicycle centered in frame in munich, 8k, red gemini, 50mm, f2.8, cinematic"
Instead of providing a prompt directly, you can use the auto-prompting features by providing the necessary information. For bikes, you need to provide the desired location, the index of the bike, and a path to the datasheet. For cars, you have to provide the desired new location, the car manufacturer, and the car type. For products, only the new location and a product type are required.
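For instance, an auto-prompted call for a bike might look like this (the paths, place, and index are illustrative):

```bash
python main.py --image "./test_images/bikes/166.png" --auto_bike_prompt --place "in munich" --datasheet_path "./csv/df_parameters_final.csv" --bike_idx 5
```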
For further options and insertion into a given background consult the full CLI documentation below.
Instead of using the CLI, you may integrate the procedure into your code with this Python function (again, make sure all dependencies are installed and copy the `utils` module to your project):
```python
from PIL import Image

from utils import get_mask_from_image, sd_inpainting, sd_img2img, paste_image

def insert_diffusion(image: Image, mask_threshold: int, prompt: str, negative_prompt: str, img2img_model: str, inpainting_model: str, img2img_strength: float, inpainting_steps: int, inpainting_guidance: float, img2img_guidance: float, background_image: Image = None, composition_strength: float = 1) -> Image:
    # build a mask separating the white background from the object
    mask = get_mask_from_image(image, mask_threshold)

    # if a starting background is provided, paste the object onto it first
    if background_image is not None:
        image = paste_image(image, background_image)

    # generate the new background via inpainting, then refine the composite with img2img
    inpainted = sd_inpainting(inpainting_model, image, mask, prompt, negative_prompt, inpainting_guidance, inpainting_steps, inpainting_strength=composition_strength)
    result = sd_img2img(img2img_model, inpainted, prompt, negative_prompt, img2img_strength, img2img_guidance)
    return result
```
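Assuming the function above is available in your project, a call might look like the following sketch (the model ids, guidance values, and output path are illustrative assumptions, not necessarily the repository's defaults):

```python
from PIL import Image

image = Image.open("./test_images/bikes/166.png")

result = insert_diffusion(
    image=image,
    mask_threshold=200,  # assumed threshold for separating the white background
    prompt="a bicycle centered in frame in munich, 8k, 50mm, f2.8, cinematic",
    negative_prompt="blurry, low quality, distorted",
    img2img_model="stabilityai/stable-diffusion-xl-base-1.0",      # assumed model id
    inpainting_model="stabilityai/stable-diffusion-2-inpainting",  # assumed model id
    img2img_strength=0.3,
    inpainting_steps=50,
    inpainting_guidance=7.5,
    img2img_guidance=7.5,
)
result.save("insertion_result.png")  # illustrative output path
```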
We implemented an additional method to generate Biked images from point clouds and then insert them into a background. This method does not work well, however, and is therefore not central to our research. The generation from point clouds was implemented by Ioan-Daniel Craciun and is based upon a DDPM/DDIM implemented and trained by Jiajae Fan.
To create images from point clouds, remove the `--image` argument from your CLI call. In point mode you have to provide a datasheet path via `--datasheet_path`. Thus, the minimal CLI command becomes:

```bash
python main.py --point --datasheet_path "<path to datasheet>" --background_prompt "<your prompt>"
```
You may also use auto-prompting in point mode.
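For instance, a point-mode call with bike auto-prompting might look like this (the datasheet path, place, and index are illustrative):

```bash
python main.py --point --datasheet_path "./csv/df_parameters_final.csv" --auto_bike_prompt --place "beach at sunset" --bike_idx 5
```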
To integrate the generation of bikes from point clouds into your project, copy the `utils` folder and use the following function:
```python
import torch

from utils import get_mask_and_background
from utils import inpaint as bike_inpainting

def bike_diffusion(parameter_csv_path: str, device: torch.device, ckpt_id: str = '29000', mask_dilation: int = 5, mask_fill_holes: bool = True, bike_idx: int = 0, wheel_design_type: int = 0, width: int = 256, height: int = 256):
    # only two wheel designs are currently implemented
    assert wheel_design_type == 0 or wheel_design_type == 1

    # build the inpainting mask and background canvas from the point-cloud parameters
    mask, background = get_mask_and_background(parameter_csv_path, bike_idx, wheel_design_type, width, height)

    # generate the bike outline with the diffusion model (50 denoising steps)
    bike_img = bike_inpainting(background, mask, device, 50, ckpt_id, mask_dilation, mask_fill_holes)
    return bike_img.convert('RGB')
```
The image returned by this function is ready to be used for colorization or insertion.
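A minimal usage sketch might look like this (the datasheet path matches the CLI example below; the device choice and output path are illustrative):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# generate a bike outline from the point-cloud parameters at index 5 of the datasheet
bike_img = bike_diffusion("./csv/df_parameters_final.csv", device, bike_idx=5)
bike_img.save("bike_outline_5.png")  # illustrative output path
```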
If you are using an uncolored image, add `--colorize` to one of the previous CLI calls. In addition, you need to provide a colorization prompt via `--colorization_prompt` or use auto-prompting by providing a color via `--color`.
An example CLI call could be:
python main.py --image "./test_images/bike_outline/168.png" --colorize --datasheet_path "./csv/df_parameters_final.csv" --place "beach at sunset" --color "purple" --bike_idx 5"
Alternatively, to integrate colorization into your project, copy the utils module and use:
```python
from PIL import Image

from utils import sd_colorization

def colorization(image: Image, colorization_model: str, upscaling_model: str, colorization_prompt: str, colorization_negative_prompt: str, fill_holes: bool, dilation: int, strength: float, prompt_guidance: float):
    # upscale and colorize the outline image guided by the colorization prompt
    colorized = sd_colorization(colorization_model, upscaling_model, image, colorization_prompt, negative_prompt=colorization_negative_prompt, fill_holes=fill_holes, dilation_iterations=dilation, colorization_strength=strength, prompt_guidance=prompt_guidance)
    return colorized
```
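Assuming the function above, a call might look like this sketch (the model ids and parameter values are assumptions, not necessarily the repository's defaults):

```python
from PIL import Image

image = Image.open("./test_images/bike_outline/168.png")

colorized = colorization(
    image=image,
    colorization_model="stabilityai/stable-diffusion-2-1",       # assumed model id
    upscaling_model="stabilityai/stable-diffusion-x4-upscaler",  # assumed model id
    colorization_prompt="a purple bike, high quality photo",
    colorization_negative_prompt="blurry, low quality",
    fill_holes=True,
    dilation=8,
    strength=0.8,
    prompt_guidance=9.0,
)
colorized.save("168_colorized.png")  # illustrative output path
```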
Note: The CLI is still under construction and might be subject to change.
Note: Where applicable, default values represent the parameters determined in our experiments, but different values might be optimal depending on the specific use case.
argument | type | description |
---|---|---|
--image | string | path to an image to start from, mutually exclusive with --points |
--points | flag | when set, the algorithm runs in point cloud mode and generates a bike outline from a point cloud first, mutually exclusive with --image |
--mask_threshold | int | for inpainting, threshold to discern white background from colored foreground |
--background_prompt | string | the prompt for the background generation |
--negative_prompt | string | negative prompt for background generation |
--background_image | string | path to a background image to be used as a starting point, only relevant if --composition_strength is set to a value smaller than 1 |
--composition_strength | float | determines how much to change the starting-point background image, only used if --background_image is set, range 0-1 |
--auto_bike_prompt | flag | if set, prompts will be automatically created using the template for bikes, requires --place, --datasheet_path, and --bike_idx to be set |
--auto_car_prompt | flag | if set, prompts will be automatically created using the template for cars, requires --place, --car_manufacturer, and --car_type to be set |
--auto_product | flag | if set, prompts will be automatically created using the template for products, requires --place and --product_type to be set |
--place | string | description of the location where the object is to be inserted, only used if one of the auto prompt templates is used |
--color | string | if using auto-prompting, the color the bike should have |
--datasheet_path | string | if using autoprompting for bikes, path to datasheet for lookup of bike type |
--bike_idx | int | if using autoprompting for bikes, index in datasheet for lookup of bike type |
--car_manufacturer | string | if using autoprompting for cars, manufacturer of the car e.g. BMW |
--car_type | string | if using autoprompting for cars, type of the car e.g. SUV or X5 |
--product_type | string | if using autoprompting for products, type of the product e.g. lamp |
--inpainting_model | string | which model to use for background generation (huggingface id) |
--img2img_model | string | which model to use for rediffusion step (huggingface id) |
--img2img_strength | float | how much of the original image to noise in rediffusion |
--inpainting_steps | int | how many diffusion steps to perform for inpainting |
--inpainting_guidance | float | how much classifier-free guidance to apply in inpainting |
--img2img_guidance | float | how much classifier-free guidance to apply in rediffusion |
--output_folder | string | path to folder in which output images should be saved |
--colorize | flag | whether to colorize the image before inpainting |
--colorization_model | string | which model to use for colorization (huggingface id) |
--upscaling_model | string | which model to use for upscaling, necessary for colorization (huggingface id) |
--colorization_prompt | string | prompt for colorization, not necessary if --datasheet_path and --color are provided |
--colorization_negative_prompt | string | negative prompt for colorization |
--do_not_fill_holes | flag | toggles hole filling for the colorization mask, only relevant when colorizing |
--dilation | int | how much to extend the mask for colorization, only relevant when colorizing |
--colorization_strength | float | how much diffusion to apply for colorization, only relevant when colorizing |
--colorization_prompt_guidance | float | how much classifier-free guidance to apply during colorization, only relevant when colorizing |
--scale | float | how much to downscale or upscale the bike, higher values result in the bike taking up more space in frame, default is 1 |
--fraction_down | float | relative y position of the (center of the) bike, higher values place the bike closer to the lower edge of the image, default is 0.5 (centered) |
--fraction_right | float | relative x position of the (center of the) bike, higher values place the bike closer to the right edge of the image, default is 0.5 (centered) |
--ckpt_id | string | id of the checkpoint to use for the creation of bike outlines from point clouds, only relevant in point mode |
--bike_mask_dilation | int | how much to extend the masks generated from the point clouds, only relevant in point mode |
--do_not_fill_bike_holes | flag | whether to apply hole filling to bike masks, only relevant in point mode |
--wheel_design | int | which wheel design to use for the generation of bike outlines, currently only 0 and 1 are implemented, only relevant in point mode |
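As an example of insertion into a given background (see --background_image and --composition_strength above), a call might look like this (the paths, prompt, and strength value are illustrative):

```bash
python main.py --image "./test_images/bikes/166.png" --background_image "<path to your background image>" --composition_strength 0.6 --background_prompt "a bicycle centered in frame in munich, 8k, 50mm, f2.8, cinematic"
```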
An additional script, `interactive.py`, provides an implementation to generate images interactively in a Human-in-the-Loop fashion. This means that at each step five options are generated in parallel and the user is asked to choose the best one; only the chosen image is then used for the next step.
The script accepts no additional CLI arguments; the user will be prompted to make all decisions.
To run in interactive mode execute:
```bash
python interactive.py
```
The evaluation code used for our paper can be found under `./evaluation`. Note that quantitative metrics were computed using the following CLI command:

```bash
python evaluate.py --exp_name "<experiment name>" --gen_file_path "<path to generated images>" --ref_file_path "<path to reference files>" --masks_path "<only used for composition, path to masks>"
```
Human-Evaluation metrics and inferential statistics were computed using the provided notebook.
If you find our work helpful and want to use it for your research or project, please cite the paper as follows:
```bibtex
@misc{2407.10592,
  Author = {Phillip Mueller and Jannik Wiese and Ioan Craciun and Lars Mikelsons},
  Title = {InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture},
  Year = {2024},
  Eprint = {arXiv:2407.10592},
}
```