This script attempts to reproduce the Midjourney Remix feature.
By default, the script produces remixed samples in the current directory. Note that this method requires the upstream version of the diffusers library.
```bash
python run.py /path/to/content_image.png /path/to/style_image.png
```
Here is a brief description of the final method. For research details, please refer to the `research` directory.
- The Stable Diffusion v2-1-unclip model is used, as it allows guiding reverse diffusion with CLIP image embeddings instead of text embeddings.
- The content image is forward-diffused to the specified `timestamp` and used as the initial latent vector.
- Both the content and style images are encoded with the CLIP model, and their embeddings are averaged with the `alpha` parameter.
- The reverse diffusion process is run with the initial latent vector of the content image and the averaged CLIP embeddings as guidance.
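
Assuming the script builds on the upstream `StableUnCLIPImg2ImgPipeline` from diffusers, the embedding-blending step could look roughly like the sketch below. The helper name `clip_embed`, the output filename, and the GPU/fp16 setup are illustrative assumptions, not taken from `run.py`.

```python
import torch
from PIL import Image
from diffusers import StableUnCLIPImg2ImgPipeline

# Load the unCLIP variant of Stable Diffusion 2.1 (assumes a CUDA device).
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

def clip_embed(image: Image.Image) -> torch.Tensor:
    """Encode a PIL image into a CLIP image embedding with the pipeline's own encoder."""
    pixels = pipe.feature_extractor(images=image, return_tensors="pt").pixel_values
    pixels = pixels.to(device="cuda", dtype=torch.float16)
    return pipe.image_encoder(pixels).image_embeds

content = Image.open("content_image.png").convert("RGB")
style = Image.open("style_image.png").convert("RGB")

alpha = 0.4  # 0.0 -> pure content guidance, 1.0 -> pure style guidance
image_embeds = (1.0 - alpha) * clip_embed(content) + alpha * clip_embed(style)

# Guide reverse diffusion with the blended CLIP image embedding.
remix = pipe(image_embeds=image_embeds, num_inference_steps=30).images[0]
remix.save("remixed.png")
```

Note that this sketch starts the reverse diffusion from pure noise; the initial-latent trick driven by `timestamp` is sketched after the hyperparameter list below.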
The most important hyperparameters are:

- `alpha`: determines how much the style image affects the diffusion process.
- `timestamp`: determines how far the content image is diffused before being used as the initial latent vector.
- `num_inference_steps`: determines how many steps of the reverse diffusion process are run.
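
For `timestamp` specifically, the forward-diffusion step can be sketched as below, reusing the `pipe` object from the previous sketch. `content_latents` is a hypothetical helper, not a function from `run.py`; the stock pipeline call does not expose starting the denoising loop at an intermediate timestep, so the script presumably handles that part in its own loop.

```python
import numpy as np
import torch

@torch.no_grad()
def content_latents(pipe, content, timestamp: int) -> torch.Tensor:
    """VAE-encode the content image and forward-diffuse it to `timestamp`."""
    # Preprocess to the 768x768 resolution of the v2-1-unclip model, scaled to [-1, 1].
    arr = np.asarray(content.resize((768, 768))).astype(np.float32) / 255.0
    pixels = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0
    pixels = pixels.to(device=pipe.device, dtype=pipe.vae.dtype)

    # Encode into the latent space and apply the VAE scaling factor.
    latents = pipe.vae.encode(pixels).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    # Forward diffusion q(x_t | x_0): a larger `timestamp` adds more noise,
    # hence the content image constrains the result less.
    noise = torch.randn_like(latents)
    timesteps = torch.tensor([timestamp], device=latents.device)
    return pipe.scheduler.add_noise(latents, noise, timesteps)
```

The resulting latents would replace the random initial latents, and the reverse diffusion would then run from `timestamp` down to zero under the blended CLIP embedding guidance described above.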