The model chosen is a vector quantized (VQ) diffusion model based on these two papers:
- Vector Quantized Diffusion Model for Text-to-Image Synthesis
- Improved Vector Quantized Diffusion Models
There are three ways to run the project:
- A .ipynb file that was built to work with Google Colab.
- A method through CLI.
- A method through a web app.
Method 1 only requires that the user upload the notebook to Colab. After installing the packages by simply running the first cell, everything will run smoothly.
Methods 2 and 3 require installing some packages, which are listed in the "Requirements.txt" file. You simply need to create an environment (preferably using Anaconda) and do the following:
- Choose Python 3.10.11
- After the environment is created, open a terminal with this environment activated.
- Copy each command in "Requirements.txt" to the terminal and run it.
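For example, with Anaconda, the first two steps correspond to the following commands (the environment name "vq-diffusion" is just a placeholder; any name works):
conda create -n vq-diffusion python=3.10.11
conda activate vq-diffusion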
There are two ways to train:
- Through Colab using the .ipynb file.
- Through the given source code files.
To train using method 1:
- Go to "configs/coco.yaml"
- You can control all the configurations for training in this file. Feel free to leave it as it is.
- Simply follow the steps in the notebook, which include:
- Installing the packages.
- Cloning the repo.
- Downloading the dataset.
- Running the training file.
To train using method 2:
- Create a folder called "datasets" in the root directory of the project.
- Create a folder called "MSCOCO_Caption" in "datasets".
- Follow the directory structure for the Microsoft COCO dataset described in the "Data Preparing" section of the "readme.md" file (a rough sketch is also shown after this list).
- Download the dataset, choose:
- 2014 Train images
- 2014 Val images
- 2014 Train/Val annotations
- "2014 Train images" is a compressed file containing a folder called "train 2014".
- "2014 Val images" is a compressed file containing a folder called "val 2014".
- "2014 Train/Val annotations" is a compressed file containing .JSON files. You only need two:
- "captions_train2014.json"
- "captions_val2014.json"
- Go to "configs/coco.yaml"
- You can control all the configurations for training in this file. Feel free to leave it as it is.
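The expected layout should roughly look like the sketch below. The authoritative structure is the one in the "Data Preparing" section of "readme.md"; the exact nesting (in particular the "annotations" folder) is an assumption here:

```
datasets/
└── MSCOCO_Caption/
    ├── annotations/
    │   ├── captions_train2014.json
    │   └── captions_val2014.json
    ├── train2014/
    └── val2014/
```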
**Note:** Training requires a powerful machine with lots of VRAM.
Since training requires a very powerful system, I could not train on the original COCO 2014 dataset; instead, I created a stripped-down version of it just to check that training works, and I trained for only one epoch. It goes without saying that the resulting model will not produce good results. Even the provided pretrained model was not trained for many epochs.
In this section, I'm going to compare the outputs of my trained model and the pretrained model using the same prompt: "A group of elephants walking in muddy water".
There are six different inference methods, which will also be shown.
Pretrained Inference Improved VQ-Diffusion with both learnable classifier-free sampling and fast inference:
Custom Inference Improved VQ-Diffusion with both learnable classifier-free sampling and fast inference:
**COCO 2014**
- Download the dataset, choose:
- 2014 Train images
- 2014 Val images
- 2014 Train/Val annotations
- "2014 Train images" is a compressed file containing a folder called "train 2014".
- "2014 Val images" is a compressed file containing a folder called "val 2014".
- "2014 Train/Val annotations" is a compressed file containing .JSON files. You only need two:
- "captions_train2014.json"
- "captions_val2014.json"
This project contains two main models:
- VQ-VAE
- VQ-Diffusion
I trained the VQ-Diffusion model, which consists of:
- content_codec: 65.8 million parameters
- condition_codec: 0
- transformer: 431.3 million parameters
These parameters add up to 497.1 million.
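Parameter counts like these can be read off the model directly. The snippet below is a minimal sketch, assuming the model is a PyTorch nn.Module whose submodules are named content_codec, condition_codec, and transformer as listed above; the `model` variable in the commented usage is a hypothetical stand-in for the loaded VQ-Diffusion model:

```python
import torch.nn as nn

def count_params(module: nn.Module) -> float:
    """Return the total number of parameters in a module, in millions."""
    return sum(p.numel() for p in module.parameters()) / 1e6

# Hypothetical usage once the VQ-Diffusion model has been built/loaded as `model`:
# for name in ("content_codec", "condition_codec", "transformer"):
#     print(name, f"{count_params(getattr(model, name)):.1f}M parameters")
```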
A variational Bayes (VB) loss is used in this project; computing it involves calculating Kullback-Leibler (KL) divergences between discrete (categorical) distributions.
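As a minimal illustration (a sketch, not the project's exact loss code), the KL divergence between two categorical distributions can be computed like this:

```python
import torch
import torch.nn.functional as F

def categorical_kl(log_p, log_q):
    # KL(p || q) for categorical distributions, given as log-probabilities
    # over the last dimension (e.g. the codebook / class dimension).
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)

# Toy example: a batch of 2 tokens with a codebook of size 5.
log_p = F.log_softmax(torch.randn(2, 5), dim=-1)
log_q = F.log_softmax(torch.randn(2, 5), dim=-1)
print(categorical_kl(log_p, log_q))
```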
Streamlit was used to develop the web app for this project.
Once you start the web app (see the "Running" section below), it will cache the models so that they only need to be loaded once, not every time you run inference.
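A minimal sketch of how this kind of caching can be done in Streamlit (the actual web_app.py may differ; the loader body below is a placeholder):

```python
import streamlit as st

@st.cache_resource  # keep the returned object alive across reruns of the script
def load_models():
    # Placeholder: build the VQ-Diffusion model(s) and load the checkpoint here.
    models = ...
    return models

models = load_models()  # loaded once, then reused on every interaction
```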
Once the models are loaded and cached, you will be presented with this screen:
Once you have entered your text description and the number of images you want to generate, click the "Generate" button:
After the image(s) have been generated, they will be displayed to you as shown:
The output image(s) are 256 × 256; you can increase the resolution by clicking the "Increase Resolution" button:
The upscaled image(s) will be 512 × 512 and will be displayed as shown:
To run the web app, type:
streamlit run web_app.py
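Streamlit will print a local URL (by default http://localhost:8501) that you can open in your browser to use the app.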
To run through the CLI, type:
python infer.py "your text description" "number of images"
Example:
python infer.py "A group of elephants walking in muddy water" 4