
Choosing The Model:

The model chosen is a vector quantized (VQ) diffusion model based on these two papers: "Vector Quantized Diffusion Model for Text-to-Image Synthesis" (Gu et al., 2022) and "Improved Vector Quantized Diffusion Models" (Tang et al., 2022).

Installing The Code Requirements:

There are three ways to run the project:

  1. A Jupyter notebook (.ipynb) built to work with Google Colab.
  2. A command-line interface (CLI).
  3. A web app.

Method 1 only requires that the user upload the notebook to Colab. Everything will run smoothly after installing the packages by simply running the first cell.
Methods 2 and 3 require installing some packages, which are listed in the "Requirements.txt" file. You simply need to create an environment (preferably using Anaconda) and do the following:

  • Choose Python 3.10.11.
  • After the environment is created, open a terminal with this environment active.
  • Copy each command in "Requirements.txt" into the terminal and run it, as sketched below.
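
For example, with Anaconda (the environment name "vqdiffusion" here is arbitrary; the actual packages come from "Requirements.txt"):

    conda create -n vqdiffusion python=3.10.11
    conda activate vqdiffusion
    # now run each command from "Requirements.txt" in this terminal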

Running The Training:

There are two ways to train:

  1. Through Colab using the .ipynb file.
  2. Through the given source code files.

To train using method 1:

  • Go to "configs/coco.yaml"
  • You can control all the configurations for training in this file. Feel free to leave it as it is.
  • Simply follow the steps in the notebook, which include:
    • Installing the packages.
    • Cloning the repo.
    • Downloading the dataset.
    • Running the training file (a typical command is sketched below).
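
The last step launches training; a typical invocation (the script name and flag here are assumptions, so use the exact command given in the notebook) looks like:

    python train.py --config_file configs/coco.yaml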

To train using method 2:

  • Create a folder called "datasets" in the root directory of the project.
  • Create a folder called "MSCOCO_Caption" in "datasets".
  • Follow the directory structure for the Microsoft COCO dataset in the "Data Preparing" section of the "readme.md" file (a sketch is shown after this list).
  • Download the dataset, choose:
    • 2014 Train images
    • 2014 Val images
    • 2014 Train/Val annotations
  • "2014 Train images" is a compressed file containing a folder called "train2014".
  • "2014 Val images" is a compressed file containing a folder called "val2014".
  • "2014 Train/Val annotations" is a compressed file containing .json files. You only need two:
    • "captions_train2014.json"
    • "captions_val2014.json"
  • Go to "configs/coco.yaml"
  • You can control all the configurations for training in this file. Feel free to leave it as it is.
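
With the downloads extracted, the layout should look roughly like this (a sketch based on the steps above — confirm the exact placement against the "Data Preparing" section in "readme.md"):

    datasets/
    └── MSCOCO_Caption/
        ├── annotations/
        │   ├── captions_train2014.json
        │   └── captions_val2014.json
        ├── train2014/
        └── val2014/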

**Note:** Training requires a powerful machine with lots of VRAM.

Samples:

Since training requires a very powerful system, I could not train using the original COCO 2014 dataset, so I created a stripped-down version.
The reason for this was to check that the training works. I also trained for only one epoch. So, it goes without saying that the model will not produce good results.
Even the provided pretrained model was not trained for many epochs.
In this section, I compare the outputs of my trained model and the pretrained model using the same prompt: "A group of elephants walking in muddy water".
There are six different inference methods, which are also shown below.

Pretrained Inference VQ-Diffusion:

(four sample images)

Pretrained Inference Improved VQ-Diffusion with learnable classifier-free sampling:

(four sample images)

Pretrained Inference Improved VQ-Diffusion with high-quality inference:

(four sample images)

Pretrained Inference Improved VQ-Diffusion with fast inference:

(four sample images)

Pretrained Inference Improved VQ-Diffusion with purity sampling:

(four sample images)

Pretrained Inference Improved VQ-Diffusion with both learnable classifier-free sampling and fast inference:

(four sample images)

Custom Inference VQ-Diffusion:

(four sample images)

Custom Inference Improved VQ-Diffusion with learnable classifier-free sampling:

(four sample images)

Custom Inference Improved VQ-Diffusion with high-quality inference:

(four sample images)

Custom Inference Improved VQ-Diffusion with fast inference:

(four sample images)

Custom Inference Improved VQ-Diffusion with purity sampling:

(four sample images)

Custom Inference Improved VQ-Diffusion with both learnable classifier-free sampling and fast inference:

(four sample images)

Name & Link of The Training Set:

Name:

**COCO 2014**

Link:

https://cocodataset.org/#download

The download choices and file contents are the same as those described under "Running The Training" above.

Number of Model Parameters:

This project contains two main models:

  • VQ-VAE
  • VQ-Diffusion

I trained the VQ-Diffusion model which contains:

  • content_codec: 65.8 million parameters
  • condition_codec: 0
  • transformer: 431.3 million parameters

These parameters add up to 497.1 million.
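
These counts can be reproduced with a short helper (a sketch; it assumes the trained model is a PyTorch nn.Module exposing submodules with the names above):

    import torch

    def count_params(module: torch.nn.Module) -> int:
        # Sum of element counts over every parameter tensor in the module.
        return sum(p.numel() for p in module.parameters())

    # Hypothetical usage -- `model` stands for the loaded VQ-Diffusion model:
    # for name in ("content_codec", "condition_codec", "transformer"):
    #     print(f"{name}: {count_params(getattr(model, name)) / 1e6:.1f}M parameters")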

Model Evaluation Metric:

Variational Bayes loss is used in this project. To compute it, Kullback-Leibler (KL) divergence terms are calculated between the forward process posterior and the model's reverse predictions.
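
For reference, this is the standard decomposition of the variational bound used in diffusion models, which VQ-Diffusion applies to discrete latent codes (notation follows the diffusion literature, not this repo's code):

    \mathcal{L}_{\mathrm{vb}} = \mathcal{L}_0 + \mathcal{L}_1 + \dots + \mathcal{L}_{T-1} + \mathcal{L}_T
    \mathcal{L}_0 = -\log p_\theta(x_0 \mid x_1)
    \mathcal{L}_{t-1} = D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big)
    \mathcal{L}_T = D_{\mathrm{KL}}\!\big(q(x_T \mid x_0) \,\|\, p(x_T)\big)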

Web App:

Streamlit was used to develop the web app for this project.
Once you start the web app (see the "Running" section below), it caches the models so that they are loaded only once rather than every time you run inference.
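
In Streamlit, this caching pattern looks roughly like the following sketch (not the project's exact code; `load_models` is a hypothetical name):

    import streamlit as st

    @st.cache_resource  # runs once per server process; later calls reuse the cached object
    def load_models():
        # Hypothetical loader: build and return the VQ-Diffusion model(s) here.
        ...

    models = load_models()  # cheap after the first call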

  • Once the models are loaded and cached, you will be presented with this screen:
    (screenshot)

  • Once you have entered your text description and the number of images you want to generate, click the "Generate" button:
    (screenshot)

  • After the image(s) have been generated, they will be displayed as shown:
    (screenshot)

  • The output image(s) are 256 × 256; you can choose to increase the resolution by clicking the "Increase Resolution" button:
    (screenshot)

  • The upscaled image(s) will be 512 × 512 and will be displayed as shown:
    (two screenshots)

Running:

  • To run the web app, type:
    streamlit run web_app.py

  • To run through the CLI, type:
    python infer.py "your text description" "number of images"

    Example:
    python infer.py "A group of elephants walking in muddy water" 4
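
The CLI contract implied by the command above can be sketched as follows (illustrative only; the real "infer.py" may parse its arguments differently):

    import argparse

    parser = argparse.ArgumentParser(description="Generate images from a text prompt")
    parser.add_argument("prompt", help="text description of the image(s) to generate")
    parser.add_argument("num_images", type=int, help="number of images to generate")
    args = parser.parse_args()
    # e.g. args.prompt == "A group of elephants walking in muddy water", args.num_images == 4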