Painting with neural networks

This repository contains the code associated with a Bachelor's thesis developed by Jozef BARUT under the supervision of Giang NGUYEN, doc., Ing., PhD.

The project aims to train a generative deep learning model for image synthesis, wrap the result in a web page that is easy to interact with, and dockerize the application.

Machine learning

The machine learning part of this project implements two models.

The first is a simple GAN implementation that you can use to get familiar with generative adversarial networks: how they are modeled to achieve image generation, how they are trained, and so on. It is contained in the digits_gan folder.

The second is a more complex, optimized model inspired by TAC-GAN, in which the text encoding part of the CLIP model is used as a pretrained text encoder. The corresponding folder is main_model.

Technical details

The excellent PyTorch deep learning library and its submodules are used to implement both models. The library can speed up deep learning on GPUs with Nvidia's CUDA library. You can check if your GPU is CUDA capable here.

Make sure you have Python 3.10 installed. Before running any pip install commands, we highly recommend creating a Python virtual environment with python -m venv venv and activating it with source venv/bin/activate.

To avoid cluttering the package installation, two distinct requirements files are present. If you don't have a capable GPU, install the dependencies with pip install -r requirements_cpu.txt; otherwise use pip install -r requirements_gpu.txt.

Both deep learning sub-folders contain a training_loop.py Python script, which can be run to train the given model. These scripts follow the rules listed below.

  • the scripts assume that the working directory is the sub-folder they are in.
  • every output file, such as plots, model checkpoints and reports, is saved into the saved/ folder, which is created automatically.
  • to customize the number of training epochs and checkpoints, adjust the variables at the beginning of the scripts, right after the imports.
    • _checkpoints saves the generator and discriminator with the epoch number in the file name, and also generates a row in the progress report plot.
  • to use the GPU, make sure you pass 1 to the determine_device function call, like so: device = determine_device(1)
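The conventions above can be sketched in a few lines. This is only an illustration and not the repository's code: determine_device is named in the scripts, but its body here is an assumption, and N_EPOCHS, N_CHECKPOINTS, SAVE_DIR, checkpoint_epochs and checkpoint_path are hypothetical names chosen for the example.

```python
from pathlib import Path

# Hypothetical settings; the real variables sit at the top of training_loop.py.
N_EPOCHS = 20
N_CHECKPOINTS = 4
SAVE_DIR = Path("saved")


def determine_device(use_gpu: int = 0):
    """Assumed behaviour: pick CUDA only when requested and available."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed: return a plain string so the sketch still runs.
        return "cpu"
    if use_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def checkpoint_epochs(n_epochs: int, n_checkpoints: int) -> list[int]:
    """Spread the checkpoints evenly over training, ending on the last epoch."""
    step = n_epochs / n_checkpoints
    return [round(step * (i + 1)) for i in range(n_checkpoints)]


def checkpoint_path(model_name: str, epoch: int) -> Path:
    """Build a file name containing the epoch number inside saved/."""
    SAVE_DIR.mkdir(exist_ok=True)  # the folder is created on demand
    return SAVE_DIR / f"{model_name}_epoch_{epoch}.pt"


if __name__ == "__main__":
    device = determine_device(1)
    print("training on", device)
    for epoch in checkpoint_epochs(N_EPOCHS, N_CHECKPOINTS):
        print(checkpoint_path("generator", epoch))
```

Because the paths are relative, running such a script from another directory would create saved/ in the wrong place, which is why the working directory matters.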

Runnable scripts in the main_model sub-folder also expect a data folder containing a structured dataset. You can download the dataset from our Google Drive here, then unzip the file into the main_model sub-folder.

Main Model

An overview of the model's architecture is shown in the image below:

Training was done on the Oxford 102 Flowers dataset, with 10 text descriptions for each image. Class labels were also used for auxiliary classification by the discriminator. We made an 80%-20% train-test split to test how well the model generalizes to unseen prompts and to evaluate zero-shot generation.
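How the split was produced is not described here. The following is a minimal sketch of one way to obtain a reproducible 80/20 split, assuming the items to split (images, prompts or classes) are available as a list. The function split_items and its parameters are hypothetical names, not part of the repository.

```python
import random


def split_items(items, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and cut off the test share."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_items(range(100))
    print(len(train), "train items,", len(test), "test items")
```

Fixing the seed keeps the test prompts the same across runs, so results computed on them stay comparable.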

Parameters for training were:

  • 200 epochs
  • latent vector size: 100
  • CLIP encoded text vector size: 512
  • 1044 feature maps going into the generator: 1024 for the latent vector, concatenated with 20 for the encoded text

The complete inner workings can be seen in models.py.

Results:

Evaluation

From a technical standpoint, the model was evaluated with three metrics. Inception Score (IS) and FID were computed over the entire dataset, separately for training and test prompts. CLIP Score, which measures the similarity between a prompt and the generated image, was computed over 256 random samples. As a baseline, we also computed IS and CLIP Score for the datasets themselves.

| Metric | Train Data | Test Data | Train Synthetic | Test Synthetic |
|---|---|---|---|---|
| Inception Score | 3.9 | 3 | 2.75 | 2.74 |
| FID | - | - | 124.49 | 146.93 |
| CLIP Score | 29.34 | 29.52 | 25.0 | 25.14 |
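To make the two simpler metrics concrete, here is a minimal pure-Python sketch of their usual definitions. It assumes that CLIP Score is 100 times the cosine similarity between image and text embeddings, clipped at zero, and that Inception Score is the exponential of the mean KL divergence between per-image class probabilities and their marginal. The evaluation code in this repository may differ in details, and the function names are hypothetical.

```python
import math


def clip_score(image_emb, text_emb):
    """100 * cosine similarity of the two embeddings, clipped at zero."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    return 100.0 * max(dot / (norm_image * norm_text), 0.0)


def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over all images.

    probs holds one class-probability vector per image, as a classifier
    such as Inception would produce.
    """
    n_images = len(probs)
    n_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n_images for j in range(n_classes)]
    kl_sum = 0.0
    for p in probs:
        # Terms with zero probability contribute nothing to the KL divergence.
        kl_sum += sum(pj * math.log(pj / marginal[j])
                      for j, pj in enumerate(p) if pj > 0)
    return math.exp(kl_sum / n_images)


if __name__ == "__main__":
    print(clip_score([1.0, 1.0], [1.0, 0.0]))
    print(inception_score([[0.9, 0.1], [0.1, 0.9]]))
```

Under these definitions, higher values are better for both metrics, while for FID lower values are better.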

Explore

Play around with the explore notebook to see which images you can generate with which prompts. To do so, start Jupyter with the jupyter notebook command. Once the Jupyter tab opens in your browser, navigate to the notebook and run its cells.

Backend

For the backend of the web page application that visualizes generated images, a simple Flask application is used.

Only the default route "/" with the GET method is available. This route parses one argument called "input", which represents the text prompt for the model. The response is an array of 5 images encoded in base64 format.

To run the server, we highly recommend using Docker Compose or building a Docker container. More about that in the Docker part.

Manual set-up requires these steps:

  1. Have python 3.10 installed

  2. Install requirements listed in requirements_cpu.txt

  3. Make sure you run the command from the root of the repository, as this should be the program's working directory

  4. Run python3 -m flask --app backend/app run -p 3000

    The -p option changes the port, because the default port 5000 is often already in use, which would result in an error

  5. A GET request without arguments will respond with Bad Request, because the "input" argument is missing

    If that happens, try this link: http://127.0.0.1:3000/?input=red
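Once the server is running, the route can also be queried from Python. The sketch below uses only the standard library and assumes that the response body is a JSON array of base64 strings. The exact response format and the image type are not specified here, so treat that as an assumption. The names build_url, decode_images and fetch_images are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Port 3000 matches the manual set-up command above.
BASE_URL = "http://127.0.0.1:3000/"


def build_url(prompt, base_url=BASE_URL):
    """Attach the prompt as the "input" query argument."""
    return base_url + "?" + urlencode({"input": prompt})


def decode_images(body):
    """Assumption: body is a JSON array of base64-encoded images."""
    return [base64.b64decode(item) for item in json.loads(body)]


def fetch_images(prompt):
    """Request images for a prompt and return them as raw bytes."""
    with urlopen(build_url(prompt), timeout=60) as response:
        return decode_images(response.read())


if __name__ == "__main__":
    for index, image in enumerate(fetch_images("red")):
        print(f"image {index}: {len(image)} bytes")
```

The generous timeout is a guess: generating images on a CPU can take a while, so a short timeout might abort a request that would otherwise succeed.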

Frontend

The frontend of the application was built with the reactive framework Vue.js, which makes it possible to request and re-render generated images without refreshing the whole page.

It is a simple single-page app with a text prompt and a "generate" button. Images are displayed after the first request to the backend.

The easiest way to run the frontend is with Docker Compose or by building a Docker container. More about that in the Docker part.

Manual set-up requires these steps:

  1. Have Node.js (v20.x or higher) with npm installed
  2. In the frontend/ folder, execute npm install
  3. Run npm run dev after the installation is complete
  4. The app should be running right here: http://localhost:5173/

Docker

To streamline the deployment of the different parts of this project and to avoid architecture or OS incompatibilities, we use Docker and its tools. Please make sure you have Docker Engine installed.

The project can be set up in two ways, which build on each other.

Docker containers

Two Dockerfiles are present, ./Dockerfile_backend and ./Dockerfile_frontend, for the backend and frontend respectively. They provide the instructions for building the corresponding Docker images.

To build the desired Docker image, run this from the root of the repository: docker build . -t image_name -f ./Dockerfile_name. The -f option is needed because the Dockerfiles are not named just "Dockerfile", which is the name the command looks for by default.

For example, with the placeholders replaced: docker build . -t frontend-docker -f ./Dockerfile_frontend

After the image is built, it can be run via:

  • docker run -p 8080:8080 frontend_image_name for frontend.
  • docker run -p 8001:5000 backend_image_name for backend.

Keep the port mappings as they are, because these are the ports on which the apps are exposed to your machine.

Docker Compose

With Docker Compose we can combine the building and running of multiple images with a single command and a compose.yaml file, further automating the deployment process.

You will need the Docker Compose plugin installed in addition to Docker Engine.

Once Compose is installed, simply run docker compose up in the same directory as the compose.yaml file. After everything has finished, both dockerized services should be up and running. To verify, open the "Painting with Neural Nets" frontend app and check that it is displayed and that requests resolve correctly.