This repository contains code associated with a Bachelor's Thesis developed by Jozef BARUT under the supervision of doc. Ing. Giang NGUYEN, PhD.
The project aims to train a generative deep learning model for image synthesis, encapsulate the result in a web page that can be interacted with easily, and dockerize the application.
The machine learning part of this project implements two models.
The first is a simple GAN implementation that you can use to get familiar with generative adversarial networks, how they are modeled to achieve image generation, their training process, etc. It is contained in the digits_gan folder.
The second is a more complex, optimized one inspired by TAC-GAN, where the text-encoding part of the CLIP model is used as a pretrained text encoder. The corresponding folder is main_model.
The excellent PyTorch deep learning library and its submodules are used to implement both models. This library can of course speed up the deep learning process on GPUs with Nvidia's CUDA library. You can check whether your GPU is CUDA capable here.
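Before installing the GPU requirements, it can help to verify that PyTorch actually sees your GPU. A minimal check, assuming PyTorch is already installed, might look like this:

```python
# Quick sanity check that CUDA is visible to PyTorch.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA-capable GPU detected, training will run on the CPU")
```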
Make sure to have Python 3.10 installed. Before running any pip install commands, we highly recommend creating a Python virtual environment with python -m venv venv. You will need to activate it with source venv/bin/activate.
To keep package installation uncluttered, two distinct requirements files are present. If you don't have a capable GPU, you can install dependencies with pip install -r requirements_cpu.txt, otherwise use pip install -r requirements_gpu.txt.
Both deep learning sub-folders contain a training_loop.py Python script which can be run to train the given model. These scripts follow the rules listed below.
- The scripts assume the working directory is the sub-folder they are located in.
- Every file, such as plots, model checkpoints and reports, is saved into the saved/ folder, which is created automatically.
- To customize the number of training epochs and checkpoints, you can adjust the variables at the beginning of the scripts, right after the imports (see the sketch after this list).
- Checkpointing saves the generator and discriminator with the epoch number in the file name, and also generates a row in the progress report plot.
- To use the GPU, make sure you pass 1 to the determine_device function call, like so: device = determine_device(1)
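As a rough orientation, the configuration at the top of each training script could look like the sketch below. This is only an illustration; the variable names and the body of determine_device here are assumptions, and the real definitions live in the respective training_loop.py.

```python
# Illustrative sketch only -- adjust the real variables in training_loop.py itself.
import torch

NUM_EPOCHS = 200        # total number of training epochs (assumed name)
CHECKPOINT_EVERY = 20   # save a checkpoint and a report row every N epochs (assumed name)

def determine_device(use_gpu: int) -> torch.device:
    """Return a CUDA device when requested and available, otherwise fall back to CPU."""
    if use_gpu == 1 and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = determine_device(1)  # pass 1 to train on the GPU, 0 to stay on the CPU
```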
Runnable scripts in the main_model sub-folder also expect a data folder to be present with a structured dataset in it. You can download the dataset from our Google Drive here, then unzip the file into the main_model sub-folder.
An overview of the model's architecture can be seen in the image below:
Training was done on the Oxford 102 Flowers dataset, with 10 text descriptions for each image. Class labels were also utilized for auxiliary classification by the discriminator. We also made an 80%-20% train-test split to test the generalization of the model on unseen prompts and for zero-shot generation.
Parameters for training were:
- 200 epochs,
- latent vector size: 100,
- CLIP-encoded text vector size: 512,
- 1044 feature maps going into the generator: 1024 for the latent vector, concatenated with 20 for the encoded text.
Complete inner workings can be seen in models.py.
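To make the feature-map numbers above concrete, the sketch below shows how such a generator input could be assembled in PyTorch. It only illustrates the dimensions described above; the actual layer definitions and names are those in main_model/models.py, not the ones shown here.

```python
# Illustrative sketch of assembling the 1044-feature generator input (1024 + 20).
import torch
import torch.nn as nn

latent_dim, text_dim = 100, 512               # noise size and CLIP text embedding size

project_noise = nn.Linear(latent_dim, 1024)   # 1024 features from the latent vector
project_text = nn.Linear(text_dim, 20)        # 20 features from the encoded text

z = torch.randn(8, latent_dim)                # a batch of 8 noise vectors
t = torch.randn(8, text_dim)                  # stand-in for CLIP-encoded prompts

gen_input = torch.cat([project_noise(z), project_text(t)], dim=1)
print(gen_input.shape)                        # torch.Size([8, 1044])
```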
Results:
Evaluation of the model from the technical standpoint was conducted with three metrics. Inception Score and FID were computed over the entire dataset, separately for training and test prompts. CLIP Score, measuring the similarity between a prompt and the generated image, was computed over 256 random samples. We also computed IS and CLIP Score for the datasets themselves as a baseline.
Metric | Train Data | Test Data | Train Synthetic | Test Synthetic |
---|---|---|---|---|
Inception Score | 3.9 | 3 | 2.75 | 2.74 |
FID | - | - | 124.49 | 146.93 |
CLIP Score | 29.34 | 29.52 | 25.0 | 25.14 |
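For reference, metrics of this kind can be computed with the torchmetrics package. The thesis scripts may compute them differently, so treat the following only as a sketch of one possible setup; images are expected as uint8 tensors of shape [N, 3, H, W], and the tensors below are placeholders.

```python
# One possible way to compute IS, FID and CLIP Score with torchmetrics;
# not necessarily how the thesis evaluation scripts do it.
import torch
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

real = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)  # placeholder real images
fake = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)  # placeholder generated images
prompts = ["a red flower with yellow stamens"] * 16                 # placeholder prompts

inception = InceptionScore()
inception.update(fake)
is_mean, is_std = inception.compute()

fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)

clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
clip_value = clip_metric(fake, prompts)

print(is_mean.item(), fid.compute().item(), clip_value.item())
```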
Play around with the explore notebook to see what images you can generate with which prompts. You can do that with a simple jupyter notebook command. Once your notebook tab is open in the browser, navigate to the notebook and run its cells.
For the backend of the web page application that visualizes generated images, a simple Flask application is used.
Only the default route "/" with the GET method is available. This route parses one argument called "input", which represents the text prompt for the model. The response is an array of 5 images encoded in base64 format.
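A quick way to exercise this endpoint from Python is shown below. It assumes the server is running locally on port 3000 as in the manual set-up, and that the response body is a plain JSON array of base64 strings; adjust the parsing if the actual response shape differs.

```python
# Hypothetical client for the "/" route; the JSON array shape and file extension are assumptions.
import base64
import requests

resp = requests.get("http://127.0.0.1:3000/", params={"input": "red flower"})
resp.raise_for_status()

for i, img_b64 in enumerate(resp.json()):        # array of 5 base64-encoded images
    with open(f"generated_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```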
For running the server, using docker compose or building a docker container is highly recommended. More about that in the Docker part.
Manual set-up requires these steps:
- Have Python 3.10 installed.
- Install the requirements listed in requirements_cpu.txt.
- Make sure you are running the command from the root of the repository, as this should be the program's working directory.
- Run python3 -m flask --app backend/app run -p 3000 (the -p option changes the port, because the default port 5000 is often already in use, which would result in an error).
- A default GET request will respond with Bad Request because of the missing "input" argument. Try this link: http://127.0.0.1:3000/?input=red if that happens.
The frontend of the application was created with the reactive framework Vue.js to enable requesting and re-rendering of generated images without refreshing the whole page.
It is a simple single-page app with a text prompt and a "generate" button. Images are displayed after the first request to the backend.
Running the frontend is easiest using Docker Compose or building a Docker container. More about that in the Docker part.
Manual set-up requires these steps:
- Have Node.js (v20.x or higher) with npm installed
- in the frontend/ folder execute
npm install
- Run
npm run dev
after the installation is complete - The app should be running right here: http://localhost:5173/
To streamline the deployment of the different parts of this project and to avoid architecture or OS incompatibilities, we use Docker and its tools. Please make sure you have Docker Engine installed.
The project is set up in two ways that build on each other.
There are two Dockerfiles present, ./Dockerfile_backend and ./Dockerfile_frontend, for the backend and frontend respectively. These provide a set of instructions for building their Docker images.
To build the desired Docker image, you can run the following at the root of the repository:
docker build . -t image_name -f ./Dockerfile_name
The -f option is used because the Dockerfiles are named differently than just "Dockerfile", which the command would automatically look for.
Replacing the placeholders, an example would be: docker build . -t frontend-docker -f ./Dockerfile_frontend
After the image is built, it can be run via:
docker run -p 8080:8080 frontend_image_name
for the frontend, or
docker run -p 8001:5000 backend_image_name
for the backend.
The mapping of ports should be kept as is, because that is where the apps are exposed to your machine.
With Docker Compose we can combine the building and running of multiple images with a single command and compose.yaml file, further automating the deployment process.
You will need to have Docker Compose plugin installed in addition to the Docker Engine.
When you have Compose installed, simply run docker compose up in the same directory as the yaml file. After everything is finished, both dockerized services should be up and running.
You can check in the "Painting with Neural Nets" frontend app whether it is showing and requests are resolving correctly.