PyTorch implementation of Facial Expression Generation from Text with CLIP. The goal is to generate realistic facial expression images from text alone, and to let the user semantically edit basic facial attributes via natural language descriptions, so that complex attribute combinations can be presented within a single framework.
[Facial Expression Generation from Text with FaceCLIP].
- Download the original data for AffectNet and RAF-DB, and save both into `data/`. Based on these two datasets, we built a new Facial Expressions and Texts (FET) dataset.
- Preprocess the facial images from the [AffectNet] dataset and save the images to `data/affectnet/image/`.
- Preprocess the facial images from the [RAF-DB] dataset and save the images to `data/affectnet/image/`. See the alignment sketch after this list.

```bash
python crop_alignface/align_images.py  # crop and align facial expression images
```
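The internals of `align_images.py` are not shown here. Purely as an illustration, a minimal crop-and-align step built on `dlib` might look like the sketch below; the landmark-model filename, output size, and largest-face heuristic are assumptions, not this repository's actual procedure.

```python
import dlib

# dlib's HOG face detector plus the 68-point landmark predictor; the .dat
# file must be downloaded separately from the dlib model zoo.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def align_face(in_path, out_path, size=256):
    """Detect the largest face, align it by its landmarks, and save the crop."""
    img = dlib.load_rgb_image(in_path)
    faces = detector(img, 1)
    if not faces:
        return False
    box = max(faces, key=lambda r: r.area())
    landmarks = predictor(img, box)
    # get_face_chip rotates and crops so the eyes are level and centered.
    chip = dlib.get_face_chip(img, landmarks, size=size)
    dlib.save_image(chip, out_path)
    return True
```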
- Based on the annotated attribute spreadsheet `data/affectnet/affect_attributes.xlsx`, we generated 10 different texts per image. Download these texts from the link, and save them into `data/affectnet/text/`.

```bash
python PCFG/Inside-Outside-Algorithm/train.py  # generate 10 different texts for each image in the FET dataset
```
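The repository generates captions with a PCFG trained via the inside-outside algorithm. As a rough illustration only, turning attribute annotations into varied sentences could look like the sketch below; the attribute names, templates, and column layout are assumptions, not the repository's learned grammar.

```python
import random

# Hypothetical attribute record; the real columns live in
# data/affectnet/affect_attributes.xlsx and may differ.
attrs = {"expression": "happy", "gender": "woman", "age": "young", "hair": "black hair"}

# Hand-written sentence templates standing in for the learned PCFG.
templates = [
    "A {age} {gender} with {hair} looks {expression}.",
    "The {gender} in the photo has {hair} and a {expression} expression.",
    "This {age} {gender} appears {expression}.",
]

def generate_texts(attrs, n=10):
    """Sample n caption variants for one image from the templates."""
    return [random.choice(templates).format(**attrs) for _ in range(n)]

for caption in generate_texts(attrs):
    print(caption)
```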
All code was developed and tested on Linux with Python 3.7 (Anaconda) and PyTorch 1.10.1+cu111.
The DAMSM model consists of a text encoder and an image encoder.
- Pre-train the DAMSM model for the FET dataset:

```bash
python pretrain_DAMSM.py --cfg cfg/DAMSM/.yml --gpu 0
```
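For orientation, DAMSM (introduced in AttnGAN) trains the two encoders so that an image matches its own caption more strongly than the other captions in the batch. Below is a minimal sketch of the sentence-level matching loss; the tensor names and the smoothing factor `gamma3` are illustrative assumptions, not this repository's exact code.

```python
import torch
import torch.nn.functional as F

def sentence_loss(img_emb, sent_emb, gamma3=10.0):
    """Batch-wise sentence-level DAMSM loss.
    img_emb, sent_emb: (batch, dim) global image / sentence embeddings."""
    img_emb = F.normalize(img_emb, dim=1)
    sent_emb = F.normalize(sent_emb, dim=1)
    # Cosine-similarity matrix scaled by the smoothing factor gamma3.
    scores = gamma3 * img_emb @ sent_emb.t()            # (batch, batch)
    labels = torch.arange(scores.size(0), device=scores.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return F.cross_entropy(scores, labels) + F.cross_entropy(scores.t(), labels)
```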
- Train the FaceCLIP model for the FET dataset:

```bash
python main.py --cfg cfg/train_affectnet.yml --gpu 1
```
`*.yml` files include the configuration for training and testing.
- DAMSM for AffectNet: download and save it to `DAMSMencoders/`.
- FaceCLIP for AffectNet: download and save it to `models/` or `output/` (you may need to modify the path in the `.yml` file to suit your setup).
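If you prefer to edit the config programmatically rather than by hand, a minimal sketch with PyYAML is shown below; the key names (`TRAIN`/`NET_G`) and weight filename are hypothetical, so check the actual `eval_affectnet.yml` for the real structure first.

```python
import yaml  # PyYAML

# Hypothetical key names; inspect cfg/eval_affectnet.yml before editing.
with open("cfg/eval_affectnet.yml") as f:
    cfg = yaml.safe_load(f)

cfg["TRAIN"]["NET_G"] = "models/netG.pth"  # path to the downloaded FaceCLIP weights

with open("cfg/eval_affectnet.yml", "w") as f:
    yaml.safe_dump(cfg, f)
```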
- Test the FaceCLIP model for the FET dataset:

```bash
python main.py --cfg cfg/eval_affectnet.yml --gpu 2
```
- To generate and save 30,000 images for all captions in the testing dataset, change `B_VALIDATION` to `True` in the `eval_*.yml` file.
- FID for the collected FET dataset: Fréchet Inception Distance.
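One way to compute FID between the real test images and the 30,000 generated samples is the `pytorch-fid` package; a minimal sketch is below, where the two directory paths are placeholders you should point at your own data.

```python
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

# Placeholder directories: real test images vs. generated samples.
paths = ["data/affectnet/image/test", "output/generated"]
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# dims=2048 selects the standard Inception-v3 pool3 features.
fid = calculate_fid_given_paths(paths, batch_size=50, device=device, dims=2048)
print(f"FID: {fid:.2f}")
```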
- LPIPS for the collected FET dataset: PerceptualSimilarity.
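LPIPS can be computed with the `lpips` package from the PerceptualSimilarity repo (`pip install lpips`); a minimal sketch follows, with random tensors standing in for pairs of generated images.

```python
import lpips
import torch

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, the recommended default

# Inputs must be (N, 3, H, W) tensors scaled to [-1, 1]; random data here
# stands in for two generated images.
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(img0, img1)
print(distance.item())
```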
- R-precision is embedded in the testing process, i.e. when `B_VALIDATION` is `True`, the R-precision value is computed and printed after the 30,000 images have been generated.
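For reference, R-precision checks whether a generated image retrieves its ground-truth caption from a pool of R candidate captions using the text/image embeddings. A minimal sketch is below; the pool size, embedding dimension, and per-image formulation are assumptions, not the repository's exact evaluation code.

```python
import torch
import torch.nn.functional as F

def r_precision(img_emb, true_sent_emb, wrong_sent_embs):
    """img_emb: (dim,) embedding of one generated image.
    true_sent_emb: (dim,) embedding of its ground-truth caption.
    wrong_sent_embs: (R-1, dim) embeddings of mismatched captions.
    Returns 1.0 if the true caption ranks first by cosine similarity."""
    candidates = torch.cat([true_sent_emb.unsqueeze(0), wrong_sent_embs], dim=0)
    sims = F.cosine_similarity(img_emb.unsqueeze(0), candidates, dim=1)
    return float(sims.argmax().item() == 0)

# Hypothetical usage with R = 100 candidates per image:
img = F.normalize(torch.randn(256), dim=0)
true_cap = F.normalize(torch.randn(256), dim=0)
wrong_caps = F.normalize(torch.randn(99, 256), dim=1)
print(r_precision(img, true_cap, wrong_caps))
```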