FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping

[WACV 2023 (coming soon)] [Video Results]

Abstract

In this work, we present a new single-stage method for subject agnostic face swapping and identity transfer, named FaceDancer. We have two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR). The AFFA module is embedded in the decoder and adaptively learns to fuse attribute features and features conditioned on identity information without requiring any additional facial segmentation process. In IFSR, we leverage the intermediate features in an identity encoder to preserve important attributes such as head pose, facial expression, lighting, and occlusion in the target face, while still transferring the identity of the source face with high fidelity. We conduct extensive quantitative and qualitative experiments on various datasets and show that the proposed FaceDancer outperforms other state-of-the-art networks in terms of identity transfer, while having significantly better pose preservation than most of the previous methods.

For a quick play around, you can check out a version of FaceDancer hosted on Hugging Face. The Space allows you to swap faces in images and to try some other functionality I am currently researching and plan to publish soon, for example reconstruction attacks and adversarial defense against them.

Getting Started

This project was implemented in TensorFlow 2.X. For evaluation, we used models implemented in both TensorFlow and PyTorch (e.g., CosFace from InsightFace).

Installation:

Here is an example of installing FaceDancer on Windows:

Clone or download repository

git clone https://github.com/felixrosberg/FaceDancer.git
cd FaceDancer

Make conda environment

conda create -n facedancer python=3.8
conda activate facedancer
python -m pip install --upgrade pip

Download and install Microsoft Visual C++ for Visual Studio 2015 (if you have not installed it already).
The easiest way to run FaceDancer on a GPU is to install tensorflow-cpu and tensorflow-directml-plugin. If you only need the CPU, you can skip installing tensorflow-directml-plugin. To use the GPU, the latest Nvidia driver must be installed.

pip install tensorflow-cpu==2.10
pip install tensorflow-directml-plugin

Install dependencies:

pip install -r requirements.txt

An alternative installation method if you have difficulty with the previous:

Clone or download repository

git clone https://github.com/felixrosberg/FaceDancer.git
cd FaceDancer

Make conda environment

conda create -n facedancer python=3.8
conda activate facedancer
python -m pip install --upgrade pip

Install dependencies:

conda install -c conda-forge cudatoolkit cudnn
pip install tensorflow-gpu
pip install -r requirements.txt
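
After either installation route, it can help to confirm that TensorFlow actually sees a GPU before running a swap. The snippet below is a minimal sketch, not part of the repository: visible_gpus is a hypothetical helper name, and it assumes that the DirectML plugin, like the CUDA build, reports its devices under the "GPU" device type.

```python
def visible_gpus():
    """Return the names of GPU devices TensorFlow reports.

    Returns an empty list if no GPU is visible or if TensorFlow
    cannot be imported in the current environment.
    """
    try:
        import tensorflow as tf  # imported lazily so the check fails softly
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU device(s): {gpus}")
```

An empty list means FaceDancer will fall back to the CPU, which still works but is slower.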

Models:

Download the pretrained ArcFace here (only ArcFace-Res50.h5 is needed for swapping) and RetinaFace here. You also need to either train FaceDancer or download pretrained model weights from here.

Put ArcFace-Res50.h5 inside the ./arcface_model dir.
Put RetinaFace-Res50.h5 inside the ./retinaface dir.
Put downloaded pretrained models inside the ./model_zoo dir.
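
A quick way to check the layout described above before running anything is sketched below. It is not part of the repository: missing_model_files is a hypothetical helper, and it assumes that any .h5 file inside ./model_zoo counts as a pretrained FaceDancer model.

```python
from pathlib import Path


def missing_model_files(root="."):
    """Return a list of problems; an empty list means every file was found."""
    root = Path(root)
    problems = []
    # File names and folders as listed in the steps above.
    for relative in ("arcface_model/ArcFace-Res50.h5",
                     "retinaface/RetinaFace-Res50.h5"):
        if not (root / relative).is_file():
            problems.append(f"missing {relative}")
    # Assumption: any .h5 file in model_zoo is a FaceDancer model.
    if not list((root / "model_zoo").glob("*.h5")):
        problems.append("no .h5 model found in model_zoo")
    return problems


if __name__ == "__main__":
    for problem in missing_model_files("."):
        print(problem)
```

Run it from the repository root; no output means the three locations are populated.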

To swap all faces with one source, run:

Warning

Source images with too high a resolution may not work properly!

Video:

python test_video_swap_multi.py --facedancer_path "./model_zoo/FaceDancer_config_c_HQ.h5" --vid_path "path/to/video.mp4" --swap_source "path/to/source_face.jpg" --vid_output "results/swapped_video.mp4"

Image:

python test_image_swap_multi.py --facedancer_path "./model_zoo/FaceDancer_config_c_HQ.h5" --img_path "path/to/image.jpg" --swap_source "path/to/source_face.jpg" --img_output "results/swapped_image.jpg"

The video or image with swapped faces will be saved in the ./results directory.
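
If you want to swap every image in a folder with the same source face, one option is to call the image script once per file. The sketch below only uses the flags shown above; build_image_swap_command and swap_folder are hypothetical helper names, and the choice to process only *.jpg files is an assumption made for brevity.

```python
import subprocess
import sys
from pathlib import Path


def build_image_swap_command(model_path, image_path, source_path, output_path):
    """Build the argument list for one call to test_image_swap_multi.py."""
    return [
        sys.executable, "test_image_swap_multi.py",
        "--facedancer_path", str(model_path),
        "--img_path", str(image_path),
        "--swap_source", str(source_path),
        "--img_output", str(output_path),
    ]


def swap_folder(model_path, image_dir, source_path, output_dir, run=subprocess.run):
    """Run the swap script for every .jpg in image_dir, one process per image."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        command = build_image_swap_command(
            model_path, image_path, source_path, output_dir / image_path.name
        )
        run(command, check=True)  # raises if the script exits with an error
```

Each call reloads the models, so this is convenient rather than fast. For large batches, loading the model once in a custom script is likely the better route.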

Using the Models in a Custom Script

import logging

import cv2
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model
from tensorflow_addons.layers import InstanceNormalization

from networks.layers import AdaIN, AdaptiveAttention

logging.getLogger().setLevel(logging.ERROR)


model = load_model("path/to/model.h5", compile=False, custom_objects={"AdaIN": AdaIN,
                                                                      "AdaptiveAttention": AdaptiveAttention,
                                                                      "InstanceNormalization": InstanceNormalization})
arcface = load_model("path/to/arcface.h5", compile=False)

# target and source images need to be properly cropped and aligned
# convert("RGB") drops any alpha channel so both arrays have exactly 3 channels
target = np.asarray(Image.open("path/to/target_face.png").convert("RGB").resize((256, 256)))
source = np.asarray(Image.open("path/to/source_face.png").convert("RGB").resize((112, 112)))

source_z = arcface(np.expand_dims(source / 255.0, axis=0))

face_swap = model([np.expand_dims((target - 127.5) / 127.5, axis=0), source_z]).numpy()
face_swap = (face_swap[0] + 1) / 2
face_swap = np.clip(face_swap * 255, 0, 255).astype('uint8')

# face_swap is RGB (the inputs were loaded with PIL); OpenCV expects BGR when writing
cv2.imwrite("./swapped_face.png", cv2.cvtColor(face_swap, cv2.COLOR_RGB2BGR))

Note

You need ArcFace in addition to the FaceDancer model. Make sure the target image is normalized to the range [-1, 1] and the source image to the range [0, 1].
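
The two normalizations are easy to mix up, so it can help to wrap them in small helpers. The following is a minimal sketch that mirrors the arithmetic of the script above; the function names prepare_target, prepare_source, and to_uint8 are hypothetical and do not exist in the repository.

```python
import numpy as np


def prepare_target(image_uint8):
    """Scale an aligned RGB target from [0, 255] to [-1, 1] and add a batch axis."""
    scaled = (image_uint8.astype(np.float32) - 127.5) / 127.5
    return np.expand_dims(scaled, axis=0)


def prepare_source(image_uint8):
    """Scale an aligned RGB source from [0, 255] to [0, 1] and add a batch axis."""
    return np.expand_dims(image_uint8.astype(np.float32) / 255.0, axis=0)


def to_uint8(model_output):
    """Map a generator output from [-1, 1] back to uint8 values in [0, 255]."""
    return np.clip((model_output + 1.0) / 2.0 * 255.0, 0, 255).astype(np.uint8)
```

With these, the swap call reads model([prepare_target(target), arcface(prepare_source(source))]), and to_uint8 is applied to the first element of the result.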

How to Preprocess Data

Aligning Faces

Before you can train FaceDancer, you must make sure the data is properly aligned and processed. Without this step, learning is severely impaired, if not impossible. The expected folder structure is DATASET/subfolders/im_0, ..., im_x. If your image dataset is not divided into subfolders, you can put the DATASET folder inside a parent folder like this: PARENT_FOLDER/DATASET/im_0, ..., im_x. Then specify PARENT_FOLDER as the --data_dir, and DATASET will be treated as a subfolder. This step requires the pretrained RetinaFace for face detection and facial landmark extraction.
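
To see whether a dataset already matches the expected layout, you can count the images in each subfolder. This is a rough sketch and not part of the repository: images_per_subfolder is a hypothetical helper, and the set of file extensions is an assumption.

```python
from pathlib import Path

# Assumption: these extensions cover the images in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_subfolder(data_dir):
    """Map each subfolder of data_dir to the number of images directly inside it."""
    counts = {}
    for subfolder in sorted(Path(data_dir).iterdir()):
        if subfolder.is_dir():
            counts[subfolder.name] = sum(
                1 for f in subfolder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

An empty result means the images sit directly in the folder you passed, in which case you would pass its parent folder as --data_dir instead.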

To align the faces run:

python dataset/crop_align.py --data_dir path/to/DATASET --target_dir path/to/processed_DATASET

Remaining arguments consist of:

--device_id, default=0 - Which device to use.
--im_size, default=256 - Final image size of the processed image.
--min_size, default=128 - Images with a width or height below min_size are ignored.
--shrink_factor, default=1.0 - This argument controls how much of the background to keep. The default is 1.0, which produces images appropriate as direct input into ArcFace. If the shrink factor is, e.g., 0.75, you must center crop the image, keeping 75% of the image, before inputting it into ArcFace.
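
The center crop needed for a shrink factor below 1.0 could look like the sketch below. It assumes that "keeping 75%" refers to 75% of each side length, which is my reading of the argument and not something the alignment script confirms; center_crop is a hypothetical helper.

```python
import numpy as np


def center_crop(image, keep_fraction):
    """Keep the central keep_fraction of the height and width of an HxWxC image."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    height, width = image.shape[:2]
    new_height = int(round(height * keep_fraction))
    new_width = int(round(width * keep_fraction))
    top = (height - new_height) // 2
    left = (width - new_width) // 2
    return image[top:top + new_height, left:left + new_width]
```

For a 256x256 image processed with --shrink_factor 0.75, center_crop(image, 0.75) returns the central 192x192 region, which you would then resize to the 112x112 input that ArcFace uses in the custom script above.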

Sharding the Data

This step will convert the image data to tfrecords. If using large datasets such as VGGFace2 this will take some time. However, the training code is designed around this step and it speeds up training significantly. The expected folder structure is DATASET/subfolders/im_0, ..., im_x. If using an image dataset not divided into subfolders you can put the DATASET folder inside a parent folder like this: PARENT_FOLDER/DATASET/im_0, ..., im_x. Then specify the PARENT_FOLDER as the --data_dir and the DATASET will be treated as a subfolder.

To shard the data run:

python dataset/dataset_sharding.py --data_dir path/to/DATASET --target_dir path/to/tfrecords/dir --data_name dataset_name

Remaining arguments consist of:

--data_type, default="train" - Identifier for the output file names.
--shuffle, default=True - Whether to shuffle the order in which the images are sharded.
--num_shards, default=1000 - How many shards to divide the data into.
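
To give an intuition for what --shuffle and --num_shards control, here is a small sketch of distributing file paths over shards. It is not the code in dataset/dataset_sharding.py; assign_to_shards is a hypothetical function, and both the round-robin assignment and the cap on the number of shards are my own choices for illustration.

```python
import random


def assign_to_shards(paths, num_shards=1000, shuffle=True, seed=0):
    """Split paths into at most num_shards lists of near-equal size."""
    paths = list(paths)
    if shuffle:
        # A seeded shuffle keeps the assignment reproducible between runs.
        random.Random(seed).shuffle(paths)
    # Avoid empty shards when there are fewer files than requested shards.
    num_shards = max(1, min(num_shards, len(paths)))
    # Round-robin: shard i receives items i, i + num_shards, i + 2 * num_shards, ...
    return [paths[i::num_shards] for i in range(num_shards)]
```

Shuffling before sharding mixes identities across shards, so each shard is closer to a random sample of the dataset.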

How to Train

After you have processed and sharded all your desired datasets, you can train a version of FaceDancer. You still need the pretrained ArcFace from here (both ArcFace-Res50.h5 and ArcFacePerceptual-Res50 are needed). You also need the expression embedding model, used for a rough estimation, from here. Put the .h5 files into arcface_model/arcface and arcface_model/expface respectively; you may need to specify these paths in the arguments. The training script has the IFSR margins built into the default values of its arguments. The training and validation data paths use a specific format: C:/path/to/tfrecords/train/DATASET-NAME_DATA-TYPE_*-of-*.records, where DATASET-NAME and DATA-TYPE are the arguments specified during sharding. For example, with DATASET-NAME=vggface2 and DATA-TYPE=train: C:/path/to/tfrecords/train/vggface2_train_*-of-*.records.
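
Since a wrong pattern simply matches no files, it can be worth building the path programmatically and counting the matches before starting a run. The helpers below are a sketch; tfrecord_pattern and count_matching_shards are hypothetical names, not functions from the training code.

```python
import glob


def tfrecord_pattern(tfrecord_dir, dataset_name, data_type):
    """Build the glob pattern in the DATASET-NAME_DATA-TYPE_*-of-*.records format."""
    return f"{tfrecord_dir.rstrip('/')}/{dataset_name}_{data_type}_*-of-*.records"


def count_matching_shards(pattern):
    """Return how many files on disk match the pattern."""
    return len(glob.glob(pattern))


if __name__ == "__main__":
    pattern = tfrecord_pattern("C:/path/to/tfrecords/train", "vggface2", "train")
    print(pattern, "matches", count_matching_shards(pattern), "file(s)")
```

A count of zero usually means that the dataset name or data type does not match what was passed to the sharding script.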

To train run:

python train/train.py --data_dir C:/path/to/tfrecords/train/dataset_train_*-of-*.records --eval_dir C:/path/to/tfrecords/val/dataset_val_*-of-*.records

You can monitor the training with TensorBoard. The train.py script automatically logs losses and images into logs/runs/facedancer unless you specify a different log directory and/or log name (facedancer is the default log name). Checkpoints are automatically saved into the ./checkpoints directory unless you specify a different directory. Checkpointing saves the model structures to .json files and the weights to .h5 files. If you want the complete model in a single .h5 file, rerun train.py with --load XX and --export True. This saves the complete model as a .h5 file in exports/facedancer. XX is the checkpoint weight identifier, which you can find in your checkpoints directory by looking up, for example, gen/gen_XX.h5.
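
Instead of browsing the checkpoints directory by hand, the available identifiers can be listed with a few lines of Python. This sketch assumes the XX part of gen/gen_XX.h5 consists of digits only, which the text above does not state explicitly; checkpoint_ids is a hypothetical helper.

```python
import re
from pathlib import Path


def checkpoint_ids(checkpoint_dir="./checkpoints"):
    """Return the XX identifiers of gen/gen_XX.h5 files, sorted numerically."""
    pattern = re.compile(r"gen_(\d+)\.h5")  # assumption: XX is numeric
    ids = []
    for path in (Path(checkpoint_dir) / "gen").glob("gen_*.h5"):
        match = pattern.fullmatch(path.name)
        if match:
            ids.append(match.group(1))
    # Keep the identifiers as strings so any zero padding is preserved.
    return sorted(ids, key=int)


if __name__ == "__main__":
    found = checkpoint_ids()
    print("latest checkpoint:", found[-1] if found else "none found")
```

The last entry is the most recent checkpoint under that assumption and is the value you would pass as --load XX when exporting.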

PyTorch Implementation

I am currently working on a PyTorch version of FaceDancer. The training and network code is mostly done, but its behaviour is drastically different from the TensorFlow version. One interesting observation is that the mapping network prevents FaceDancer from learning its task. In its current state, the model provides decent results with the mapping network omitted. I will post the PyTorch version as soon as these issues are diagnosed and resolved.

Docker

build: docker build --rm -t faceswap .
run: docker run --gpus all --rm -it -p 8973:8000 -v $(pwd)/results:/workspace/results faceswap

License

FaceDancer is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation

If you use this repository in your work, please cite us:

@inproceedings{Rosberg2023FaceDancer,
  title     = {FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping},
  author    = {F. Rosberg and E. Aksoy and C. Englund and F. Alonso-Fernandez},
  booktitle = {Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2023}
}

TODO:

Add complete code for calculating IFSR.
Add code for all evaluation steps.
Provide download links to pretrained models.
Image swap script.
Debugging?
