VFL playground

This repo contains a trial of vertical federated learning (VFL) where the data holders do not train neural networks on their own devices.

Why?

The aim is to demonstrate that VFL is a useful paradigm even for "simple" problems that do not require neural networks. In this demo, each data holder owns part of the "Titanic" dataset, which is simple enough to achieve high accuracy with only on the order of 100 datapoints. Each data holder trains a logistic regression model on its part of the dataset and sends its predictions to a centralised computational server. The server trains a neural network on the concatenated outputs from all data holders to better predict the labels. The idea is that data holders will perform differently relative to one another, depending on the specific characteristics of their own data. Mapping these outputs to a more accurate prediction is a non-linear process, which is why the server uses a neural network.
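The flow described above can be sketched in a few lines. This is a minimal illustration, not the code in this repo: it assumes scikit-learn is available, uses a synthetic dataset from make_classification as a stand-in for the Titanic data, picks a hypothetical split of columns between three data holders, and assumes every holder has access to the labels. The names holder_columns, holder_outputs and server_model are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the Titanic data (assumption: the repo uses the real dataset).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=6, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Vertical partition: every holder sees the same rows but different columns.
# The split below is hypothetical.
holder_columns = [[0, 1, 2], [3, 4, 5], [6, 7]]

# Each data holder trains a logistic regression model on its own columns.
# Assumption: labels are available to every holder.
holder_models = []
for cols in holder_columns:
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:, cols], y_train)
    holder_models.append(model)


def holder_outputs(features):
    """Concatenate each holder's predicted probability of the positive class."""
    return np.column_stack(
        [
            model.predict_proba(features[:, cols])[:, 1]
            for model, cols in zip(holder_models, holder_columns)
        ]
    )


# The server only receives the holders' predictions, never the raw features.
Z_train = holder_outputs(X_train)
Z_test = holder_outputs(X_test)

# The server trains a small neural network on the concatenated predictions.
server_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
server_model.fit(Z_train, y_train)
server_accuracy = server_model.score(Z_test, y_test)
print(f"Server accuracy on held-out rows: {server_accuracy:.3f}")
```

Here the server sees one number per holder per datapoint, so its input has as many columns as there are data holders. A more careful setup would train the server on held-out holder predictions rather than in-sample ones; the sketch skips this for brevity.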

Get started

Python

This demo was written using Python 3.8; nearby minor versions should also work.

Environment

Only a few packages are required (and no GPUs!). Run pip install -r requirements.txt to install them.

How to run

Run main.sh. This trains a model in a centralised setting and then a model in the VFL setting.

Alternatively, run either script on its own: python scripts/run_centralised.py for the centralised setting, or python scripts/run_decentralised.py for the VFL setting.

Security implications

Incoming

License

Apache 2.0. See the license for more information.