
SAFEFL: MPC-friendly framework for Private and Robust Federated Learning

This project implements several federated learning aggregation rules and attacks. We added support for linear regression on the HAR dataset.

Additionally, we implemented FLTrust and FedAvg in the MP-SPDZ Multi-Party Computation Framework.

The project is based on code by the authors of FLTrust and follows their general structure. The original code is available here and uses the machine learning framework MXNet. We adapted the existing code to use PyTorch and extended it.

Aggregation rules

The following aggregation rules have been implemented:

All aggregation rules are located in aggregation_rules.py as individual functions. They operate on the local gradients rather than on the local models themselves. Working with the gradients is equivalent to working with the models as long as the global model is known, because each local model can be recovered from the global model and its local gradient. All aggregation rules that normally operate on the local models have therefore been modified to operate on the local gradients instead.
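The equivalence can be illustrated with a minimal NumPy sketch. It assumes that each client's gradient (update) is the difference between the known global model and that client's local model. The helper names are illustrative and are not taken from aggregation_rules.py. Averaging the local models gives the same new global model as averaging the updates and applying the result to the global model.

```python
import numpy as np


def models_to_updates(global_model, local_models):
    """Convert local models into updates relative to the known global model."""
    return [global_model - w for w in local_models]


def average(vectors):
    """Plain FedAvg-style mean over a list of equally shaped arrays."""
    return np.mean(np.stack(vectors), axis=0)


rng = np.random.default_rng(0)
global_model = rng.normal(size=5)
local_models = [global_model + rng.normal(scale=0.1, size=5) for _ in range(4)]

# Aggregating on models: the new global model is the mean of the local models.
new_from_models = average(local_models)

# Aggregating on updates: subtract the mean update from the known global model.
updates = models_to_updates(global_model, local_models)
new_from_updates = global_model - average(updates)

print(np.allclose(new_from_models, new_from_updates))
```

Distance-based rules carry over in the same way, because for two clients the distance between their local models equals the distance between their updates when both are taken relative to the same global model.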

To add an aggregation rule, add its implementation to aggregation_rules.py. To use the rule during training, you must also add a case for it in the main function of main.py. This case calls the aggregation rule, which must return the aggregated gradients.
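As an illustration, here is a minimal NumPy sketch of a hypothetical new rule, a coordinate-wise trimmed mean, together with a simplified stand-in for the case added in main.py. The names trimmed_mean_sketch and aggregate and the list-of-arrays input format are assumptions for this example. The real functions in aggregation_rules.py and main.py may use different signatures and PyTorch tensors, and the project may already contain a similar rule.

```python
import numpy as np


def trimmed_mean_sketch(gradients, trim=1):
    """Coordinate-wise trimmed mean over local gradients (hypothetical example rule).

    gradients: list of 1-D arrays, one per client.
    trim: number of largest and smallest values dropped per coordinate.
    """
    stacked = np.sort(np.stack(gradients), axis=0)  # sort each coordinate across clients
    n = stacked.shape[0]
    if 2 * trim >= n:
        raise ValueError("trim too large for the number of clients")
    return stacked[trim:n - trim].mean(axis=0)


def aggregate(rule, gradients):
    """Hypothetical stand-in for the per-rule cases in main.py."""
    if rule == "fedavg":
        return np.mean(np.stack(gradients), axis=0)
    elif rule == "trimmed_mean_sketch":
        return trimmed_mean_sketch(gradients, trim=1)
    raise ValueError(f"unknown aggregation rule: {rule}")


grads = [
    np.array([1.0, 2.0]),
    np.array([1.2, 2.1]),
    np.array([0.9, 1.9]),
    np.array([100.0, -50.0]),  # an outlying (possibly malicious) client
]
print(aggregate("trimmed_mean_sketch", grads))
```

For each coordinate, the largest and smallest values are discarded, so the outlying fourth client does not affect the result, whereas a plain mean would be pulled far toward it.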

Attacks

To evaluate the robustness of the aggregation rules, we also added the following attacks.

The implementations of the attacks are all located in attacks.py as individual functions.

To add a new attack, add its implementation as a new function in this file. Attacks that are called during aggregation must have the same signature as the other attacks, because the training process calls the attack through a single shared function call and the attack that actually runs is only determined at runtime. The attack name must also be added to the get_byz function in main.py. Attacks that only manipulate the training data just need to be called before training starts and do not need a specific signature.
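The following NumPy sketch illustrates a shared signature combined with runtime dispatch. The signature (gradients, f), the convention that the first f clients are malicious, the sign-flipping attack, and the name get_byz_sketch are all assumptions made for this example. They are not the project's actual interface.

```python
import numpy as np


def no_attack(gradients, f):
    """Benign baseline: return the gradients unchanged."""
    return gradients


def sign_flip_attack(gradients, f):
    """Hypothetical example attack: the first f clients send negated gradients."""
    return [-g if i < f else g for i, g in enumerate(gradients)]


def get_byz_sketch(name):
    """Map an attack name to its function; every entry shares one signature."""
    attacks = {
        "no": no_attack,
        "sign_flip": sign_flip_attack,
    }
    if name not in attacks:
        raise ValueError(f"unknown attack: {name}")
    return attacks[name]


# The training loop only holds a function reference; the attack is chosen at runtime.
byz = get_byz_sketch("sign_flip")
grads = [np.ones(3), np.ones(3), np.ones(3)]
attacked = byz(grads, f=1)
print([g.tolist() for g in attacked])
```

Because every attack accepts the same arguments, the call site in the training loop does not change when a new attack is registered.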

Models

We implemented a multiclass linear regression classifier.

The model is in a separate file in the models folder of this project.

To add a model, add a new file to the models folder containing a class that defines the classifier. Additionally, extend the get_net function in main.py so that the new model can be selected.
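A minimal PyTorch sketch of such a model file and the matching selection logic could look like this. The class name LinearClassifierSketch, the function get_net_sketch, and the model key "lr" are hypothetical. The sizes of 561 input features and 6 classes correspond to the standard UCI HAR feature set, but the project's own model class and get_net may be structured differently.

```python
import torch
from torch import nn


class LinearClassifierSketch(nn.Module):
    """Multiclass linear model: one affine score per class (illustrative)."""

    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)

    def forward(self, x):
        # Flatten everything except the batch dimension, then apply one affine map.
        return self.linear(torch.flatten(x, start_dim=1))


def get_net_sketch(name, num_inputs, num_outputs):
    """Hypothetical stand-in for get_net in main.py: select a model by name."""
    if name == "lr":
        return LinearClassifierSketch(num_inputs, num_outputs)
    raise ValueError(f"unknown model: {name}")


# Assumed HAR sizes: 561 features per example and 6 activity classes.
net = get_net_sketch("lr", num_inputs=561, num_outputs=6)
scores = net(torch.randn(8, 561))
print(scores.shape)  # torch.Size([8, 6])
```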

Datasets

We implemented support for the HAR dataset. As it is not included in PyTorch by default, it must be downloaded with the provided loading script in the data folder.

Adding a new dataset requires adding its loading code to the load_data function in data_loading.py. This can be as simple as adding an existing PyTorch data loader, or it can require custom loading code, as in the case of the HAR dataset. Additionally, the size of the data examples and the number of classes must be added to the get_shapes function so that the model is configured correctly. Furthermore, the assign_data function must be extended so that the training and test data can be assigned to the individual clients. If the evaluation requires running the new dataset with the scaling attack, which adds backdoor trigger patterns to the data examples, the following functions also need to be extended:

  • scaling_attack_insert_backdoor
  • add_backdoor

Both of these are located in attacks.py.
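The following NumPy sketch shows what the dataset-specific pieces could look like for a tabular dataset. All names (get_shapes_sketch, assign_data_sketch, add_backdoor_sketch), the IID split, and the trigger design (overwriting the first five features with a constant and relabelling to a target class) are assumptions for illustration. The project's assign_data may use a different distribution, for example a non-IID one, and its backdoor functions may use a different trigger.

```python
import numpy as np


def get_shapes_sketch(dataset):
    """Hypothetical stand-in for get_shapes: input size and number of classes."""
    shapes = {"HAR": (561, 6)}  # assumed: 561 features per example, 6 activities
    return shapes[dataset]


def assign_data_sketch(x, y, num_clients, seed=0):
    """Shuffle the training set and split it into near-equal client shards (IID)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    shards = np.array_split(order, num_clients)
    return [(x[idx], y[idx]) for idx in shards]


def add_backdoor_sketch(x, y, target_label, trigger_value=1.0, num_features=5):
    """Stamp a fixed trigger on the first features and relabel to the target class."""
    x_bd = x.copy()
    x_bd[:, :num_features] = trigger_value
    y_bd = np.full_like(y, target_label)
    return x_bd, y_bd


num_inputs, num_classes = get_shapes_sketch("HAR")
x = np.random.default_rng(1).normal(size=(100, num_inputs))
y = np.arange(100) % num_classes
clients = assign_data_sketch(x, y, num_clients=3)
print([len(cx) for cx, _ in clients])  # [34, 33, 33]
x_bd, y_bd = add_backdoor_sketch(clients[0][0], clients[0][1], target_label=0)
```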

Multi-Party Computation

To run the MPC implementation, MP-SPDZ must be downloaded separately using the installation script mpc_install.sh. The following protocols are supported:

  • Semi2k uses 2 or more parties in a semi-honest, dishonest majority setting
  • SPDZ2k uses 2 or more parties in a malicious, dishonest majority setting
  • Replicated2k uses 3 parties in a semi-honest, honest majority setting
  • PsReplicated2k uses 3 parties in a malicious, honest majority setting
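All four protocols compute on values in the ring of integers modulo 2^k. The pure-Python sketch below shows, in greatly simplified form, the additive secret sharing idea that underlies a dishonest-majority protocol such as Semi2k. Each client encodes its gradient as fixed-point integers and splits every entry into random shares modulo 2^64. Each party adds up the shares it holds, and only the aggregate is reconstructed. This is not MP-SPDZ code: it omits networking, multiplication, and malicious security, and the 16-bit fixed-point precision is an assumption.

```python
import secrets

MOD = 2 ** 64    # ring Z_{2^64}, analogous to the "2k" in the protocol names
SCALE = 2 ** 16  # assumed fixed-point precision (16 fractional bits)


def encode(x):
    """Map a real number to the ring using fixed-point encoding."""
    return round(x * SCALE) % MOD


def decode(v):
    """Interpret a ring element as a signed fixed-point number."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def share(value, num_parties):
    """Split a ring element into additive shares that sum to it modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def secure_sum(client_vectors, num_parties=2):
    """Each party adds up its shares locally; only the total is reconstructed."""
    dim = len(client_vectors[0])
    party_acc = [[0] * dim for _ in range(num_parties)]
    for vec in client_vectors:
        for j, x in enumerate(vec):
            for p, s in enumerate(share(encode(x), num_parties)):
                party_acc[p][j] = (party_acc[p][j] + s) % MOD
    # Reconstruction: combine the parties' local sums and decode.
    return [decode(sum(party_acc[p][j] for p in range(num_parties)) % MOD)
            for j in range(dim)]


grads = [[0.5, -1.25], [1.0, 0.25], [-0.5, 0.5]]
total = secure_sum(grads, num_parties=3)
fedavg = [t / len(grads) for t in total]  # averaging done in the clear for brevity
print(total)
```

No single party's shares reveal anything about an individual client's gradient; only the reconstructed sum, from which FedAvg follows, becomes public.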

How to run?

To set up the project, clone it from git and then download the HAR dataset as described in the Datasets section.

The project takes multiple command-line arguments that determine the training parameters, the attack, the aggregation rule, and so on. If no arguments are provided, the project runs with the default arguments. A description of all arguments can be displayed by executing:

python main.py -h

Requirements

The project requires the following software:

  • Python 3.8.13
  • PyTorch 1.11.0
  • Torchvision 0.12.0
  • NumPy 1.21.5
  • Matplotlib 3.5.1
  • HDBSCAN 0.8.28
  • Perl 5.26.2

All requirements can be found in requirements.txt.

Credits

This project is based on code by Cao et al., the authors of FLTrust, which is available here.

We thank the authors of Romoa for providing an implementation of their aggregation rule.

We used the open-source implementations of the Min-Max and Min-Sum attacks.

For the implementation of Flame we used the scikit-learn implementation of HDBSCAN by McInnes et al.

The MPC framework MP-SPDZ was created by Marcel Keller.

License

MIT