PyTorch implementations of some general optimization methods in the federated learning community.
- FedAvg: Communication-Efficient Learning of Deep Networks from Decentralized Data
- FedProx: Federated Optimization in Heterogeneous Networks
- FedAdam: Adaptive Federated Optimization
- SCAFFOLD: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
- FedDyn: Federated Learning Based on Dynamic Regularization
- FedCM: FedCM: Federated Learning with Client-level Momentum
- FedSAM / MoFedSAM: Generalized Federated Learning via Sharpness Aware Minimization
- FedGamma: FedGAMMA: Federated Learning with Global Sharpness-Aware Minimization
- FedSpeed: FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
- FedSMOO: Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
FL-Simulator runs on a single CPU/GPU to simulate the training process of federated learning (FL) with the PyTorch framework. For example, to train centralized FL with the FedAvg method on ResNet-18 and the CIFAR-10 dataset (10% active clients per round out of 100 total clients, with a heterogeneous Dirichlet-0.6 dataset split), you can use:
```bash
python train.py --non-iid --dataset CIFAR-10 --model ResNet18 --split-rule Dirichlet --split-coef 0.6 --active-ratio 0.1 --total-client 100
```
Other hyperparameters are described in the train.py file.
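As another example, the 5%-200 participation setting reported in the tables below can be launched with the same flags and different values (batch size and local epochs are set through the additional arguments in train.py):

```bash
python train.py --non-iid --dataset CIFAR-10 --model ResNet18 --split-rule Dirichlet --split-coef 0.3 --active-ratio 0.05 --total-client 200
```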
FL-Simulator pre-defines the basic Server class and Client class, which are executed according to the vanilla FL process. To implement a new method, you can inherit the base Server class and override the following functions:

- process_for_communication( ): how your method pre-processes the variables for communication to each client
- postprocess( ): how your method post-processes the received variables from each local client
- global_update( ): how your method processes the update on the global model

Then you can define a new client file or a new local optimizer for your own method to perform the local training. Similarly, you can directly define a new server class to rebuild the inner operations, as in the sketch below.
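A minimal sketch of this pattern follows. Only the three hook names above come from FL-Simulator; the import path, method signatures, and attribute names (comm_vecs, global_params, global_lr, updates_buffer) are illustrative assumptions, not the actual internals:

```python
import torch

from server import Server  # assumed import path for the pre-defined Server class


class MyMethodServer(Server):
    def process_for_communication(self, client):
        # Pre-process the variables sent to each client: here, just ship a
        # copy of the current global parameters (attribute names are assumed).
        self.comm_vecs["params"] = self.global_params.clone()

    def postprocess(self, client, received_vecs):
        # Post-process the variables received from a local client: buffer the
        # local update (difference to the global parameters) for aggregation.
        self.updates_buffer.append(received_vecs["local_params"] - self.global_params)

    def global_update(self):
        # Update the global model: plain FedAvg-style averaging of the
        # buffered client updates, scaled by the global learning rate.
        avg_update = torch.stack(self.updates_buffer).mean(dim=0)
        self.global_params.add_(self.global_lr * avg_update)
        self.updates_buffer.clear()
```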
We show some results of the ResNet-18-GN model on the CIFAR-10 dataset. The corresponding hyperparameters are stated below. The time costs are measured on an NVIDIA Tesla V100 GPU.
CIFAR-10 (ResNet-18-GN), T=1000 communication rounds.

10%-100 setting (10% of 100 total clients active per round, bs=50, local epochs=5):

| Method | IID | Dir-0.6 | Dir-0.3 | Dir-0.1 | Time / round |
| --- | --- | --- | --- | --- | --- |
| *SGD basis* | | | | | |
| FedAvg | 82.52 | 80.65 | 79.75 | 77.31 | 15.86s |
| FedProx | 82.54 | 81.05 | 79.52 | 76.86 | 19.78s |
| FedAdam | 84.32 | 82.56 | 82.12 | 77.58 | 15.91s |
| SCAFFOLD | 84.88 | 83.53 | 82.75 | 79.92 | 20.09s |
| FedDyn | 85.46 | 84.22 | 83.22 | 78.96 | 20.82s |
| FedCM | 85.74 | 83.81 | 83.44 | 78.92 | 20.74s |
| *SAM basis* | | | | | |
| FedGamma | 85.74 | 84.80 | 83.81 | 80.72 | 30.13s |
| MoFedSAM | 87.24 | 85.74 | 85.14 | 81.58 | 29.06s |
| FedSpeed | 87.31 | 86.33 | 85.39 | 82.26 | 29.48s |
| FedSMOO | 87.70 | 86.87 | 86.04 | 83.30 | 30.43s |

5%-200 setting (5% of 200 total clients active per round, bs=25, local epochs=5):

| Method | IID | Dir-0.6 | Dir-0.3 | Dir-0.1 | Time / round |
| --- | --- | --- | --- | --- | --- |
| *SGD basis* | | | | | |
| FedAvg | 81.09 | 79.93 | 78.66 | 75.21 | 17.03s |
| FedProx | 81.56 | 79.49 | 78.76 | 75.84 | 20.97s |
| FedAdam | 83.29 | 81.22 | 80.22 | 75.83 | 17.67s |
| SCAFFOLD | 84.24 | 83.01 | 82.04 | 78.23 | 22.21s |
| FedDyn | 81.11 | 80.25 | 79.43 | 75.43 | 22.68s |
| FedCM | 83.77 | 82.01 | 80.77 | 75.91 | 21.24s |
| *SAM basis* | | | | | |
| FedGamma | 84.99 | 84.02 | 83.03 | 80.09 | 33.63s |
| MoFedSAM | 86.27 | 84.71 | 83.44 | 79.02 | 32.45s |
| FedSpeed | 86.87 | 85.07 | 83.94 | 79.66 | 33.69s |
| FedSMOO | 87.40 | 85.97 | 85.14 | 81.35 | 34.80s |
Results not yet listed are awaiting updates.
Some key hyperparameter selections
| Method | local Lr | global Lr | Lr decay | SAM Lr | proxy coefficient | client-momentum coefficient |
| --- | --- | --- | --- | --- | --- | --- |
| FedAvg | 0.1 | 1.0 | 0.998 | - | - | - |
| FedProx | 0.1 | 1.0 | 0.998 | - | 0.1 / 0.01 | - |
| FedAdam | 0.1 | 0.1 / 0.05 | 0.998 | - | - | - |
| SCAFFOLD | 0.1 | 1.0 | 0.998 | - | - | - |
| FedDyn | 0.1 | 1.0 | 0.9995 / 1.0 | - | 0.1 | - |
| FedCM | 0.1 | 1.0 | 0.998 | - | - | 0.1 |
| FedGamma | 0.1 | 1.0 | 0.998 | 0.01 | - | - |
| MoFedSAM | 0.1 | 1.0 | 0.998 | 0.1 | - | 0.05 / 0.1 |
| FedSpeed | 0.1 | 1.0 | 0.998 | 0.1 | 0.1 | - |
| FedSMOO | 0.1 | 1.0 | 0.998 | 0.1 | 0.1 | - |
The hyperparameter selections above are for reference only. Each algorithm has unique properties that match the corresponding hyperparameters. To facilitate a relatively fair comparison, we report a set of selections with which each method performs well in general cases. Please adjust the hyperparameters when the model backbone or dataset changes.
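For scripted sweeps, the reference selections above can be collected in a small config dict. The key names below are illustrative rather than train.py's actual argument names, and where the table lists two values the first one is used:

```python
# Reference hyperparameters from the table above, gathered for scripted sweeps.
# Key names are illustrative, NOT train.py's actual argument names; where the
# table lists two values (e.g. "0.1 / 0.01"), the first one is used here.
REFERENCE_HPARAMS = {
    "FedAvg":   {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998},
    "FedProx":  {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "proxy_coef": 0.1},
    "FedAdam":  {"local_lr": 0.1, "global_lr": 0.1, "lr_decay": 0.998},
    "SCAFFOLD": {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998},
    "FedDyn":   {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.9995, "proxy_coef": 0.1},
    "FedCM":    {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "client_momentum": 0.1},
    "FedGamma": {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "sam_lr": 0.01},
    "MoFedSAM": {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "sam_lr": 0.1, "client_momentum": 0.05},
    "FedSpeed": {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "sam_lr": 0.1, "proxy_coef": 0.1},
    "FedSMOO":  {"local_lr": 0.1, "global_lr": 1.0, "lr_decay": 0.998, "sam_lr": 0.1, "proxy_coef": 0.1},
}
```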
ToDo:

- Decentralized Implementation
- Delayed / Asynchronous Implementation
- Hyperparameter Selections
- Related Advances (Long-Term)
If this codebase helps you, please cite our papers:
FedSpeed (ICLR 2023):
```bibtex
@article{sun2023fedspeed,
  title={Fedspeed: Larger local interval, less communication round, and higher generalization accuracy},
  author={Sun, Yan and Shen, Li and Huang, Tiansheng and Ding, Liang and Tao, Dacheng},
  journal={arXiv preprint arXiv:2302.10429},
  year={2023}
}
```
FedSMOO (ICML 2023 Oral):
```bibtex
@inproceedings{sun2023dynamic,
  title={Dynamic regularized sharpness aware minimization in federated learning: Approaching global consistency and smooth landscape},
  author={Sun, Yan and Shen, Li and Chen, Shixiang and Ding, Liang and Tao, Dacheng},
  booktitle={International Conference on Machine Learning},
  pages={32991--33013},
  year={2023},
  organization={PMLR}
}
```