Merge pull request #237 from SymbioticLab/auxo

[Example] Auxo (SoCC'23)
SymbioticLab · Sep 23, 2023 · 731aa17
2 parents faab283 + dabf26a
commit 731aa17

Showing 25 changed files with 2,156 additions and 0 deletions.
diff --git a/benchmark/configs/auxo/auxo.yml b/benchmark/configs/auxo/auxo.yml
@@ -0,0 +1,52 @@
+# Configuration file for the fed_hetero experiment
+
+# ========== Cluster configuration ==========
+# IP address of the parameter server (needs 1 GPU process)
+ps_ip: localhost
+ps_port: 12345
+
+# IP address of each worker and the number of processes to launch on each of its GPUs
+# Note: if the PS and a worker are collocated on the same GPU, decrease the number of processes on that GPU by 1
+# E.g., if the master node has 4 available processes, 1 goes to the PS and the worker should be set to worker:3
+worker_ips:
+    - localhost:[7,7,0,0] # worker_ip: [(# processes on gpu) for gpu in available_gpus], e.g., 10.0.0.2:[4,4,4,4] means this node has 4 GPUs, each running 4 processes.
+
+exp_path: $FEDSCALE_HOME/examples/auxo
+
+# Entry scripts of the executor and aggregator under $exp_path
+executor_entry: executor.py
+
+aggregator_entry: aggregator.py
+
+auth:
+    ssh_user: ""
+    ssh_private_key: ~/.ssh/id_rsa
+
+# Commands to run (in order) before FAR can be launched
+setup_commands:
+    - source $HOME/anaconda3/bin/activate fedscale
+
+# ========== Additional job configuration ==========
+# Default parameters are specified in config_parser.py, which also describes each parameter in more detail
+
+job_conf:
+    - job_name: auxo_femnist                     # Generate logs under this folder: log_path/job_name/time_stamp
+    - log_path: $FEDSCALE_HOME/benchmark    # Path of log files
+    - num_participants: 200                  # Number of participants per round; we use K=100 in our paper, and larger K is much slower
+    - data_set: femnist                     # Dataset, e.g., femnist, openImg, google_speech, stackoverflow
+    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/    # Path of the dataset
+    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv              # Allocation of data to each client; falls back to an IID setting if not provided
+    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity     # Path of the client trace
+    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
+    - model: resnet18                       # NOTE: Please refer to our model zoo README and use models suited to small image inputs (e.g., 32x32x3)
+#    - model_zoo: fedscale-torch-zoo
+    - eval_interval: 20                     # Run testing on the test set every this many rounds
+    - rounds: 1000                          # Number of training rounds. We use 1000 in our paper, though training may converge in ~400 rounds
+    - filter_less: 0                       # Remove clients with fewer than this many samples (0 keeps all clients; e.g., 21 removes clients with fewer than 21 samples)
+    - num_loaders: 2
+    - local_steps: 10
+    - learning_rate: 0.05
+    - batch_size: 20
+    - test_bsz: 20
+    - use_cuda: True
+    - save_checkpoint: False
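The `worker_ips` format above packs a per-GPU process count into each entry. As a rough sketch only (the helpers `parse_worker_ip` and `executor_slots` are hypothetical and are not FedScale's own parser), an entry such as `localhost:[7,7,0,0]` could be expanded into one `(host, gpu_index)` slot per process like this. It assumes plain hostnames or IPv4 addresses, not IPv6.

```python
import re
from typing import List, Tuple

# Hypothetical parser for entries such as "localhost:[7,7,0,0]" or "10.0.0.2:[4,4,4,4]".
# Assumes the host part contains no ':' (hostnames or IPv4 only).
_WORKER_RE = re.compile(r"^\s*(?P<host>[^:\s]+):\[(?P<counts>[\d,\s]*)\]\s*$")


def parse_worker_ip(entry: str) -> Tuple[str, List[int]]:
    """Split one worker_ips entry into (host, per-GPU process counts)."""
    match = _WORKER_RE.match(entry)
    if match is None:
        raise ValueError(f"unrecognized worker_ips entry: {entry!r}")
    counts = [int(c) for c in match.group("counts").split(",") if c.strip()]
    return match.group("host"), counts


def executor_slots(entries: List[str]) -> List[Tuple[str, int]]:
    """Expand entries into one (host, gpu_index) slot per process to launch."""
    slots: List[Tuple[str, int]] = []
    for entry in entries:
        host, counts = parse_worker_ip(entry)
        # GPU index is the position in the list; its value is the process count.
        for gpu_index, n_procs in enumerate(counts):
            slots.extend([(host, gpu_index)] * n_procs)
    return slots
```

With the `localhost:[7,7,0,0]` value in this config, `executor_slots` returns 14 slots: seven on GPU 0, seven on GPU 1, and none on GPUs 2 and 3.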
diff --git a/examples/auxo/Dockerfile b/examples/auxo/Dockerfile
@@ -0,0 +1,29 @@
+# Use an official CUDA image as a parent image
+FROM nvidia/cuda:11.0-base-ubuntu20.04
+
+# Set the working directory inside the container
+WORKDIR /app
+
+# Install necessary system packages
+RUN apt-get update && apt-get install -y python3.7 python3-pip
+
+# Create a virtual environment and activate it for all later steps via PATH
+RUN python3.7 -m pip install virtualenv
+RUN python3.7 -m virtualenv venv
+ENV PATH="/app/venv/bin:$PATH"
+
+# Copy the requirements file into the container
+COPY requirements.txt .
+
+# Install the Python dependencies
+RUN pip install --upgrade pip && pip install -r requirements.txt
+
+# Copy the project files into the container (assuming your project is in the current directory)
+COPY . .
+
+# Install your project using pip
+RUN pip install -e .
+
+# Command to run when the container starts
+CMD ["bash"]
+
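Each `RUN` instruction executes in a fresh shell, so sourcing `activate` in one step does not affect later steps or the shell started by `CMD ["bash"]`. Only settings recorded in the image, such as `ENV PATH=...`, persist. To check which interpreter the built image actually uses, a small script along these lines can be run inside the container. This is a sketch: the file name `check_venv.py` is hypothetical and is not part of this PR.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 set sys.base_prefix to the base interpreter,
    # so it differs from sys.prefix inside an environment.
    # Legacy virtualenv (< 20) defines sys.real_prefix instead.
    return getattr(sys, "real_prefix", None) is not None or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"in virtualenv: {in_virtualenv()}")
```

If the virtual environment's `bin` directory comes first on `PATH`, running `python check_venv.py` in the container should print `in virtualenv: True`.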
diff --git a/examples/auxo/README.md b/examples/auxo/README.md
@@ -0,0 +1,72 @@
+
+
+<div align="center">
+<picture>
+  <img alt="Auxo logo" width="45%" src="fig/auxo.png">
+</picture> 
+<h1>Auxo: Efficient Federated Learning via Scalable Client Clustering</h1>
+
+</div>
+
+Auxo is a heterogeneity manager for Federated Learning (FL) that relies on scalable and efficient cohort-based training mechanisms.
+For more details, see our SoCC'23 [paper](https://arxiv.org/abs/2210.16656).
+
+
+## Key Features
+
+- **Scalable Cohort Identification**: Efficiently identifies cohorts even in large-scale FL deployments.
+
+- **Cohort-Based Training**: Optimizes the performance of existing FL algorithms by reducing intra-cohort heterogeneity.
+
+- **Resource Efficiency**: Designed to work in low-availability, resource-constrained settings without additional computational overhead.
+
+- **Privacy Preservation**: Respects user privacy by avoiding the need for traditional clustering methods that require access to client data.
+
+
+## Getting Started
+### Install
+Follow these installation steps if you have not installed FedScale yet.
+```commandline  
+docker build -t fedscale:auxo .
+docker run --gpus all -it --name auxo -v $FEDSCALE_HOME:/workspace/FedScale fedscale:auxo /bin/bash
+```
+Then, from the FedScale root directory, set `FEDSCALE_HOME` and the `fedscale` alias:
+```commandline
+echo export FEDSCALE_HOME=$(pwd) >> ~/.bashrc
+echo alias fedscale=\'bash \${FEDSCALE_HOME}/fedscale.sh\' >> ~/.bashrc
+source ~/.bashrc
+```
+
+### Prepare dataset
+After setting up the FedScale environment, download the dataset and split each client's data into a train set and a test set.
+
+```commandline
+fedscale dataset download femnist
+cd $FEDSCALE_HOME/examples/auxo 
+python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/train.csv
+python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/test.csv
+python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/val.csv
+```
+### Run Auxo
+```commandline
+cd $FEDSCALE_HOME
+fedscale driver start benchmark/configs/auxo/auxo.yml 
+```
+
+### Visualize the continuous clustering algorithm
+```commandline
+cd $FEDSCALE_HOME/examples/auxo
+python playground.py
+```
+Visualized clustering results:
+
+ <p float="left">
+  <img src="fig/epoch_14.png" width="150" />
+  <img src="fig/epoch_100.png" width="150" /> 
+  <img src="fig/epoch_224.png" width="150" />
+  <img src="fig/epoch_300.png" width="150" /> 
+  <img src="fig/epoch_500.png" width="150" /> 
+  <img src="fig/epoch_700.png" width="150" /> 
+</p>
+
+
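The README's dataset step splits each client's samples into train and test sets with `utils.prepare_test_train`, whose internals are not shown in this diff. As an illustration only, a per-client split could look like the sketch below. It assumes the mapping CSV has a `client_id` column. The 20% test fraction, the fixed seed, and the helper names `split_per_client` and `split_mapping_file` are hypothetical choices, not the behavior of the actual utility.

```python
import csv
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_per_client(rows: List[Row], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Hypothetical per-client split: shuffle each client's rows, hold out a fraction."""
    by_client: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_client[row["client_id"]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Row] = []
    test: List[Row] = []
    for client_id in sorted(by_client):
        samples = by_client[client_id][:]
        rng.shuffle(samples)
        n_test = int(len(samples) * test_fraction)
        # Clients with at least two samples always contribute one test sample.
        if n_test == 0 and len(samples) > 1:
            n_test = 1
        test.extend(samples[:n_test])
        train.extend(samples[n_test:])
    return train, test


def split_mapping_file(path: str, test_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Read a client_data_mapping CSV and split it per client."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return split_per_client(rows, test_fraction, seed)
```

Splitting inside each client, rather than across the pooled dataset, keeps every client's test samples drawn from that client's own data distribution, which matters when evaluating per-client or per-cohort accuracy.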