This repository contains the code for our winning entry in the PFL-DocVQA Competition 2023 (Track 2). Our contribution is simple: we applied LoRA to the provided baseline. By reducing the number of trainable parameters, our method significantly reduces both the total communication cost and the overall noise added to the model during training. Below, we give a straightforward explanation of our method.
Authors: Ragul N[1]#, Sivasanjai G A[1], Rintu Kutum[1][2]*
Affiliations:
[1]Department of Computer Science, Ashoka University
[2]Trivedi School of Biosciences, Ashoka University
# First author
* Corresponding author. [email protected]
Mathematically, a randomized algorithm $A$ is $\varepsilon$-differentially private if, for all neighboring datasets $D$ and $D'$ (differing in a single data point), and for all possible outcome sets $S$:

$$\Pr[A(D) \in S] \leq e^{\varepsilon} \Pr[A(D') \in S]$$

where $\varepsilon$ is the privacy budget: the smaller it is, the less the algorithm's output can reveal about any individual data point. The relaxed $(\varepsilon, \delta)$-variant, which the Gaussian Mechanism below satisfies, allows this bound to fail with probability at most $\delta$.
The Gaussian Mechanism is a method for achieving differential privacy. Given a function $f$, the Gaussian Mechanism adds noise sampled from a Gaussian distribution with standard deviation proportional to the sensitivity of $f$. The sensitivity measures how much the output of the function can change due to the addition or removal of a single data point.
Mathematically, for a function $f$ with sensitivity $\Delta f$, the Gaussian Mechanism perturbs the function as follows:

$$f_{\text{DP}}(D) = f(D) + \mathcal{N}(0, \sigma^2), \qquad \sigma = \frac{\Delta f \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$$

where $\mathcal{N}(0, \sigma^2)$ denotes zero-mean Gaussian noise with standard deviation $\sigma$; with this calibration the mechanism satisfies $(\varepsilon, \delta)$-differential privacy.
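To make this concrete, here is a minimal NumPy sketch of the Gaussian Mechanism; the function name and the example mean query are illustrative, not part of our pipeline.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta):
    # Classic calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return value + np.random.normal(loc=0.0, scale=sigma, size=np.shape(value))

# Example: privatize a mean over n records in [0, 1]; replacing one record
# changes the mean by at most 1/n, so the sensitivity is 1/n.
data = np.random.rand(1000)
private_mean = gaussian_mechanism(data.mean(), sensitivity=1.0 / len(data),
                                  epsilon=1.0, delta=1e-5)
```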
Stochastic Gradient Descent (SGD) is a widely used optimization algorithm for training machine learning models. The basic idea is to iteratively update the model parameters by moving in the direction of the negative gradient of the loss function. The update rule for a parameter $\theta$ at step $t$ is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

where $\eta$ is the learning rate and $\nabla_\theta L(\theta_t)$ is the gradient of the loss $L$ with respect to the parameters.
To make SGD differentially private, we introduce the Gaussian Mechanism to the gradient updates. The differentially private gradient update for parameter $\theta$ is:

$$\theta_{t+1} = \theta_t - \frac{\eta}{B} \left( \sum_{i=1}^{B} \bar{g}_i + \mathcal{N}(0, \sigma^2 C^2 \mathbf{I}) \right), \qquad \bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right)$$

Here, the per-example gradients $g_i$ are first clipped to a maximum $\ell_2$ norm $C$, which bounds the sensitivity of the update; Gaussian noise with standard deviation $\sigma C$ (where $\sigma$ is the noise multiplier) is then added to the sum before averaging over the batch of size $B$.
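The following NumPy sketch shows one such update; names and shapes are illustrative (libraries such as Opacus implement this with vectorized per-example gradients).

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier):
    # 1. Clip each per-example gradient to l2 norm <= clip_norm (bounds sensitivity).
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # 2. Sum the clipped gradients, add noise with std = noise_multiplier * clip_norm.
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        0.0, noise_multiplier * clip_norm, size=theta.shape)
    # 3. Average over the batch and take an SGD step.
    return theta - lr * noisy_sum / len(per_example_grads)
```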
In federated learning, the objective is to train a global model across decentralized clients while preserving the privacy of individual data on each client. To achieve this, the combination of FedAvg and local differential privacy can be employed.
Local differential privacy focuses on injecting noise at the individual client level, providing privacy protection for local datasets. Each client independently applies differential privacy mechanisms to its local data before communicating with the central server. Mathematically, the local differential privacy mechanism for a client $i$ with local update $\Delta \theta_i$ can be written as:

$$\widetilde{\Delta \theta}_i = \Delta \theta_i + \mathcal{N}(0, \sigma^2 C^2 \mathbf{I})$$

Here, $C$ is the sensitivity of the local update (enforced by clipping its norm to $C$) and $\sigma$ is the noise multiplier.
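A sketch of this client-side perturbation, assuming the standard convention that the update is clipped to norm $C$ before Gaussian noise with standard deviation $\sigma C$ is added (the function and argument names are ours):

```python
import numpy as np

def local_dp_perturb(update, clip_norm, noise_multiplier, rng=None):
    rng = rng or np.random.default_rng()
    # Clip so the l2 norm of the update (its sensitivity) is at most clip_norm.
    clipped = update * min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    # Add Gaussian noise before the update ever leaves the client.
    return clipped + rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
```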
Federated Averaging (FedAvg)
FedAvg is a federated learning algorithm that involves iterative model updates and aggregation across multiple clients. The basic steps of FedAvg are as follows:
- Initialization: Initialize a global model on the central server.
- Client Update: Randomly select a subset of clients, each denoted by $C_i$, where $i$ ranges over the selected clients. Each client in the subset performs a local update using its private data, generating a model update denoted by $\Delta \theta_i$.
- Model Aggregation: The central server aggregates the model updates from the selected clients using a weighted average:

  $$\Delta \theta = \frac{1}{|C|} \sum_{i \in C} w_i \Delta \theta_i$$

  Here, $w_i$ represents the weight assigned to each client in the subset, and $|C|$ is the size of the subset.
- Global Model Update: The central server updates the global model using the aggregated update (a short sketch of these last two steps follows the list):

  $$\theta_{\text{global}} = \theta_{\text{global}} - \eta \Delta \theta$$

  where $\eta$ is the learning rate.
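Here is the promised sketch of the aggregation and global-update steps, treating each model update as a flat NumPy vector (names are illustrative):

```python
import numpy as np

def fedavg_aggregate(client_updates, weights):
    # Delta_theta = (1 / |C|) * sum_i w_i * Delta_theta_i
    return sum(w * u for w, u in zip(weights, client_updates)) / len(client_updates)

def global_update(theta_global, client_updates, weights, lr):
    # theta_global <- theta_global - eta * Delta_theta
    return theta_global - lr * fedavg_aggregate(client_updates, weights)
```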
To introduce local differential privacy in FedAvg, we add noise at the client level during the model update step. Specifically, each client perturbs its local update using local DP-SGD before sending the model update to the central server.
The local DP-SGD update at client $i$ perturbs the local model update before it is sent:

$$\widetilde{\Delta \theta}_i = \text{clip}(\Delta \theta_i, C) + \mathcal{N}(0, \sigma^2 C^2 \mathbf{I})$$

Here, $\text{clip}(\cdot, C)$ scales the update so its norm is at most the sensitivity bound $C$, and $\sigma$ is the noise multiplier. Now, during the federated averaging step, the central server aggregates the locally perturbed updates from the selected clients using a weighted average:

$$\Delta \theta = \frac{1}{|C|} \sum_{i \in C} w_i \widetilde{\Delta \theta}_i$$

The global model is then updated with the aggregated perturbed update:

$$\theta_{\text{global}} = \theta_{\text{global}} - \eta \Delta \theta$$
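Putting the pieces together, one communication round might look like the sketch below, reusing `local_dp_perturb` and `fedavg_aggregate` from the earlier sketches (client sampling and the actual local training are elided):

```python
def dp_fedavg_round(theta_global, local_updates, weights, lr,
                    clip_norm, noise_multiplier):
    # Each selected client perturbs its own update before sending it;
    # the server only ever sees the noisy updates.
    noisy = [local_dp_perturb(u, clip_norm, noise_multiplier)
             for u in local_updates]
    return theta_global - lr * fedavg_aggregate(noisy, weights)
```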
This modification ensures that each client contributes to the global model in a privacy-preserving manner, incorporating local differential privacy through noise addition at the client level. The weights $w_i$ can be adjusted to reflect the contribution of each client, considering factors such as data size or computational capability.
In Federated Learning, the communication cost can be a significant challenge. Clients communicate model updates to a central server during each round, leading to potential bandwidth issues, especially when dealing with numerous or resource-constrained clients.
The noise added in differentially private machine learning comes with a trade-off: the same noise that guarantees privacy also degrades model performance, so balancing privacy and utility becomes a challenge.
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that significantly reduces the number of trainable parameters needed for fine-tuning. For each pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA constrains the update to a low-rank decomposition:

$$W = W_0 + \Delta W = W_0 + BA$$

where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and the rank $r \ll \min(d, k)$. The pre-trained weights $W_0$ are kept frozen; only $A$ and $B$ are trained, reducing the trainable parameters per matrix from $dk$ to $r(d + k)$.
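A minimal PyTorch sketch of this reparameterization; the rank `r`, scaling factor `alpha`, and initialization are illustrative, not the exact settings of our submission.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Adds a trainable low-rank update BA to a frozen pre-trained linear layer."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # W_0 stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # BA = 0 at init
        self.scale = alpha / r

    def forward(self, x):
        # W_0 x + (alpha / r) * B (A x)
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```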
- Reduced Communication Cost: LoRA reduces the number of parameters involved in model updates, leading to a decrease in the communication cost between clients and the central server in the federated learning setting.
- Reduced Noise Addition: Differential privacy only requires noise addition to the model update parameters. By reducing the size of the model update, LoRA also reduces the total noise added to the model (the arithmetic after this list illustrates the scale of the reduction).
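As a back-of-the-envelope illustration with assumed sizes (one 768x768 attention matrix and rank r = 8; both numbers are hypothetical):

```python
d, k, r = 768, 768, 8    # hypothetical: base-size attention matrix, LoRA rank 8
full = d * k             # parameters updated by full fine-tuning: 589,824
lora = r * (d + k)       # LoRA parameters (A: r x k, B: d x r): 12,288
print(f"{full // lora}x fewer parameters to train, communicate, and add noise to")
```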
We applied the LoRA reparameterization only to the Transformer attention weights.
After applying LoRA, we used the following hyperparameters for training.

| Hyperparameter | $\varepsilon = 1$ | $\varepsilon = 4$ | $\varepsilon = 8$ |
| --- | --- | --- | --- |
| Noise multiplier ($\sigma$) | 1.21 | 0.695 | 0.553 |
| Sensitivity ($C$) | 0.5 | 0.5 | 0.5 |
| Clients per round ($\lvert C \rvert$) | 2 | 2 | 2 |
| Providers per client | 45 | 45 | 45 |
| Total rounds ($T$) | 30 | 30 | 30 |
| Delta ($\delta$) | | | |
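Assuming the standard DP-SGD convention used above, the Gaussian noise added to each clipped client update has standard deviation $\sigma \cdot C$; for the settings in the table:

```python
sensitivity = 0.5                    # clipping norm C from the table
for sigma in (1.21, 0.695, 0.553):   # noise multipliers from the table
    print(f"sigma = {sigma}: per-round noise std = {sigma * sensitivity:.4f}")
```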
The application of LoRA yielded significant performance improvements across all three privacy budgets. The table below compares the accuracy achieved using LoRA against a baseline model:
| Privacy Budget | LoRA | Baseline |
| --- | --- | --- |
| $\varepsilon = 1$ | 57.4% | 46.2% |
| $\varepsilon = 4$ | 59.75% | 48.3% |
| $\varepsilon = 8$ | 60.4% | 50.3% |