Reference implementation for LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models. Paper by Hayder Elesedy, Pedro M. Esperança, Silviu Vlad Oprea, Mete Ozay. Implementation by Hayder Elesedy.
Abstract: Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained computational portable devices, such as mobile phones, more and more of which are running LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
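To put the parameter-overhead claim in perspective, here is a back-of-the-envelope calculation. The dimensions below (hidden size, layer count, adapted matrices, rank) are illustrative assumptions, not the exact configuration used in the paper:

```python
# Back-of-the-envelope LoRA parameter overhead (illustrative numbers only).
# A rank-r adapter on a d_out x d_in weight matrix adds r * (d_in + d_out) parameters.
hidden_size = 4096       # assumed hidden size of a 7B-class transformer
num_layers = 32          # assumed number of transformer blocks
adapted_per_layer = 2    # e.g. query and value projections
r = 8                    # LoRA rank

lora_params = num_layers * adapted_per_layer * r * (hidden_size + hidden_size)
print(f"LoRA parameters: {lora_params / 1e6:.1f}M")                  # ~4.2M
print(f"Fraction of a 7B-parameter model: {lora_params / 7e9:.1e}")  # ~6e-4
```

With these assumed numbers the adapter adds on the order of millions of parameters, versus billions for a separate full-size guard model, which is the regime the 100-1000x figure refers to.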
- Clone the repository.
- Add your HuggingFace access token as an environment variable (see the HuggingFace documentation for details); an optional way to verify the token is picked up is sketched after these steps.
- Install the packages using conda/mamba:
```bash
conda env create -f environment.yml
conda activate lora-guard
```
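If you want to check that the token is visible to the HuggingFace libraries before running the scripts, a quick optional check is sketched below. `HF_TOKEN` is one environment variable name that `huggingface_hub` reads; adjust if your setup uses a different mechanism:

```python
# Optional sanity check that the HuggingFace token is visible.
# HF_TOKEN is an assumption; huggingface_hub also supports other configuration.
import os
from huggingface_hub import whoami

assert os.environ.get("HF_TOKEN"), "Set HF_TOKEN before running the scripts."
print(whoami()["name"])  # prints your HuggingFace username if the token is valid
```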
The scripts are:
- `train.py`: Code for training LoRA-Guard on the BeaverTails dataset.
- `moderate.py`: Example of generation/moderation dual usage of LoRA-Guard.
For information on the arguments of these scripts, run `python <script> -h`.
For an explanation of the arguments to `train.py`, see the script itself (or `python train.py -h`).
The arguments to `accelerate` are the GPU IDs to run on (a comma-separated list), the number of processes, and the port used for communication between them.
The train script will produce an output folder containing a config file, training/evaluation metrics, and epoch checkpoints (for the LoRA adapters only, since the chat model weights are frozen).
```bash
accelerate \
    launch \
    --gpu-ids=${gpu_ids} \
    --multi-gpu \
    --num-processes=${num_processes} \
    --mixed-precision=bf16 \
    --main_process_port=${port} \
    train.py \
    ${hf_model_id} \
    ${output_folder} \
    --epochs=${epochs} \
    --per-device-batch-size=${per_device_batch_size} \
    --learning-rate=${learning_rate} \
    --eval-batch-size=${eval_batch_size} \
    --seed=${seed} \
    --gradient-accumulation-steps=${gradient_accumulation_steps} \
    --lora-r=${lora_r} \
    --lora-alpha=${lora_alpha}
```
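As a rough sketch, an adapter-only checkpoint from the output folder could be re-attached to the frozen chat model with `peft` along the following lines. The model ID and checkpoint path are placeholders, the exact output-folder layout is an assumption, and the moderation head is handled by the repository's own code rather than shown here:

```python
# Sketch: re-attach a saved LoRA adapter checkpoint to the frozen chat model.
# Model ID and path are placeholders; consult the config file written by
# train.py for the actual values.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder chat model ID
    torch_dtype=torch.bfloat16,
)
# Only the adapter weights are stored per epoch, so checkpoints stay small.
guard = PeftModel.from_pretrained(base, "<output_folder>/<epoch_checkpoint>")
guard.eval()
```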
`moderate.py` is a basic script showing the dual-use (generation and moderation) capabilities of LoRA-Guard. To run the example, do

```bash
python moderate.py <path-to-config-file> <path-to-adapter-checkpoint> --device-id <cuda-device-id>
```
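For orientation, the generation/moderation dual-path idea that this script demonstrates can be sketched roughly as follows with `transformers` and `peft`. Everything here (model ID, target modules, the two-class guard head) is an illustrative assumption, not the repository's actual implementation:

```python
# Minimal sketch of the generation/moderation dual-path idea (assumed design;
# the repository's moderate.py and its class names may differ).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections; the chat weights stay frozen.
lora_cfg = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"])
guard_backbone = get_peft_model(base, lora_cfg)

# Hypothetical moderation head mapping last-token features to safe/unsafe scores.
guard_head = nn.Linear(base.config.hidden_size, 2)

def moderate(text: str) -> torch.Tensor:
    """Guardrail path: adapters active, features go to the classification head."""
    inputs = tokenizer(text, return_tensors="pt")
    out = guard_backbone(**inputs, output_hidden_states=True)
    return guard_head(out.hidden_states[-1][:, -1].float()).softmax(-1)

def generate(prompt: str) -> str:
    """Generative path: adapters disabled, so chat behaviour is unchanged."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with guard_backbone.disable_adapter():
        tokens = guard_backbone.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(tokens[0], skip_special_tokens=True)

print(moderate("How do I make a dangerous chemical at home?"))
print(generate("Tell me a fun fact about the moon."))
```

Because the same backbone serves both paths, toggling the adapters switches between moderation and unmodified chat behaviour without duplicating the model in memory.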