This repository contains code for training, testing, and evaluating the Safe-Responsible-LLM. The code is built on instruction fine-tuned models such as Llama and Mistral; update the model identifiers to the Llama and Mistral versions you intend to use.
The content of the folders in this repository is as follows:
- Content-Moderation-Dataset: Includes the training dataset and the Counterfactual test set.
- LLM-Based Annotation Guideline: Provides datasets for testing iterations of desired prompts and a larger dataset for training models in line with the defined annotation and prompt format (as specified in the `params.py` file).
- `generate_data`: Contains code for machine-augmented data annotation using OpenAI APIs and our defined prompts. Because LLMs can annotate both accurately and quickly, these guidelines let us annotate data according to our schema through LLMs (see the annotation sketch after this list). The script uses either the default parameters from the `params.py` file or parameters passed as command-line arguments.
- `training`: Contains code for fine-tuning the model based on the data in the `datasets` folder (see the adapter fine-tuning sketch after this list). The default parameters are defined in the `params.py` file, but users can pass other parameters as arguments to the script. The `train.py` file saves the trained adapter model within the directory where the code is stored.
- `testing`: Contains code for running inference on the merged (base + adapter) model and storing the results in the `datasets` folder (see the merge-and-inference sketch after this list).
- `utils`: Contains utility code (see the configuration sketch after this list), including:
  - `configs.py`: Configuration dataclasses for moving parameters across methods.
  - `helpers.py`: Helper functions for loading files, merging models, etc.
  - `params.py`: Default parameters for training our model, which can be overridden by passing arguments to the respective scripts.
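Since `generate_data` relies on OpenAI APIs, the annotation flow could look roughly like the minimal sketch below, assuming the current OpenAI Python client (v1+). `SYSTEM_PROMPT`, `MODEL_NAME`, and the example input are placeholders standing in for values that would normally come from `params.py` or command-line arguments, not the repository's actual prompts.

```python
# Hedged sketch of an LLM-based annotation call in the style of generate_data.
# SYSTEM_PROMPT and MODEL_NAME are placeholders, not the repository's values.
import argparse
from openai import OpenAI

SYSTEM_PROMPT = "You are an annotator. Label the text according to the schema."  # placeholder
MODEL_NAME = "gpt-4o-mini"  # placeholder; the real model would be set in params.py

def annotate(text: str, client: OpenAI, model: str = MODEL_NAME) -> str:
    """Ask the LLM to label a single example using the annotation prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0.0,  # deterministic output is preferable for annotation
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=MODEL_NAME)
    args = parser.parse_args()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    print(annotate("Example input to be labeled.", client, args.model))
```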
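Because `train.py` saves an adapter model, the fine-tuning step presumably uses a parameter-efficient adapter such as LoRA. The sketch below illustrates that pattern with Hugging Face `transformers` and `peft`; the base model name, dataset path, text field, and hyperparameters are placeholders rather than the repository's actual defaults from `params.py`.

```python
# Hedged sketch of adapter (LoRA) fine-tuning in the style of training/train.py.
# Model name, dataset path, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use your model version

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the base model with a LoRA adapter so only a small set of weights is trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset path and text field; the real data lives in the datasets folder.
dataset = load_dataset("json", data_files="datasets/train.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapter-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapter-out")  # saves only the adapter weights
```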
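For the `testing` step, merging the base model with the saved adapter and running inference might look like the following, again assuming a `peft`-style adapter; the model name, adapter directory, and prompt are placeholders.

```python
# Hedged sketch of merged (base + adapter) inference in the style of testing/.
# BASE_MODEL, ADAPTER_DIR, and the prompt are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
ADAPTER_DIR = "adapter-out"                   # placeholder; where train.py saved the adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)

# Attach the fine-tuned adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
model = model.merge_and_unload()
model.eval()

prompt = "Classify the following text: ..."  # placeholder inference prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```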
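The configuration pattern described for `utils` (dataclasses in `configs.py`, defaults in `params.py`, overridable via script arguments) could be implemented roughly as below; the field names and default values are illustrative only, not the repository's actual parameters.

```python
# Hedged sketch of a configuration dataclass with CLI overrides, in the spirit
# of utils/configs.py and utils/params.py. Field names are illustrative.
import argparse
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    base_model: str = "meta-llama/Llama-2-7b-chat-hf"  # placeholder default
    learning_rate: float = 2e-4
    num_epochs: int = 1
    output_dir: str = "adapter-out"

def parse_config() -> TrainConfig:
    """Build a config from defaults, letting command-line flags override any field."""
    parser = argparse.ArgumentParser()
    for f in fields(TrainConfig):
        parser.add_argument(f"--{f.name}", type=type(f.default), default=f.default)
    return TrainConfig(**vars(parser.parse_args()))

if __name__ == "__main__":
    print(parse_config())
```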
You can access the dataset here: (link will be shared upon release).