
AFT

This repository contains the implementation code for the AFT algorithm proposed in the paper 'Approximation-guided Fairness Testing through Discriminatory Space Analysis'.

In addition, the experimental results from the paper can be accessed via the following link: https://doi.org/10.5281/zenodo.13898828.

Contents

Files Overview

  • The folder Datasets: three datasets used to train the Classifiers under Test (CuTs).
  • The folder FairnessTestCases: twelve trained CuTs.
  • The folder FairnessTestMethods: state-of-the-art individual fairness testing algorithms for comparison.
  • The folder utils and the file aft.py: the implementation code of the AFT algorithm.
  • The file exp.py: the code to run the experimental evaluation.

Requirements

The experiments were run with Python 3.8.10. We recommend using a virtual environment to run and test AFT, to avoid dependency conflicts between packages. We show two ways to create a virtual environment.

  • If you already have Python 3.8.10 installed, you can use venv to create a virtual environment.
  • If not, you can
    • install Python 3.8.10 using pyenv, which lets you install and manage multiple versions of Python, and then create a virtual environment using venv,
    • or use conda to directly create a virtual environment that already contains Python 3.8.10.

Using venv

Make sure Python 3.8.10 and venv are installed.

Create a virtual environment called env_aft:

python3 -m venv env_aft

Activate the environment:

source env_aft/bin/activate

Install required packages:

pip install -r requirements.txt   

Using conda

Make sure conda is installed.

Create a virtual environment called env_aft, and install Python 3.8.10 and pip in it:

conda create --name env_aft python=3.8.10 pip

Activate the environment:

conda activate env_aft

Install required packages:

pip install -r requirements.txt   

Usage

Use the following command to run AFT:

python exp.py [--dataset_name {Adult,Credit,Bank}] [--protected_attr {sex,race,age}] [--model_name {LogReg,RanForest,DecTree,MLP}] [--method {aft,vbtx,vbt,themis}] [--runtime RUNTIME] [--repeat REPEAT]

The possible values for each parameter are listed below:

  • dataset_name: Adult, Credit, Bank
  • protected_attr: sex, race, age
  • model_name: LogReg, RanForest, DecTree, MLP
  • method: aft, vbtx, vbt, themis
  • runtime: the running time in seconds (default=3600)
  • repeat: the number of repeated runs (default=30)

The folder FairnessTestCases contains 12 machine learning models prepared for fairness testing. The folder Datasets contains 3 training datasets, on which the 12 models were trained.

Example

As an example, the following command uses AFT to perform fairness testing on the configuration (Adult, sex, LogReg), with a time budget of 60 seconds and a single run:

python exp.py --dataset_name Adult --protected_attr sex --model_name LogReg --method aft --runtime 60 --repeat 1
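
To test several configurations in one batch, a small driver script can call exp.py repeatedly. The following is only a sketch: the (dataset, protected attribute, model) triples listed here are illustrative and must be chosen so that they correspond to CuTs provided in FairnessTestCases.

import subprocess

# Illustrative configurations; each (dataset, attribute, model) triple must
# correspond to one of the trained CuTs in FairnessTestCases.
configs = [
    ("Adult", "sex", "LogReg"),
    ("Adult", "race", "DecTree"),
]

for dataset, attr, model in configs:
    # Run exp.py with a 60-second time budget and a single run per configuration.
    subprocess.run(
        ["python", "exp.py",
         "--dataset_name", dataset,
         "--protected_attr", attr,
         "--model_name", model,
         "--method", "aft",
         "--runtime", "60",
         "--repeat", "1"],
        check=True,
    )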

Outputs

Each run outputs two .csv files, one in the folder DiscData and one in the folder TestData. The contents of each folder are as follows:

  • DiscData: the discriminatory instances detected during the run
  • TestData: the test cases generated during the run

For the running example, suppose we obtain a result file DiscData/aft-LogReg-Adult-sex-60-0.csv; its first ten rows are shown below:

$ head DiscData/aft-LogReg-Adult-sex-60-0.csv
8,2,0,14,4,3,4,2,0,1,40,61,19,0
8,2,0,14,4,3,4,2,1,1,40,61,19,1
4,1,63,15,6,2,0,0,0,17,5,10,32,0
4,1,63,15,6,2,0,0,1,17,5,10,32,1
3,2,60,5,0,13,5,1,0,5,15,15,28,0
3,2,60,5,0,13,5,1,1,5,15,15,28,1
6,6,71,10,0,7,4,0,0,1,10,15,39,0
6,6,71,10,0,7,4,0,1,1,10,15,39,1
2,3,69,2,2,1,1,2,0,2,0,84,34,0
2,3,69,2,2,1,1,2,1,2,0,84,34,1

To interpret this file: each row represents an individual, and each pair of consecutive rows represents a discriminatory instance. For example, rows 1 and 2 form one discriminatory instance, rows 3 and 4 form another, and so on.

An individual (i.e., a row) is represented as a sequence of attribute values, in the same order as the attributes of the training data. For example, you can find the list of attributes for Adult with the following command:

$ head -1 Datasets/Adult.csv
age,workclass,fnlwgt,education,martial_status,occupation,relationship,race,sex,capital_gain,capital_loss,hours_per_week,native_country,Class
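
The output can thus be post-processed using the attribute names of the training data. The following is a minimal sketch using pandas (assuming pandas is available in the environment, and assuming the rows of the result file follow exactly the column order of Datasets/Adult.csv, including the final Class column):

import pandas as pd

# Attribute names taken from the training data header shown above.
columns = pd.read_csv("Datasets/Adult.csv", nrows=0).columns

# The result file from the running example has no header line of its own.
disc = pd.read_csv("DiscData/aft-LogReg-Adult-sex-60-0.csv",
                   header=None, names=columns)

# Consecutive rows form pairs; each pair is one discriminatory instance.
print("number of discriminatory instances:", len(disc) // 2)

# Inspect the first pair: the two individuals differ only in the protected
# attribute (here, sex).
print(disc.iloc[0:2])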

How to cite

Please cite our ASE '24 paper.

Bibtex:

@inproceedings{10.1145/3691620.3695481,
author = {Zhao, Zhenjiang and Toda, Takahisa and Kitamura, Takashi},
title = {Approximation-guided Fairness Testing through Discriminatory Space Analysis},
year = {2024},
isbn = {9798400712487},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3691620.3695481},
doi = {10.1145/3691620.3695481},
booktitle = {Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering},
pages = {1007–1018},
numpages = {12},
keywords = {machine learning, algorithmic fairness, fairness testing, decision tree, sampling algorithm},
location = {Sacramento, CA, USA},
series = {ASE '24}
}

License

AFT is licensed under The MIT License.
