This repository contains the Python code to reproduce the results in our paper Selection by Prediction with Conformal p-values.
The simulation in the paper was run with Python 3. The following Python packages are required to be installed: numpy, pandas, sklearn.
simulations/: bash file for running the simulations in batch.
utils/: Python code for the simulations.
results/: stores all the experiment outputs; created automatically if it does not exist.
Calling the file simu.py executes one run of the simulation. It takes five inputs:
--sig, from 1 to 10, sets the noise strength;
--nt_id, from 1 to 4, selects the test sample size (10, 100, 500 or 1000);
--set_id, from 1 to 8, selects one of the eight data generating processes in the paper (Table 2);
--q, one of 1, 2 or 5, selects the FDR level (0.1, 0.2 or 0.5);
--seed, from 1 to 1000, is the random seed used in this run.
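The integer inputs act as indices into fixed grids. The sketch below shows one way they might be decoded. It assumes the noise strength is --sig divided by 10 (inferred from the example in this README, where --sig 4 corresponds to noise strength 0.4) and that the FDR level is --q divided by 10. The helper name decode_inputs is hypothetical and is not taken from simu.py.

```python
# Hypothetical helper for illustration; not part of simu.py.
NT_SIZES = [10, 100, 500, 1000]  # test sample sizes for --nt_id 1..4
Q_CHOICES = (1, 2, 5)            # allowed --q values


def decode_inputs(sig, nt_id, set_id, q, seed):
    """Map the five integer inputs to the quantities they stand for."""
    if not 1 <= sig <= 10:
        raise ValueError("--sig must be between 1 and 10")
    if not 1 <= nt_id <= len(NT_SIZES):
        raise ValueError("--nt_id must be between 1 and 4")
    if not 1 <= set_id <= 8:
        raise ValueError("--set_id must be between 1 and 8")
    if q not in Q_CHOICES:
        raise ValueError("--q must be one of 1, 2, 5")
    if not 1 <= seed <= 1000:
        raise ValueError("--seed must be between 1 and 1000")
    return {
        "noise_strength": sig / 10,     # assumption: sig 4 -> 0.4
        "n_test": NT_SIZES[nt_id - 1],  # 1-based index into the size list
        "setting": set_id,
        "fdr_level": q / 10,            # 1, 2, 5 -> 0.1, 0.2, 0.5
        "seed": seed,
    }


print(decode_inputs(4, 2, 7, 1, 53))
```

Validating the ranges up front makes an out-of-range index fail with a clear message instead of an IndexError deep inside a run.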
Each run iterates over all three machine learning algorithms (gbr, rf and svm) in the paper and all three nonconformity scores (BH_res, BH_rel, BH_clip).
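Because a single run covers every pairing of algorithm and score, each call produces nine configurations. The sketch below enumerates them using the names from the text. The loop structure is an assumption about how such an iteration could be organized, not a copy of simu.py.

```python
from itertools import product

# Names as listed in this README; the loop itself is illustrative only.
ALGORITHMS = ["gbr", "rf", "svm"]
SCORES = ["BH_res", "BH_rel", "BH_clip"]

# Every (algorithm, score) pair handled within one run.
configs = list(product(ALGORITHMS, SCORES))

for algo, score in configs:
    print(f"{algo} with {score}")
```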
For example, to execute a single run of the experiment for noise strength 0.4, test sample size 100 in setting 7, with FDR level 0.1 and random seed 53, simply run the following commands:
cd simulations
python3 simu.py 4 2 7 1 53
The simulations can also be submitted in batch mode on a computing cluster, using the bash file in the bash/ folder (it may need modification to match the cluster's configuration).
The current bash file runs --sig from 1 to 10, --nt_id from 1 to 4, --q in {1,2,5}, and --seed from 1 to 100. To submit these jobs, navigate to the bash/ folder and run:
sh bash.sh
These parameter ranges can be changed by editing the bash file.
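To get a sense of the size of the batch, the grid described above can be enumerated in Python. This is a rough sketch rather than a replacement for the bash file: the helper batch_commands is hypothetical, the argument order follows the single-run example in this README, and the setting is fixed at 7 purely for illustration, since the text does not say how the bash file chooses --set_id.

```python
from itertools import product

# Ranges taken from the description of the current bash file.
SIGS = range(1, 11)     # --sig 1..10
NT_IDS = range(1, 5)    # --nt_id 1..4
QS = (1, 2, 5)          # --q
SEEDS = range(1, 101)   # --seed 1..100


def batch_commands(set_id):
    """Yield one simu.py command per grid point for a fixed setting."""
    for sig, nt_id, q, seed in product(SIGS, NT_IDS, QS, SEEDS):
        # Argument order: sig, nt_id, set_id, q, seed.
        yield ["python3", "simu.py", str(sig), str(nt_id),
               str(set_id), str(q), str(seed)]


commands = list(batch_commands(set_id=7))  # set_id=7 is an arbitrary choice
print(len(commands))          # 10 * 4 * 3 * 100 = 12000 runs per setting
print(" ".join(commands[0]))  # first command in the grid
```

With 12000 runs per setting, narrowing any one range before submitting shrinks the job count proportionally.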