- Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:

      gcloud auth login
      mkdir MIMIC-CXR-JPG
      gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG
- To obtain gender information for each patient, you will also need access to MIMIC-IV. Download `core/patients.csv.gz` and `core/admissions.csv.gz` and place the files in the `MIMIC-CXR-JPG` directory.
- Sign up with your email address here.
- Download either the original or the downsampled dataset (we recommend the downsampled version, `CheXpert-v1.0-small.zip`) and extract it.
- Register for an account and download the CheXpert demographics data here.
- In `cxr_fairness/data/Constants.py`, update `image_paths` to point to the two directories that you downloaded, and `CXP_details` to be the path to the CheXpert demographics file.
- Run `python -m cxr_fairness.data.preprocess.preprocess`.
- (Optional) If you are training a lot of models, it might be faster to cache all images to binary 224x224 files on disk. This is especially true if you are using non-downsized versions of the datasets. In this case, you should update the `cache_dir` path in `cxr_fairness/data/Constants.py` and then run `python -m cxr_fairness.data.preprocess.cache_data`, optionally parallelizing over `--env_id {0, 1}` for speed. To use the cached files, pass `--use_cache` to `train.py`.
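
Before running the preprocessing step, it can help to confirm that the downloaded data is where the configuration expects it. The following is a minimal sketch, not part of the repository: the `image_paths` dictionary here is a stand-in for the one in `cxr_fairness/data/Constants.py`, and its keys (`"MIMIC"`, `"CXP"`), the example paths, and the `find_problems` helper are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical stand-in for the values set in cxr_fairness/data/Constants.py.
# The dictionary keys and the paths are assumptions for illustration only.
image_paths = {
    "MIMIC": "/data/MIMIC-CXR-JPG",
    "CXP": "/data/CheXpert-v1.0-small",
}

# MIMIC-IV files that the steps above place in the MIMIC-CXR-JPG directory.
MIMIC_IV_FILES = ("patients.csv.gz", "admissions.csv.gz")


def find_problems(paths, mimic_key="MIMIC"):
    """Return a list of readable problems; an empty list means the layout looks fine."""
    problems = []
    for name, directory in paths.items():
        if not Path(directory).is_dir():
            problems.append(f"{name}: directory not found: {directory}")
    mimic_root = Path(paths[mimic_key])
    if mimic_root.is_dir():
        for filename in MIMIC_IV_FILES:
            # Search recursively: the file may sit at the top level or under core/.
            if not any(mimic_root.rglob(filename)):
                problems.append(f"{mimic_key}: missing {filename} under {mimic_root}")
    return problems


if __name__ == "__main__":
    for problem in find_problems(image_paths):
        print(problem)
```

If nothing is printed, both directories exist and the two MIMIC-IV files were found. The check is deliberately shallow: it does not validate the contents of any file.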
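
The optional caching step can be parallelized over `--env_id {0, 1}`. One way to do that is to launch one process per value, sketched below. This assumes that `cache_data` accepts a single integer after `--env_id` on each invocation, which the instructions above imply but do not state; `cache_commands` and `run_in_parallel` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys


def cache_commands(env_ids=(0, 1)):
    """Build one cache_data command per env_id (values taken from the step above)."""
    return [
        [sys.executable, "-m", "cxr_fairness.data.preprocess.cache_data",
         "--env_id", str(env_id)]
        for env_id in env_ids
    ]


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return their exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Print the commands so they can be inspected before anything is launched.
    for command in cache_commands():
        print(" ".join(command))
    # Uncomment to launch (run from the repository root, after setting cache_dir):
    # exit_codes = run_in_parallel(cache_commands())
```

Any non-zero exit code in the returned list means the corresponding process failed and its part of the cache should be regenerated.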