- Dependencies for Local Install
- Installation
- Docker Build
- Repository Structure
- Data Organization Definitions
- Usage
- Apptainer
This pipeline requires 4dfp, fsl, freesurfer, and connectome workbench.
To install the dependencies, you can use the install_* scripts found in the tools folder of this repo. If you already have these dependencies installed, you can skip this step.
NOTE: At minimum, you may want to run the install_4dfp.sh script to install 4dfp, as it has some modifications to make it compatible with more modern Linux systems.
There are dependencies for each of these tools that may need to be installed separately, and you may need to refer to the Dockerfile and/or the appropriate software's documentation for more details on how to install them.
After installing each dependency, source the tools/setenv.sh script to set up the appropriate environment variables for each package.
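For example (the repository path here is illustrative):

source /path/to/repo/tools/setenv.sh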
The 4dfp install script is located in tools/install_4dfp.sh. This script will download and compile 4dfp under tools/pkg and place the compiled binaries and scripts under tools/bin.
NOTE: The installer contains fixes for GCC >10 compatibility. If you are using an older version of GCC, you can call the script with 1 as the first argument (e.g. ./install_4dfp.sh 1) to disable the GCC >10 flags.
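A minimal sketch of both invocations (assuming the script is run from the tools directory of the repo):

# default build, with the GCC >10 compatibility fixes enabled
./install_4dfp.sh
# on an older GCC, pass 1 as the first argument to disable the GCC >10 flags
./install_4dfp.sh 1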
The fsl install script is located in tools/install_fsl.sh. This script will download and install fsl under tools/pkg/fsl.
NOTE: You need a freesurfer license.txt in the tools directory before running the install script.
The freesurfer install script is located in tools/install_freesurfer.sh. This script will download and install freesurfer under tools/pkg/freesurfer.
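A sketch of the steps (the location of your license file and the repo path are illustrative):

# place your freesurfer license in the tools directory first
cp ~/Downloads/license.txt /path/to/repo/tools/license.txt
# then run the installer
/path/to/repo/tools/install_freesurfer.sh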
The connectome workbench install script is located in tools/install_workbench.sh. This script will download and install connectome workbench under tools/pkg/workbench.
The tools directory contains an install script for the NORDIC software. It will compile the NORDIC scripts into a compatible MATLAB MCR executable. To install it, simply run the install_nordic.sh script in the tools directory. This will download and build NORDIC under tools/pkg/nordic.
NOTE: The NORDIC MCR compilation requires MATLAB 2017b or newer. You will also need to have the MATLAB compiler license installed on your machine.
The MATLAB compiler runtime install script is located in tools/install_mcr.sh. This script will download and install the MATLAB compiler runtime under tools/pkg/mcr.
NOTE: You should run this after installing NORDIC, as the NORDIC installer sets your MATLAB_VERSION environment variable to the version of MATLAB that it was compiled with. This will ensure the correct version of the MCR is installed.
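A sketch of the recommended order (assuming the scripts are run from the tools directory):

# build NORDIC first; this sets MATLAB_VERSION for the MCR installer
./install_nordic.sh
# then install the matching MATLAB compiler runtime
./install_mcr.sh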
There are a few other system dependencies that are required for this pipeline to run. These include:
jq
tcsh
python3 # >= 3.7
gawk
wish
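On a Debian/Ubuntu system, these can typically be installed with the system package manager (the package names here are assumptions; wish is provided by the tk package):

sudo apt-get install jq tcsh python3 gawk tk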
This repo currently only works in editable mode (with strict enabled):
python3 -m pip install -e /path/to/repo/ -v --config-settings editable_mode=strict
A docker build is currently available on DockerHub.
To build the docker image, you can run:
docker buildx build . -t ghcr.io/dosenbachgreene/me_pipeline
Alternatively, if you have docker compose installed, you can run:
docker compose build
Both will do a multi-stage build of the docker image with the tag ghcr.io/dosenbachgreene/me_pipeline.
The docker build makes the run_pipeline script the entrypoint. An example invocation showing the help message is given below:
docker run \
-u $(id -u):$(id -g) -it --rm \
-v $(pwd -P)/..:/data \
-v /tmp:/tmpdir \
vanandrew/me_pipeline:2023.4.4 -h
usage: run_pipeline [-h] {structural,functional,params} ...
TODO
options:
-h, --help show this help message and exit
pipeline:
{structural,functional,params}
pipeline to run
structural Structural Pipeline
functional Functional Pipeline
params Generate params file
Vahdeta Suljic <[email protected]>, Andrew Van <[email protected]> 12/09/2022
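As a further illustration, the structural pipeline could be invoked on a BIDS dataset roughly like this (the bind mounts, paths, and image tag are assumptions; adapt them to your setup):

docker run \
    -u $(id -u):$(id -g) -it --rm \
    -v /path/to/bids:/data \
    -v /tmp:/tmpdir \
    vanandrew/me_pipeline:2023.4.4 structural /data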
See the Usage section for more details on how to run the pipeline.
This section is for developers. Skip this section if you aren't intending to push any code changes.
The repo is organized as follows:
- me_pipeline: contains the main pipeline scripts and associated pipeline wrappers. me_pipeline/scripts holds python scripts that each define a single function (i.e. main) that is installed as a script during installation. If the file name is script_to_call.py, then after installation of this package it can simply be called from the command line as script_to_call. me_pipeline/scripts/bin and me_pipeline/scripts/data hold shell scripts and reference data from the original pipeline that are called by the python scripts in me_pipeline/scripts.
- extern: for external git repos used in this pipeline. Currently the only one in use is for NORDIC.
- tools: contains scripts for installing external dependencies. These include 4dfp, fsl, freesurfer, connectome workbench, and the MATLAB compiler runtime.
MR data has many terms that often get used interchangeably. Here are some definitions to help clarify the terms used (as well as the preferred BIDS terminology for each concept):
- Project: A project is a collection of subjects. A project may have multiple subjects. May also be referred to as a Dataset or Study (note that this can easily be confused with the (lower-case) study under each Session that actually means Scans; this README will stray away from using such terminology to avoid confusion). In BIDS terminology the preferred term is Dataset.
- Subject: A subject is a person who is being scanned. A subject may have multiple sessions. May also be referred to as a Participant. In BIDS terminology the preferred term is Participant and is prefixed by sub-.
- Session: A session refers to a subject's scanning session or visit. A session will almost definitely have multiple scans. May also be referred to as a Visit or Experiment. In BIDS terminology the preferred term is Session and is prefixed by ses-.
- Scan: A scan refers to a single acquisition, generally resulting in a single image (note that images are either 3D, or 4D if also acquired over time). May also be referred to as a Run. In this pipeline, you may also see it referenced (albeit confusingly) as a study. In BIDS terminology the preferred term is Run and is prefixed by run-.
There are several levels of usage for this pipeline:
- Level 0: csh scripts. NOT FOR THE FAINT OF HEART. The old, regular way of running the pipeline. Create your own param files and call the appropriate csh scripts (the usual names). If you're doing it this way you are probably an expert and don't need this README. The main benefit this version of the pipeline provides is that it can be deployed to any compute environment (not just the NIL servers).
- Level 1: BIDS based processing. See below.
- Level 2: Web Interface. (TODO: NOT YET IMPLEMENTED).
This is for Level 0 users. Skip this section if you are not running the pipeline at this level.
To run any of the csh scripts, you can use the run_script command:
run_script Structural_pp_090121.csh [struct.params] [instructions.params]
To see the full list of scripts you can run, check run_script --help.
The instructions.params file has slightly different keys from the original. See the params file below for more information:
set bids = 0 # When using the run_script program, bids mode must be turned off.
set cleanup = 0
set economy = 0
set inpath = $cwd
set target = $REFDIR/TRIO_Y_NDC
set outspace_flag = mni2mm
set nlalign = 0
set medic = 0 # 0 = no medic, 1 = medic
set num_cpus = 8 # number of cpus to use for parallel processing, replaces OSResample_parallel
set delta = 0.00246
set ME_reg = 1
set dbnd_flag = 1
set isnordic = 1
set runnordic = 1
set noiseframes = 0
set bases = /not/working/please/ignore/FNIRT_474_all_basis.4dfp.img # these are broken atm
set mean = /not/working/please/ignore/FNIRT_474_all_mean.4dfp.img # these are broken atm
set nbases = 5
set niter = 5
set GetBoldConfig = 1
set skip = 0
set normode = 0
set BiasField = 0
set useold = 1
set FCdir = FCmaps
set ncontig = 3
set FDthresh = 0.08
set DVARsd = 3.5
set DVARblur = 10.0
set bpss_params = ( -bl0.005 -ol2 -bh0.1 -oh2 )
set blur = 1.4701
set lomotil = 0
set Atlas_ROIs = 0
set surfsmooth = 1.7
set subcortsmooth = 1.7
set CSF_excl_lim = 0.15
set CSF_lcube = 4
set CSF_svdt = 0.15
set WM_lcube = 3
set WM_svdt = 0.15
set nRegress = 20
set min_frames = 50
set ROIdir = $REFDIR/CanonicalROIsNP705
set ROIimg = CanonicalROIsNP705_on_MNI152_2mm.4dfp.img
The other params files are the same as the original.
NOTE: Existing params files should be compatible with this version of the pipeline. However, it is recommended that you use the new params (medic, num_cpus, etc.) as the old params (OSResample_parallel, etc.) will be deprecated in the future.
For convenience, this repo provides a command line tool for auto-downloading data from CNDA:
download_dataset [base_dir] [project_name] [subject_id] [experiment_id] --skip_dcm_sort
where project_name, subject_id, and experiment_id are the XNAT project, subject, and experiment IDs, respectively. This script will place the downloaded data in the base_dir/[name_of_archive]/SCANS directory.
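For example, with hypothetical XNAT identifiers (MYPROJ, SUB01, and SUB01_MR1 are placeholders):

download_dataset /data/raw MYPROJ SUB01 SUB01_MR1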
To convert your DICOMs to a BIDS Dataset, use the convert_to_bids command:
convert_to_bids --files /path/to/archive/SCANS/ -s [subject_id] -ss [session_id] -o /path/to/output/project -c dcm2niix -b --overwrite
This will create a BIDS dataset at /path/to/output/project with the subject label [subject_id] and session label [session_id]. The -b flag will generate additional BIDS metadata automatically. The --overwrite flag will overwrite any existing files in the output directory.
NOTE: The BIDS conversion is built off of manually encoded heuristics searching for specific DICOM tags. It may fail if it encounters a scan it has not seen before. If this happens, contact Andrew or Vahdeta.
To run the pipeline, you can invoke the run_pipeline command:
run_pipeline -h
usage: run_pipeline [-h] {structural,functional,params} ...
TODO
optional arguments:
-h, --help show this help message and exit
pipeline:
{structural,functional,params}
pipeline to run
structural Structural Pipeline
functional Functional Pipeline
params Generate params file
Vahdeta Suljic <[email protected]>, Andrew Van <[email protected]> 12/09/2022
The run_pipeline command has three subcommands: structural, functional, and params. The structural and functional subcommands run the structural and functional pipelines respectively, while params allows you to generate a params.toml file to configure the pipeline.
NOTE: These params files are different from the old style params file. The new params file is a TOML file that replaces the functionality of the old instructions.params file. It has support for various data types, comments, and nested tables for future expansion.
To generate a params file, use run_pipeline params:
run_pipeline params /path/to/params.toml
This will generate a params file at /path/to/params.toml. You can then edit the params file to configure the pipeline.
# use bids mode (unless you know what you're doing, this should always be true)
bids = true
# Delete intermediary files for significant data storage improvement
cleanup = false
# controls saving of intermediary files
economy = 0
# atlas-representation target in 711-2B space
target = "$REFDIR/TRIO_Y_NDC"
# final fMRI data resolution and space
outspace_flag = "mni2mm"
# if set script will invoke fnirt
nlalign = false
# use MEDIC (Multi-Echo DIstortion Correction)
medic = true
# number of threads/processes to use
num_cpus = 8
# and more options ...
Loading the params file is done by passing the --config flag to the functional/structural subcommands of the run_pipeline command.
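For example (the paths here are illustrative):

run_pipeline structural /path/to/bids --config /path/to/params.toml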
To run the structural pipeline, use run_pipeline structural:
run_pipeline structural [bids_dir]
This will read in subjects from the BIDS dataset at [bids_dir] and run the structural pipeline on each subject. By default, outputs are written out to [bids_dir]/derivatives/me_pipeline as per the BIDS specification. To change the output directory, use the --output_dir flag.
To only process certain subjects, use the --participant_label flag, as in the sketch below.
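For example, to process only one subject and write outputs to a custom location (the participant label format and paths are assumptions):

run_pipeline structural /path/to/bids --participant_label 20002 --output_dir /path/to/outputs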
NOTE: At the moment, the pipeline auto searches for T1w and T2w images across all sessions for a subject and processes them as a single average T1w and T2w image. If you need to process anatomical sessions separately, the easiest way at the moment is to create a separate BIDS dataset for each anatomical session.
The option to process anatomical sessions separately will be added in the future.
It is possible to load a params file to configure the pipeline with the --config flag.
NOTE: The functional pipeline requires outputs from the structural pipeline to completely run.
To run the functional pipeline, use the run_pipeline functional command:
run_pipeline functional [bids_dir]
Like the structural pipeline, this will read in subjects from the BIDS dataset at [bids_dir] and run the functional pipeline on each subject, session, and run. By default, outputs are written out to [bids_dir]/derivatives/me_pipeline and can be changed with the --output_dir flag.
To see which runs map to which file, you can look at the runs.json file. Located in every session output folder (e.g. [bids_dir]/derivatives/me_pipeline/sub-[subject_id]/ses-[session_id]), this file contains a mapping of BIDS input files to each boldX folder:
{
"mag": {
"2": [
"/data/nil-bluearc/GMT/Andrew/experimental_pipeline/test_data/sub-20002/ses-50504/func/sub-20002_ses-50504_task-rest_run-02_echo-1_part-mag_bold.nii.gz",
"/data/nil-bluearc/GMT/Andrew/experimental_pipeline/test_data/sub-20002/ses-50504/func/sub-20002_ses-50504_task-rest_run-02_echo-2_part-mag_bold.nii.gz",
"/data/nil-bluearc/GMT/Andrew/experimental_pipeline/test_data/sub-20002/ses-50504/func/sub-20002_ses-50504_task-rest_run-02_echo-3_part-mag_bold.nii.gz",
"/data/nil-bluearc/GMT/Andrew/experimental_pipeline/test_data/sub-20002/ses-50504/func/sub-20002_ses-50504_task-rest_run-02_echo-4_part-mag_bold.nii.gz",
"/data/nil-bluearc/GMT/Andrew/experimental_pipeline/test_data/sub-20002/ses-50504/func/sub-20002_ses-50504_task-rest_run-02_echo-5_part-mag_bold.nii.gz"
],
...
},
"phase": {
...
}
}
Each list of files is split into "mag" and "phase" keys for magnitude and phase data respectively. The inner key (e.g. "2") corresponds to the index of the boldX folder (e.g. bold2). The files in each list are the inputs for that boldX folder.
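For example, since jq is already a dependency of this pipeline, the inputs for a given boldX folder can be inspected from the command line (the paths here are illustrative):

# list the magnitude inputs that map to the bold2 folder
jq '.mag["2"]' /path/to/bids/derivatives/me_pipeline/sub-20002/ses-50504/runs.json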
It is possible to load a params file to configure the pipeline with the --config flag.
The functional pipeline also has a regular expression filter with the --regex_filter flag that allows you to subselect files to process in the dataset.
Some examples:
# only process files with label "task-restME"
run_pipeline functional /path/to/bids --config /blah/blah --regex_filter "task-restME" # <-- the "" are important don't leave them out!
# only process runs 1 and 2
run_pipeline functional /path/to/bids --config /blah/blah --regex_filter "run-0[1-2]" # <-- the "" are important don't leave them out!
When running the pipeline with Apptainer (specifically on high performance clusters), it is recommended to run this pipeline in sandbox mode. This will allow you to emulate docker behavior more accurately.
To do this, first pull the image, then build a sandbox directory with Apptainer's build command:
# first download the image
apptainer pull docker://vanandrew/me_pipeline:[version]
# now build the sandbox
apptainer build --sandbox me_pipeline_[version] me_pipeline_[version].sif
Once the sandbox has been made, run it with the following flags:
apptainer run --writable --containall --no-init --no-umask --no-eval --fakeroot --workdir /some/dir/with/lots/of/space -B /your/bind/mount:/mnt me_pipeline_[version] [-h/functional/structural/params]
The --workdir flag should point to a directory on a storage system with lots of space.
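For instance, running the functional pipeline on a BIDS dataset might look roughly like this (the bind mount, working directory, and paths are assumptions; adapt them to your setup):

apptainer run --writable --containall --no-init --no-umask --no-eval --fakeroot \
    --workdir /scratch/me_pipeline_work \
    -B /path/to/bids:/mnt \
    me_pipeline_[version] functional /mnt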