This is a base repo for the Schizophrenia Canadian Neuroimaging Database (SCanD) codebase. It is meant to be forked/cloned for every SCanD dataset.
General folder structure for the repo (when all is run):
${BASEDIR}
├── code # a clone of this repo
│ └── ...
├── containers # the singularity images are copied or linked to here
│ ├── fmriprep-20.1.1.simg
│ ├── fmriprep_ciftity-v1.3.2-2.3.3.simg
│ └── mriqc-22.0.6.simg
├── data
│ ├── local # folder for the "local" dataset
│ │ ├── bids # the defaced BIDS dataset
│ │ ├── mriqc # mriqc derivatives
│ │ ├── fmriprep # fmriprep derivatives
│ │ ├── freesurfer # freesurfer derivative - generated during fmriprep
│ │ ├── ciftify # ciftify derivatives
│ │ ├── cifti_clean # dtseries post confound regression
│ │ └── parcellated # parcellated timeseries
│ |
│ └── share # folder with a smaller subset ready to share
│ ├── ciftify # contains only copied over qc images and logs
│ ├── fmriprep # contains only qc images, metadata and anat tsvs
│ └── parcellated # contains the parcellated data
├── logs # logs from jobs run on cluster
├── README.md
└── templates # an extra folder with pre-downloaded fmriprep templates (see setup section)
└── parcellations
├── README.md
└── tpl-fsLR_res-91k_atlas-GlasserTianS2_dseg.dlabel.nii
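As a quick way to compare a working copy against the tree above, here is a minimal sketch of a layout check. The function name `check_layout` is hypothetical and not part of this repo; it only tests for the top-level folders named in the tree and says nothing about their contents.

```shell
# Hypothetical helper (not part of the repo): report which of the
# expected folders from the tree above are missing under a base directory.
check_layout() {
  basedir=$1
  missing=0
  for d in code containers data/local data/share logs templates; do
    # -d follows symlinks, so linked folders count as present
    if [ ! -d "${basedir}/${d}" ]; then
      echo "missing: ${d}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_layout "${SCRATCH}/SCanD_project"` would print one line per missing folder and return non-zero if any are absent.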
Currently this repo is set up for running things on the SciNet Niagara cluster, but we can adapt it later to create local set-ups behind hospital firewalls if needed.
- Organize your data into BIDS
- Deface the BIDS data (if not done during step 1)
- Set up your SciNet environment, code, and data
- Clone the Repo
- Run the software set-up script (takes a few seconds)
- Copy or link your bids data to this folder
- Run MRIQC
- Run fmriprep
- Run ciftify
- Run ciftify_clean and parcellate
- Run the scripts to extract sharable data into the sharable folder
This is the longest, most human-intensive step, but it will make everything else possible! BIDS is really a naming convention for your MRI data that will make it easier for other people in the consortium (as well as the software) to understand what your data is (what scan types, how many participants, how many sessions, etc.). Converting to BIDS may require renaming and/or reorganizing your current data. No coding is required, but there are now a lot of different software projects out there to help with the process.
For amazing tools and tutorials for learning how to BIDS convert your data, check out the BIDS starter kit.
A useful tool for defacing is the BIDSonym BIDS app.
cd $SCRATCH
git clone https://github.com/TIGRLab/SCanD_project.git
cd ${SCRATCH}/SCanD_project
source code/00_setup_data_directories.sh
We want to put your data into:
./data/local/bids
You can do this by copying (scp -r), linking (ln -s), or moving the data to this place; it's your choice.
To copy the data from another computer/server you should be on the datamover node:
ssh <cc_username>@niagara.scinet.utoronto.ca
ssh nia-dm1
rsync -av <local_server>@<local_server_address>:/<local>/<server>/<path>/<bids> ${SCRATCH}/SCanD_project/data/local/
To link existing data from another location on SciNet Niagara to this folder:
ln -s /your/data/on/scinet/bids ${SCRATCH}/SCanD_project/data/local/bids
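Once the data is copied or linked, it can help to confirm that the folder looks like a BIDS dataset before submitting jobs. The sketch below is an assumption-laden sanity check, not a replacement for a full BIDS validator: the function name `check_bids_dir` is hypothetical, and it only looks for `dataset_description.json`, the `participants.tsv` file that the later steps read, and at least one `sub-*` folder.

```shell
# Hypothetical helper (not part of the repo): returns 0 if the folder
# has the few files and folders the later steps rely on.
check_bids_dir() {
  bids_dir=$1
  rc=0
  for f in dataset_description.json participants.tsv; do
    # -f follows symlinks, so linked datasets are handled too
    if [ ! -f "${bids_dir}/${f}" ]; then
      echo "missing: ${bids_dir}/${f}"
      rc=1
    fi
  done
  # at least one participant folder is expected
  if ! ls -d "${bids_dir}"/sub-* >/dev/null 2>&1; then
    echo "no sub-* folders found in ${bids_dir}"
    rc=1
  fi
  return "$rc"
}
```

A typical call would be `check_bids_dir "${SCRATCH}/SCanD_project/data/local/bids"`.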
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
## calculate the length of the array-job given SUB_SIZE (participants per job)
SUB_SIZE=10
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/01_mriqc.sh
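To make the array-job arithmetic concrete, here is a small worked sketch. The function name `array_job_length` and the participant counts are illustrative assumptions; it uses shell integer arithmetic, which performs the same floor division as the `bc` call in the commands above.

```shell
# Hypothetical helper: floor(N_SUBJECTS / SUB_SIZE), the same value
# that the bc pipeline computes.
array_job_length() {
  n_subjects="$1"
  sub_size="$2"
  echo $(( n_subjects / sub_size ))
}

# Example: 25 participants in chunks of 10
len=$(array_job_length 25 10)
# The array runs from 0 to len inclusive, so len + 1 tasks are submitted.
echo "array range: 0-${len} ($(( len + 1 )) tasks)"
# prints: array range: 0-2 (3 tasks)
```

Note that when the participant count divides evenly (for example 20 participants in chunks of 10), the range is still 0-2, so the last task has no participants left to process. How the job scripts treat such an empty chunk is not shown here.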
Note: this step uses an estimated 16 hrs of processing time per participant! So even if all participants run at once (on our parallel cluster) it will still take a day to run.
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
# module load singularity/3.8.0 - singularity already on most nodes
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
## calculate the length of the array-job given SUB_SIZE (participants per job)
SUB_SIZE=5
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} code/01_fmriprep_anat_scinet.sh
Running the functional step looks pretty similar to running the anat step. The time taken and resources needed will depend on how many functional tasks exist in the experiment - fMRIprep will try to run these in parallel if resources are available to do that.
Note - the script enclosed uses some interesting extra options:
- it defaults to running all the fmri tasks - the --task-id flag can be used to filter from there
- it is running synthetic distortion correction by default - instead of trying to work with the dataset's available fieldmaps - because fieldmap correction can go wrong - but this does require that the phase encoding direction is specified in the json files (for example "PhaseEncodingDirection": "j-").
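Because synthetic distortion correction depends on the phase encoding direction being present, it may be worth checking for it before submitting. The following is a minimal sketch under some assumptions: the function name `find_missing_ped` is hypothetical, and it only inspects the `*_bold.json` sidecars inside `func` folders. BIDS also allows this field to be inherited from a higher-level json file, so a sidecar listed here is not necessarily a problem.

```shell
# Hypothetical helper (not part of the repo): list func sidecar json
# files that do not mention PhaseEncodingDirection themselves.
find_missing_ped() {
  bids_dir=$1
  # -L follows symlinks, in case the bids folder is linked
  find -L "$bids_dir" -type f -path '*/func/*_bold.json' | sort |
    while read -r json; do
      grep -q '"PhaseEncodingDirection"' "$json" || echo "$json"
    done
}
```

Running `find_missing_ped ./data/local/bids` prints one path per sidecar that lacks the field; empty output means every sidecar sets it directly.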
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## figuring out appropriate array-job size
SUB_SIZE=2 # for func the sub size is smaller because there are two runs and 8 tasks per run
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/02_fmriprep_func_scinet.sh
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## figuring out appropriate array-job size
SUB_SIZE=8 # number of participants per array job
N_SUBJECTS=$(( $( wc -l ./data/local/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
cd ${SCRATCH}/SCanD_project
sbatch --array=0-${array_job_length} ./code/03_ciftify_scinet.sh
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## figuring out appropriate array-job size
SUB_SIZE=10 # number of dtseries files per array job
N_DTSERIES=$(ls -1d ./data/local/ciftify/sub*/MNINonLinear/Results/*task*/*dtseries.nii | wc -l)
array_job_length=$(echo "$N_DTSERIES/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/04_cifti_clean.sh
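The array size for this step is based on the number of dtseries files rather than on participants.tsv, so it is 0 if ciftify has not produced any output yet. Here is a hedged sketch of a guard for that case. The function name `count_dtseries` is hypothetical, and it uses `find` with a path pattern that selects roughly the same files as the `ls` glob above.

```shell
# Hypothetical helper (not part of the repo): count dtseries files under
# a ciftify output folder, returning 0 if the folder does not exist.
count_dtseries() {
  ciftify_dir=$1
  if [ ! -d "$ciftify_dir" ]; then
    echo 0
    return 0
  fi
  find -L "$ciftify_dir" -type f \
    -path "${ciftify_dir}/sub*/MNINonLinear/Results/*task*/*dtseries.nii" |
    wc -l | tr -d ' '
}

n=$(count_dtseries ./data/local/ciftify)
if [ "$n" -eq 0 ]; then
  echo "no dtseries files found; check that ciftify has finished"
fi
```

Checking the count first avoids submitting an array job that has nothing to process.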
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## figuring out appropriate array-job size
SUB_SIZE=10 # number of dtseries files per array job
N_DTSERIES=$(ls -1d ./data/local/ciftify/sub*/MNINonLinear/Results/*task*/*dtseries.nii | wc -l)
array_job_length=$(echo "$N_DTSERIES/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ./code/05_parcellate.sh
This step calls some "group" level BIDS apps to build summary sheets and html index pages. It also moves metadata, qc pages and a smaller subset of summary results into the data/share folder.
It takes about 10 minutes to run (depending on how much data you are syncing). It could also be submitted as a job.
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/06_extract_to_share.sh
To get an OpenNeuro dataset for testing, we will use datalad:
## loading Erin's datalad environment on the SciNet system
module load git-annex/8.20200618 # git annex is needed by datalad
module use /project/a/arisvoin/edickie/modules # this lets you read modules from Erin's folder
module load datalad/0.15.5 # this is the datalad module in Erin's folder
cd ${SCRATCH}/SCanD_project/data/local/
datalad clone https://github.com/OpenNeuroDatasets/ds000115.git bids
Before running fmriprep anat, we need to download ("get") the anat data:
cd bids
datalad get sub*/anat/*T1w.nii.gz
Before running fmriprep func - we need to download the fmri scans
cd bids
datalad get sub*/func/*
But with this dataset there is also the issue that it is old enough that no phase encoding direction was given for the fMRI scans. We really want to have at least this so we can run synthetic distortion correction, so we are going to guess it.
To guess - we add this line into the middle of the top level json ().
"PhaseEncodingDirection": "j-",
Note: thanks to the people at ReproNim, we can now also add the ReproNim derivatives!
cd ${SCRATCH}/SCanD_project/data/local/
datalad clone https://github.com/OpenNeuroDerivatives/ds000115-fmriprep.git fmriprep
datalad clone https://github.com/OpenNeuroDerivatives/ds000115-mriqc.git mriqc
getting the data files we actually use for downstream ciftify things