
2a) Running the Clustering Pipeline to Create a Study-specific Groupwise Data-Driven Parcellation Atlas

Fan Zhang edited this page Apr 27, 2018 · 1 revision

The commands for running the clustering pipeline are listed below in bold text with a brief explanation of their use. To get more information on any command, run it with the --help flag as follows:

wm_NAME_OF_SCRIPT.py --help

Output files will be created in the output directory you specify; the directory will be created for you if it does not exist.

Initial data quality control

This step checks that all subjects' tractography data were created the same way (same data fields stored), that the gradient directions in the input DWI files that produced the tractography were correct (visual inspection of tract anatomy to verify correct appearance), and that, in general, the tractography dataset is ready for the next step. This command is also useful for visual and quantitative inspection of any directory of tractography, such as the output of atlas creation.

wm_quality_control_tractography.py

  • First step in the pipeline to check tractography files for errors.
  • Outputs rendered images of each subject (each tractography file in the input directory), along with information about fiber length distributions and the data fields stored along the tracts.
  • Input fiber tracts must be in vtkPolyData format, in either .vtk or .vtp files.
  • Example command to perform quality control on all files in the input_tractography directory:
wm_quality_control_tractography.py input_tractography/ qc_output/
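The fiber length distributions reported by the quality control script are computed per fiber from the polyline points stored in the vtkPolyData. As a minimal sketch (using synthetic NumPy coordinates, not the package's actual implementation, and a hypothetical helper name), a fiber's length is the sum of distances between consecutive points:

```python
import numpy as np

def fiber_length(points):
    """Length of one fiber: sum of Euclidean distances between
    consecutive points along its polyline (points: (N, 3) array, mm)."""
    diffs = np.diff(points, axis=0)
    return float(np.sum(np.linalg.norm(diffs, axis=1)))

# Synthetic example: a straight 3-point fiber along the x-axis,
# with points at x = 0, 10, and 25 mm.
fiber = np.array([[0.0, 0.0, 0.0],
                  [10.0, 0.0, 0.0],
                  [25.0, 0.0, 0.0]])
print(fiber_length(fiber))  # 25.0
```

Lengths computed this way depend on tractography step size and point spacing, which is one reason to inspect the distributions before choosing a minimum-length threshold (-l) in later steps.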

Commands to create a cluster atlas

The following steps create a model of common white matter structures in a population. Registration and clustering use random fiber samples to enable analysis of relatively large datasets. The final output is a fiber cluster atlas. This atlas model can then be applied to cluster all fibers from individual datasets for further quantitative and/or visual analyses.

wm_register_multisubject_faster.py

  • Runs a multisubject unbiased group registration of tractography.
  • This is used before clustering the atlas and can be used on all data (for a large multi-subject atlas), on selected control data, or on representative data from the population (e.g. controls and/or patient datasets evenly distributed over the age range or selected using other criteria for the study).
  • This command can run affine or nonrigid (b-spline) registration. (Always run affine first. Running nonrigid afterwards with the affine output as input will often produce the best results but is not essential.)
  • Command options include -l (minimum fiber length, in mm) and -f (number of fibers sampled per subject).
  • The number of processors used should be less than or equal to the number of subjects, because at each iteration the method multiprocesses over subjects. For 5 subjects, use 5 processors; for 100 subjects, use a number that evenly divides 100, such as 10 or 20.
  • Example command to use 12 processors (this is a good number if there are 12 subjects being registered):
wm_register_multisubject_faster.py -l 20 -j 12 -f 20000 -midsag_symmetric -mode affine input_tractography/ registered_tractography/
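Because each iteration multiprocesses over subjects, a processor count that evenly divides the subject count keeps all workers busy. As a back-of-the-envelope aid (the helper name is hypothetical, not part of whitematteranalysis), one could pick the -j value like this:

```python
def pick_processor_count(n_subjects, max_processors):
    """Largest processor count <= max_processors that evenly
    divides the number of subjects (falls back to 1)."""
    for j in range(min(n_subjects, max_processors), 0, -1):
        if n_subjects % j == 0:
            return j
    return 1

print(pick_processor_count(12, 12))   # 12: one subject per processor
print(pick_processor_count(100, 24))  # 20: subjects processed in 5 even batches
```

For 12 subjects on a 12-core machine this yields -j 12, matching the example command above.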

wm_cluster_atlas.py

  • Runs clustering of tractography for multiple subjects to create an atlas. Clusters can be viewed in Slicer following this step.
  • This step analyzes a subset of fibers from each subject to determine common structures in the population.
  • The parameters used will affect the overall run time and depend on the fiber length distribution, number of fibers per subject, and total number of fibers in the whole population. You can find this information using the quality control script.
  • With multi-fiber tractography data, the number of clusters can range from 400 to over 1000, depending on how finely the white matter is to be divided. Try for example 400 and 650, and inspect the output to see which clustering is better for your analysis purposes.
  • The number of fibers per subject is an important parameter. Aim for a total of at least 100,000-200,000 fibers across all subjects, or more if outlier removal iterations will be used. (For example, with 10 subjects and 10,000 fibers sampled per subject, there are 100,000 fibers in total; if 500 clusters are requested, then on average each subject can contribute 20 fibers to each cluster.)
  • The clustering and outlier removal process is repeated over multiple iterations (default: three). Visual inspection of the atlases produced after 0, 1, 2, etc. iterations of outlier removal will help choose which iteration works best for the particular dataset.
  • Check the output pdf: subjects_per_cluster_hist.pdf. This is the histogram of how many subjects are present in each cluster. If the parameters have been set reasonably, most clusters will contain fibers from all subjects.
  • We recommend visualizing the output clusters using the tractography quality control command wm_quality_control_tractography.py.
  • Example clustering command, where a length of 40-80mm is reasonable for UKF two-tensor tractography in adults (try 40mm or lower for single-tensor tracts, which are shorter overall):
wm_cluster_atlas.py -l 40 -f 10000 -nystrom_sample 2500 -k 400 -j 6 registered_tractography/output_tractography/ atlas_output/
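The arithmetic behind the fibers-per-subject guidance above can be sketched in a few lines (the function name is hypothetical, for illustration only; it simply restates the example of 10 subjects, 10,000 fibers each, and 500 clusters):

```python
def atlas_sampling_summary(n_subjects, fibers_per_subject, n_clusters):
    """Back-of-the-envelope check of the -f and -k parameters:
    total fibers in the population, and the average number of
    fibers each subject contributes to each cluster (assuming
    fibers were spread evenly across clusters)."""
    total_fibers = n_subjects * fibers_per_subject
    per_subject_per_cluster = fibers_per_subject / n_clusters
    return total_fibers, per_subject_per_cluster

# The example from above: 10 subjects, 10,000 fibers each, 500 clusters.
total, per_cluster = atlas_sampling_summary(10, 10_000, 500)
print(total, per_cluster)  # 100000 20.0
```

If the per-subject contribution drops much below this (e.g. by requesting many more clusters without sampling more fibers), clusters are more likely to be missing fibers from some subjects, which will show up in subjects_per_cluster_hist.pdf.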

Commands to do fiber clustering result quality control

wm_quality_control_tractography.py

  • This is the same quality control script as in the initial data quality control.
  • Run this script to view all atlas clusters.

Please see instructions for data analysis and visualization here: https://github.com/ljod/whitematteranalysis/wiki/Visualization-and-Analysis-of-Clustered-Tracts

Help

  • Test input data for trying the commands can be found in the test directory:

whitematteranalysis/test/test_data

  • The source code of the clustering commands can be found in the bin directory of whitematteranalysis:

whitematteranalysis/bin/

Thank you to Julie Marie Stamm for her help writing these instructions.