This pipeline was put together by Daniele Tavernari and Marco Varrone in December 2021 to streamline the analysis of multicolor immunofluorescence (IF) images of lymphoma and lymphomoid samples. It involves both manual and automated steps using open source tools and custom scripts that we developed. The pipeline is not optimal and can be improved in some of its parts. Feel free to adapt / modify / further develop the pipeline itself or the scripts involved, and reach out to us if you want to discuss.
The pipeline scripts must be downloaded on the server. So it strongly suggested to create one directory for the pipeline and one directory for managing the data during the executions of the scripts. We will refer to the latter as the main directory.
Then to work with QuPath on your computer, you can install QuPath on your local machine and mount both the code directory and the main directory on your local machine. Be aware that huge images require a good internet connection to be loaded and inspected without lags. A good amount of RAM is also advisable.
If you are using sshfs to mount the folders on a Mac, add the -o defer_permissions
parameter to avoid problems of permission denied when saving files.
A good starting point is to load and visualize your IF image on QuPath. You can download QuPath and browse its documentation here.
QuPath is organized into “projects”. When you create a new project, you will be asked to allocate an empty folder to it. Then, you can add images to the project. It is fast to switch between images of the same project, and you will see the thumbnails of all of them at the same time; conversely, to inspect the images of a different project you will have to close the current one and open the other. This can affect the way you want to organize your project(s): I recommend to put related images into the same project (e.g. all images of the same patient, or of the same treatment). It is in principle also possible to store all images of all patients in the same project, but it probably would become messy.
When loading a .vsi raw image file, sometimes the same file will contain more than one image (different acquisitions of the same sample, or even different samples). Discard those that are unusable (e.g. out of focus, or with too little tissue and only fatty holes). Instead, for the good images, change their QuPath image name into a meaningful one (sometimes the name of the raw file is just e.g. “Image_10.vsi” - it is strongly suggested, but not required, to rename them as “HLS23_s12_acq01”, which mentions the Human Lymphoma Sample number, the paraffin section, and the acquisition ID of that section) as a clear and consistent way.
Now you can explore the loaded image. If needed, set its type into "Fluorescence". You can change brightness, contrast and show/hide the different channels, corresponding to the different stained proteins. Sometimes it is useful to look at a single channel in grayscale mode.
In this step, you will draw and save the coordinates of the boundaries of all lymphomoids present in the image, so that they will be processed separately in the downstream analyses, and pieces of tissue scattered around the gel will be discarded.
- For each lymphomoid, draw a closed boundary with QuPath polygon tool. Boundaries should be drawn around tumor regions in the same way as you did for lymphomoids.
- Open getBoundary.groovy script in QuPath script editor (Automate -> Show script editor)
- In the script, set the absolute path to your desired main directory for all pipeline input/output files (MainDir) and the image name (ImageName)
- For each boundary:
- Select the boundary itself (double click - a selected boundary is displayed in yellow)
- Set the patient-specific lymphomoid name in the script (PatientLymphomoidName), avoid using underscores. This name should include the patient ID and the lymphomoid ID (e.g. "LP02ctrl01"). Be careful in this passage: if the same lymphomoid appears in different images, they should all have the same PatientLymphomoidName. In this way, different acquisitions (encoded with different ImageName's) will be treated as replicated measurements of the same lymphomoid in the downstream analyses
- Run the script
The script will also setup the directory Quantification inside the main directory that is required for Step 4.
In this step, you will tune the thresholds for each channel to classify a cell as "positive" or not for each of the proteins. To do so, a script will automatically detect nuclei and classify cells according to the thresholds that you give it as input (for DAPI and Ki67, cells will be classified according to the fluorescence levels inside the nuclei; for all other markers, the fluorescence levels will be thresholded in the cytoplasmic regions). The script also takes as input a configuration table that stores the association between channels, antibodies and cellular location (e.g. FITC - B220 - Cytoplasm); the two provided configuration tables human_channels.txt and mouse_channels.txt should work in the majority of the cases, but check that they are correct before running the script and create new ones with the same structure in case it is needed. Then, tune the threshold of each channel until you are satisfied with the classification. Finally, the script stores all the tuned thresholds in a text file, that will be used in the downstream analyses. Here are the detailed (sub)steps:
- Draw a rectangular region that seems to contain cells positive for each of the channels, and select it (you can also use an existing lymphomoid boundary)
- Open calibrateThresholds.groovy script in QuPath script editor
- In the script, set the absolute path to your desired main directory (MainDir) and configuration table (ConfigTable), and the image name (ImageName) in the Input section. These have to be the same as for the boundary drawing (Step 2, point 3).
- Tune the DAPI threshold first:
- In the 'Brigthness & contrast' menu, turn off all channels except for DAPI, and set it to grayscale
- Hover the cursor on the nuclei to have an idea on what might be the threshold. You can see the pixel value on the bottom-right corner of the QuPath image.
- Assign a reasonable threshold guess to the DAPI_thresh variable in the Input section of the script
- Run the script and inspect the results visually. You can view/hide the detections in QuPath with View->Show detections (Keyboard shortcut: D)
- If nuclei have been under- (or over-) called, change the DAPI threshold and re-run the script until you are satisfied with the results
- Tune the FITC threshold:
- In the 'Brigthness & contrast' menu, show the FITC channel
- Hover the cursor on the FITC stained cytoplasms to have an idea on what might be the threshold
- Assign a reasonable threshold guess to the FITC_thresh variable in the Input section of the script
- In the Calibration section of the script, uncomment the lines related to the FITC classification (setCellIntensityClassifications(channel_celllocation_map['FITC']+(...)) by removing the groovy comment symbol ('//')
- Run the script and inspect the results visually. Cells positive for FITC will appear in red, negative cells will be blue.
- Tune the FITC threshold and re-run the script until you are satisfied with the results
- Once you are done, comment out again the lines related to the FITC classification. Don't ever comment out the line related to the DAPI nuclei detection.
- Repeat step 5 for all the other channels. Remember that at every tuning round only the lines related to (1) the DAPI detection and (2) the intensity classification of your channel of interest should be uncommented in the Calibration section
- If you wish, repeat all the steps from 1 to 6 drawing a different rectangular region, to test whether your thresholds would stay the same or not
- At each run, the script is saving and overwriting the output file with all the thresholds. Thus, your last run should be done with all the thresholds already tuned, as the final output file will be saved with those. If you want to run the script without saving the file, comment out the Saving section
This step, except the part of format conversion, is fairly automatized, so it is suggested to perform Step 1-3 on multiple images, so that Step 4 can be left running on all of them, without running them individually.
DeepCell is a nuclei and cell segmentation software that is more robust to different levels of marker intensity and, thus, gives better results when the intensity of DAPI varies dramatically in the same sample.
The script run in Step 2 should have already created, inside the main directory, a directory called Quantification. Quantification contains other directories with the ImageName for which boundaries have been drawn and finally, each of this directory should contain an empty directory called registration.
To summarize, you should find a structure like the following:
Quantification
├── HLS07_s01_acq01
│ ├── registration
├── HLS25_s41_acq01
│ ├── registration
├── HLS25_s41_acq03
│ ├── registration
...
In this step we will convert the images from the .vsi to the .ome.tif format. The script uses a software called bfconvert from Bio-Formats Tools, and requires the packages listed in the file requirements.txt. In order to install them, make sure that you are using Python 3 (version 3.8 or newer), and if you are using a conda environment, run conda install -c conda-forge pip openjdk
. Then, run pip install numpy
, then pip install -r requirements.txt
and finally pip install numpy --upgrade
.
-
If it doesn't already exist, create a directory called Tiff inside the main directory. So it should be at the same level of the Quantification directory.
-
Connect to the server using ssh on your Terminal and move to the directory containing the scripts of the pipeline.
-
Run the script vsi2tiff.py with the following required parameters:
--input_vsi
: the path to the .vsi file that needs to be converted into .ome.tif files.--output_dir
: the path to the Tiff directory
Some other parameters may be required in case of non-standard uses of the pipeline, so they can usually be ignored
--bftools_dir
: the path to the bftools director. The default is/mnt/data2/shared/Lymphomoid-IF-software/bftools
where it's already present.
-
Note that a single vsi will generate a separate .ome.tif file for each image acquisition. The script will create multiple .ome.tif files concatenating the name of the parent directory of the .vsi file, the name of the .vsi file and the acquisition number.
-
Delete the acquisitions that have been discarded in Step 1. You can identify those acquisitions because there is no equivalent directory inside Quantification.
-
Rename with ImageName.ome.tif and move each image into the registration folder of the corresponding directory. So, every registration folder should contain one and only one .ome.tif file, as in the following diagram:
Quantification ├── HLS07_s01_acq01 │ ├── registration │ │ HLS07_s01_acq01.ome.tif ├── HLS25_s41_acq01 │ ├── registration │ │ HLS25_s41_acq01.ome.tif ├── HLS25_s41_acq03 │ ├── registration │ │ HLS25_s41_acq03.ome.tif ...
In case you have doubts on to which .vsi file a .ome.tif file corresponds too, you can import the .ome.tif file in QuPath and compare it.
The cell detection script require few Python packages for basic file processing. They are all already installed as a Python virtual environment in /mnt/data2/shared/Lymphomoid-IF-software/Lymphomoid-IF-venv/.
If you want to download and install them somewhere else, the required packages are listed in the file requirements.txt.
-
To be able to access the packages you need to activate the environment using:
source /mnt/data2/shared/Lymphomoid-IF-software/Lymphomoid-IF-venv/bin/activate
or activate your environment if you are using your own personal one.
-
Then you need to run the cellDetection.py script, with the following required parameters:
--sample_names
: a list of ImageName of the images to process, it could be 1 or many, separated by a whitespace (see example later).--dir
: the path of the Quantification directory--channel_info_path
: the path to the .txt or .tsv file containing information on the image channels. The values must be separated by tabs. In particular, the file must contain a Channel_name and a Cellular_location (Nucleus or Cytoplasm) column.
Some other parameters may be required in case of non-standard uses of the pipeline, so they can usually be ignored
--deepcell_path
: the absolute path to the Singularity image of DeepCell. The default is/mnt/data2/shared/Lymphomoid-IF-software/deepcell.sif
where it's already present.--nucleus_channel
: the name of the channel associated to the nuclear marker (e.g. DAPI). The default isDAPI
.
An example of the script calling is: python3 cellDetection.py --dir /mnt/data2/varrone/elisa_lymphomoids/Quantification/ --sample_names HLS25_7 HLS25_41acq01 HLS25_41acq03 --channel_info_path /mnt/data2/varrone/elisa_lymphomoids/mouse_channels.txt
A lot of warning messages will appear, but they are normal. As long as the message Channels extracted successfully.
appears, the software will have worked successfully.
The run may take a while for each image (tens of minutes). For this reason, it is suggested to pass many images at the same time through the --sample_names
parameters and run the script overnight.
This step obtains for each cell, from its mask detected in the cell detection step, the mean intensities of each of the markers in the nucleus and in the cytoplasm.
-
To run this step, check first that you are part of the Docker group on the server. If your username is in the output of
grep /etc/group -e "docker"
, you can move to the next point, otherwise ask the system admin (currently Luca Nanni) to add you to the Docker group. -
If the virtual environment has not been activated from the cell detection part, run the
source /mnt/data2/shared/Lymphomoid-IF-software/Lymphomoid-IF-venv/bin/activate
command to activate the virtual environment. -
Then, run the quantifyIntensities.py script, with the same required parameters as in the cell detection step:
--sample_names
: a list of ImageName of the images to process, it could be 1 or many, separated by a whitespace (see example later).--dir
: the absolute path of the Quantification directory--channel_info_path
: the path to the .txt or .tsv file containing information on the image channels. The values must be separated by tabs. In particular, the file must contain a Channel_name and a Cellular_location (Nucleus or Cytoplasm) column.
Parameters for non-standard uses of the pipeline
--nextflow_dir
: path to the Nextflow software. The default is/mnt/data2/shared/Lymphomoid-IF-software/nextflow
where it's already present.
An example of the script calling is: python3 quantifyIntensities.py --dir /mnt/data2/varrone/elisa_lymphomoids/Quantification/ --sample_names HLS25_7 HLS25_41acq01 HLS25_41acq03 --channel_info_path /mnt/data2/varrone/elisa_lymphomoids/mouse_channels.txt
As previously mentioned, the directory containing all the necessary software is already present in the uporicchiosrv1 server.
If you want to download or update the four software required:
- bftools: bftools can be downloaded from the Bio-Formats website. Current version: 6.8.1.
- Virtual Environment: the Python packages required for running the pipeline are listed in the file requirements.txt. If you want to install the packages in your current virtual environment, if you are using conda, make sure to run
conda install pip
first. Then you can runpip install -r requirements.txt
. - DeepCell: the image for Singularity can be downloaded running
singularity pull deepcell.sif docker://vanvalenlab/deepcell-applications:latest
. Current version: 0.3.1. - Nextflow: select the directory where ypu want to download nextflow and run
curl -s https://get.nextflow.io | bash
. For more information visit the website. Current version: 21.10.6.5660. - MCMICRO: you can get the latest version of MCMICRO by moving to the directory where nextflow is (e.g.
/mnt/data2/shared/Lymphomoid-IF-software/nextflow
) and running./nextflow pull labsyspharm/mcmicro
. For more information visit the website. Current version: Github revision 46abd97bc0.
In this step we will take quantification performed at step 4 and classify the cells according to the thresholds we calibrated at step 3. Each detected cell has a value for each of the cytoplasmic markers; if for a given cell they are all below the thresholds, that cell will be assigned to "otherCell", otherwise it will be assigned to the marker with the highest difference between the cytoplasmic intensity of that marker and its threshold. This means that in the current implementation a cell can be assigned to (or 'positive for') at most one marker. Additionally, cells will be classified as proliferating or not according to whether the Ki67 intensity is above or below its calibrated threshold. The script ClassifyCells_Analyses.R performs this cell classification, downstream analyses and plotting for all lymphomoids provided as input. In detail:
- Open ClassifyCells_Analyses.R with a text editor or RStudio
- Set up the input:
- MainDir: your main directory, same as for the previous steps
- ConfigTable: your configuration table, same as for the previous steps
- Lymphomoids_to_process: set to "all" if you want all previously drawn lymphomoids to be processed, otherwise specify a subset of them as a character vector
- Optionally, you can change some of the plotting parameters
- If they are not already installed, install R packages
sp
,ggplot2
andreshape2
- Run the script
The script will generate the following output under MainDir (all folders are created automatically):
Classified_cells_tables/Table_<ImageName>_<PatientLymphomoidName>_AllCells.txt
containing, for each cell (rows), information on marker intensities, centroid coordinates in micrometers, assigned marker/antibody and wheter the cell is proliferatingDigital_IF_images/
IFimage_<ImageName>_<PatientLymphomoidName>_all.pdf
digitalized image with cells rendered as simple dots, color-coded by their assigned marker, and a red contour for the lymphomoid boundaryIFimage_<ImageName>_<PatientLymphomoidName>_NoOtherCells_OnlyInLymphomoid.pdf
same but excluding otherCells and showing only cells inside the lymphomoidIFimage_<ImageName>_<PatientLymphomoidName>_NoOtherCells_OnlyInLymphomoid_OnlyProliferating.pdf
same but showing only proliferating cells inside the lymphomoid
*LymphomoidLevel*
files contain results summarized at the lymphomoid level, i.e. if two images contained the same lymphomoid (having the same name), their cell counts were averagedSummaryTable_LymphomoidLevel_AllCells.RData/.txt
table containing for each lymphomoid (rows), total cell counts for each marker, total counts of proliferating cells for each marker, and total number of cells detected in the lymphomoidLymphomoidLevel_CellTypeProportions_StackedBarplot.pdf/.txt
plot and table of cell type proportions across lymphomoidsLymphomoidLevel_CellTypeProportions_ExclOtherCells_StackedBarplot.pdf/.txt
same as above but excluding otherCellsLymphomoidLevel_ProlifVsNot_<marker>.pdf/.txt
plot and table of proportion of proliferating/not proliferating cells of a given marker across lymphomoids
Results_ImageXLymphomoid_level/*
same output of the point above, but without lymphomoid-level summarization (i.e. 'image X lymphomoid' level). It could be useful to check that two images of the same lymphomoid (before summarization) have comparable cell proportionslog_files/*
.log files with date and time of the script runs in their name and containing the various input file paths for reproducibility
Here we take the tables with classified cells under Classified_cells_tables/
and investigate the neighbourhood of a given cell type, computed with Delaunay triangulation. The Python script NeighbourhoodAnalyses.py considers a query cell type (e.g. B cells) and determines, for each lymphomoid (1) the neighbourhood composition in terms of relative frequency of each cell type in its nearest neighbours, and (2) the density distributions of physical distances (in µm) between query cells and each cell type in its neighbourhood.
The input of NeighbourhoodAnalyses.py is the following:
--main_dir
: your main directory, same as for the previous steps--lymphomoids_to_process
: set to all if you want all lymphomoids saved underClassified_cells_tables/
to be processed, otherwise specify a subset of them as a character list (default: all)--celltype_query
: the cell type of which you want to compute the analyse the neighbourhood (default: Bcells)--celltype_query_proliferation_status
: whether to consider any of the query cells, only proliferating or only not_proliferating (it solely accepts these three values, default: any)--consider_othercells
: whether to consider unclassified otherCells in the neighbourhood composition (default: False)--plot_delaunay
: whether to plot the spatial Delaunay networks (default: True)--custom_file_prefix
: optional prefix to add to the output files
An example of script call is:
python NeighbourhoodAnalyses.py --main_dir /mnt/ndata/daniele/elisa_lymphomoids/Processed/Pipeline_test/ --lymphomoids_to_process all --celltype_query Bcells --celltype_query_proliferation_status any --consider_othercells False --plot_delaunay True --custom_file_prefix Test24Apr_
Output files (plots and tables with query cell-level neighbourhood composition and distances) are saved in the newly created Neighbourhood_analyses/
under MainDir.
Nearest neighbours that are farther than the 95th percentile of physical distances (i.e. relatively far neighbours) are discarded. Results are reported both at the 'lymphomoid'-level (summarizing results of multiple acquisitions of the same lymphomoid) and at the 'image X lymphomoid'-level.
Since all tables are saved, you can customize your plots/perfom additional downstream analyses by reading them back in your favourite programming language.