Cellos: High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology
Cellos (Cell and Organoid Segmentation) is a pipeline for high-throughput volumetric 3D segmentation and morphological quantification of organoids and their cells. Cellos segments organoids using classical image-processing algorithms and segments nuclei using our trained model based on StarDist-3D (https://github.com/stardist/stardist).
The image data used here were exported from a PerkinElmer Opera Phenix high-content screening confocal microscope. The exported folder contains subfolders with tiff files (images) and xml files (metadata). Each tiff file is a single image from one well, one field, one plane and one channel. We developed an automatic protocol that organizes all tiff files from the same well and saves them as zarr arrays to reduce RAM and storage usage. All image information is extracted from the corresponding metadata files.
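As an illustration of the grouping step, the Python sketch below groups tiff file names by well. It is not the pipeline's implementation: Cellos takes the image information from the metadata files, whereas this sketch assumes the Opera Phenix export naming pattern r<row>c<col>f<field>p<plane>-ch<channel>...tiff (e.g. r03c07f01p01-ch1sk1fk1fl1.tiff). The function group_by_well is a hypothetical name. Each resulting group could then be stacked and written to a zarr array (not shown).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMED Opera Phenix export naming, e.g. "r03c07f01p01-ch1sk1fk1fl1.tiff":
# r = row, c = column, f = field, p = plane, ch = channel.
TIFF_NAME = re.compile(
    r"^r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)-ch(?P<channel>\d+).*\.tiff?$"
)

WellFiles = Dict[Tuple[int, int], List[Tuple[int, int, int, str]]]


def group_by_well(filenames: Iterable[str]) -> WellFiles:
    """Group tiff file names by (row, column) well (hypothetical helper).

    Each well maps to a list of (field, plane, channel, filename) tuples sorted
    by field, then plane, then channel, so the images can be stacked in a fixed order.
    """
    wells = defaultdict(list)
    for name in filenames:
        match = TIFF_NAME.match(name)
        if match is None:
            continue  # skip metadata and any file that does not follow the assumed pattern
        key = (int(match.group("row")), int(match.group("col")))
        wells[key].append(
            (int(match.group("field")), int(match.group("plane")),
             int(match.group("channel")), name)
        )
    return {well: sorted(items) for well, items in wells.items()}


files = [
    "r03c07f01p02-ch1sk1fk1fl1.tiff",
    "r03c07f01p01-ch1sk1fk1fl1.tiff",
    "r03c07f01p01-ch2sk1fk1fl1.tiff",
    "Index.idx.xml",
]
print(group_by_well(files)[(3, 7)][0])  # (1, 1, 1, 'r03c07f01p01-ch1sk1fk1fl1.tiff')
```

For a real export folder, passing os.listdir() of the images subfolder would return a dictionary keyed by (row, column).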
We provide two ways to install the pipeline:
- Install into a Python 3.7 environment using conda or pip
- Build and use an Apptainer/Singularity container
Currently, the pipeline uses a Python 3.7 environment. We provide a requirements.txt file with pinned package versions to install all packages and dependencies for a working environment on Rocky 9 Linux.
We recommend creating a virtual environment for running the pipeline, for example using conda.
Installing the pipeline using conda to manage the Python version:
git clone https://github.com/TheJacksonLaboratory/Cellos.git
cd Cellos # make sure you are in the correct directory
conda env create -f environment.yml
This uses conda to create a Python 3.7 environment and then installs all packages from PyPI using pip and the requirements.txt file.
If you prefer to install the pipeline dependencies into a pre-existing Python 3.7 environment (e.g. venv), you can use:
pip install --require-hashes --no-deps -r requirements.txt
This ensures you install exactly the packages we have tested: --require-hashes checks every downloaded package against the hash recorded in requirements.txt, and --no-deps stops pip from installing anything not listed there.
Note
- At present we have tested the pipeline only on CentOS 7 and Rocky 9 Linux, using Python 3.7.
- The provided environment does not include additional packages required for GPU support, e.g. CUDA.
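If you are unsure which interpreter an environment provides, a quick check like the following compares the running Python version with the tested 3.7 release. Both function names are hypothetical helpers, not part of Cellos.

```python
import sys
from typing import Sequence

# Python version the pipeline has been tested with (see the note above).
TESTED_VERSION = (3, 7)


def is_tested_python(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the major.minor version matches the tested Python version."""
    return tuple(version_info[:2]) == TESTED_VERSION


def warn_if_untested() -> None:
    """Print a warning to stderr when the current interpreter is not Python 3.7."""
    if not is_tested_python():
        major, minor = sys.version_info[:2]
        print(
            "Warning: running Python {}.{}; Cellos has only been tested with Python 3.7.".format(major, minor),
            file=sys.stderr,
        )
```

Calling warn_if_untested() from the activated environment prints nothing when the interpreter is Python 3.7.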
To build an Apptainer/Singularity container, you can use the provided Cellos.def file (either clone the repository or download the .def file with wget/curl):
apptainer build Cellos.sif Cellos.def
Note
As a convenience, we also provide a Dockerfile. However, it has not been extensively tested and is not supported in our computing environment, so we cannot offer any support or general advice on using it.
To build the Docker image, use the provided Dockerfile after cloning the repository (Docker image names must be lowercase):
docker build -t cellos .
Tip
At JAX, the easiest way to build containers from the definitions in this repository is to use the build partition on Sumner2.
For more details about accessing and using this JAX-specific resource, please see the instructions in SharePoint.
You can access it from a login node using:
sinteractive -p build -q build
Tip
singularity and apptainer can be used interchangeably in all of these commands.
Load the required singularity/apptainer module:
module load singularity
To build the container, either clone the repository:
git clone https://github.com/TheJacksonLaboratory/Cellos.git
or download only the container definition:
wget https://github.com/TheJacksonLaboratory/Cellos/raw/refs/heads/master/Cellos.def
Either way, on the build partition, make sure you are in the directory containing the Cellos.def definition file, then build the container using singularity build:
singularity build Cellos.sif Cellos.def
Note
This will take a few minutes! It downloads an image, installs packages, builds the Python environment, and then writes the resulting .sif file.
Once you see INFO: Build complete: Cellos.sif you can end the session using exit.
Tip
You can now add the directory containing the container to your PATH variable so that you can use it from other (sub)directories. Make sure you are in the same directory as the Cellos.sif file and run:
export PATH=$PATH:$(pwd)
This only affects the current shell session; to make it permanent, add the same line with the absolute path of the directory (instead of $(pwd)) to your ~/.bashrc.
Alternatively, you can copy or build the Cellos.sif container in a directory that is already in your PATH, e.g. a bin directory in your home directory (~/bin).
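To confirm that the shell will find the container, you can check it from Python with shutil.which, which searches PATH for an executable file of the given name. A minimal sketch: find_container is a hypothetical helper, and it assumes the built .sif file has its executable bit set, as apptainer/singularity build normally leaves it.

```python
import shutil
from typing import Optional


def find_container(name: str = "Cellos.sif", path: Optional[str] = None) -> Optional[str]:
    """Return the full path to `name` if it is an executable file on the search path, else None.

    `path` overrides the search path (a PATH-style string); by default the
    current PATH environment variable is searched.
    """
    return shutil.which(name, path=path)


# Prints the full path of Cellos.sif, or None if it is not on PATH.
print(find_container())
```

If this prints None after exporting PATH, check that you ran the export in the same shell session.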
There are two main steps to run the pipeline:
- Organizing images and segmenting organoids
- Segmenting nuclei
Each step can be run on an individual well using a plain bash script or as an sbatch script. To run on a whole plate, the scripts use sbatch to launch one job per well on a SLURM HPC cluster. The sbatch settings have been optimized using the sample data set and the JAX Sumner2 cluster.
Important
If you are running this pipeline on Sumner2, be aware that the scheduler will kill your job if it exceeds the requested memory.
The two sbatch scripts, scripts/process_organoids/stitch_well.sh and scripts/process_cells/cells_seg_well.sh, have ~25% memory headroom based on the sample data, but if your jobs are killed you will need to edit them to increase the requested memory.
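The ~25% headroom is simple arithmetic: the organoid step peaks at ~128G on the sample data, and 128G × 1.25 = 160G, which matches the memory requested for stitch_well.sh. The hypothetical helper below applies the same rule to choose a larger request when your data need more memory.

```python
import math


def memory_request_gb(peak_gb: float, headroom: float = 0.25) -> int:
    """Return the memory (GB) to request: observed peak plus a fractional headroom, rounded up."""
    return math.ceil(peak_gb * (1 + headroom))


# Organoid step: ~128G peak on the sample data -> 160G (the request in stitch_well.sh).
print(memory_request_gb(128))       # 160
# A dataset that peaked at ~150G, with 50% headroom -> 225G.
print(memory_request_gb(150, 0.5))  # 225
```

The resulting value goes into the memory request of the sbatch script; we assume it is set with an #SBATCH --mem line, so check the script header before editing.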
Important
If you are using a virtual environment, make sure it is activated!
For example, using conda as recommended:
conda activate organoid
Otherwise, set the PYTHONPATH variable to the path of your Python 3.7 interpreter. Note that in these scripts PYTHONPATH holds the interpreter path, not a module search path as in standard Python usage.
If you are using the Apptainer/Singularity container as the Python interpreter, set PYTHONPATH to the path of the built container, Cellos.sif. If you followed the instructions above and added the container directory to your PATH, you can use PYTHONPATH=Cellos.sif in the commands below.
You may also need to make the scripts executable using:
chmod u+x <script name>
Organizing images and segmenting organoids:
- For a single well (this takes ~2 hours of wall time and uses ~128G of memory):
From an interactive session, using bash:
cd scripts/process_organoids/
PYTHONPATH=$(which python) bash stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg
As a SLURM job using sbatch (requests 2 cores and 160G of memory):
cd scripts/process_organoids/
PYTHONPATH=$(which python) sbatch stitch_well.sh -r <row number> -c <column number> -f ../../config.example.cfg
- For a whole plate (this submits a series of the above as SLURM jobs using sbatch):
cd scripts/process_organoids/
PYTHONPATH=$(which python) bash process_plate.sh -f ../../config.example.cfg
Segmenting nuclei:
- For a single well (this takes <20 min of wall time with 8 cores and uses ~6G of memory):
From an interactive session, using bash:
cd scripts/process_cells/
PYTHONPATH=$(which python) bash cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg
As a SLURM job using sbatch (requests 8 cores and 10G of memory):
cd scripts/process_cells/
PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r <row number> -c <column number> -f ../../config.example.cfg
- For a whole plate (this submits a series of the above as SLURM jobs using sbatch):
cd scripts/process_cells/
PYTHONPATH=$(which python) bash cells_process_plate.sh -f ../../config.example.cfg
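The whole-plate scripts fan out into one job per well. We do not reproduce process_plate.sh or cells_process_plate.sh here; the following is a hypothetical Python sketch of that fan-out, assuming each per-well job takes the same -r, -c and -f arguments as the single-well commands above. The names sbatch_commands and submit_all, and the second well (3, 8), are illustrative only.

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple


def sbatch_commands(script: str, wells: Sequence[Tuple[int, int]], config: str) -> List[List[str]]:
    """Build one `sbatch <script> -r <row> -c <col> -f <config>` command per well."""
    return [
        ["sbatch", script, "-r", str(row), "-c", str(col), "-f", config]
        for row, col in wells
    ]


def submit_all(commands: Sequence[Sequence[str]], cwd: str) -> None:
    """Run each sbatch command from the scripts directory (needs SLURM's sbatch on PATH).

    The environment, including PYTHONPATH, is inherited from the calling shell.
    """
    for cmd in commands:
        subprocess.run(list(cmd), check=True, cwd=cwd)


if __name__ == "__main__":
    cmds = sbatch_commands("stitch_well.sh", [(3, 7), (3, 8)], "../../config.example.cfg")
    for cmd in cmds:
        # Dry run: print the commands instead of submitting them.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Submitting from the scripts directory matters because the config path ../../config.example.cfg is relative to it.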
Note
All of the above commands use ../../config.example.cfg as the location of the config file because of the layout of this repository. You can instead provide an absolute path to a config file in another location.
The pipeline requires certain key parameters to be provided. For this we use a simple INI-style plain-text file that can be parsed with Python's configparser module.
In the repository we provide an example configuration file, config.example.cfg.
Parameter | Description |
---|---|
[pipeline] | |
plate_path | path to the folder containing the raw images |
output_path | path where the CSV files and zarr arrays will be saved |
well_targets | row and column numbers of the wells to analyze, formatted as row1,col1\|row2,col2 |
[stitch_well] | |
plane_size | size of the image of one field, one z-slice and one channel |
overlap_x | number of overlapping pixels between two horizontally adjacent fields |
overlap_y | number of overlapping pixels between two vertically adjacent fields |
[cells_seg_well] | |
output_path | path where the CSV files will be saved |
stardist_path | path to the trained StarDist model for nuclei segmentation |
Note
The paths can be relative to the scripts, as in the example provided here, which assumes the repository layout is unchanged. Otherwise, provide the paths as absolute paths.
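A minimal sketch of reading such a file with configparser, including splitting well_targets into (row, column) pairs. The configuration text below uses placeholder values for illustration only (they are not copied from config.example.cfg), and parse_well_targets is a hypothetical helper rather than the pipeline's own code.

```python
import configparser
from typing import List, Tuple

# PLACEHOLDER values for illustration only, NOT the contents of config.example.cfg.
# For a real file, use config.read("config.example.cfg") instead of read_string().
EXAMPLE_CFG = """
[pipeline]
plate_path = ../../cellos_data
output_path = ../../output
well_targets = 3,7|3,8

[stitch_well]
plane_size = 1000
overlap_x = 100
overlap_y = 100

[cells_seg_well]
output_path = ../../output
stardist_path = ../../models/stardist
"""


def parse_well_targets(value: str) -> List[Tuple[int, int]]:
    """Split a "row1,col1|row2,col2" string into [(row1, col1), (row2, col2)]."""
    wells = []
    for item in value.split("|"):
        item = item.strip()
        if item:
            row, col = item.split(",")
            wells.append((int(row), int(col)))
    return wells


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CFG)

wells = parse_well_targets(config.get("pipeline", "well_targets"))
plane_size = config.getint("stitch_well", "plane_size")
overlap_x = config.getint("stitch_well", "overlap_x")
stardist_path = config.get("cells_seg_well", "stardist_path")
print(wells)  # [(3, 7), (3, 8)]
```

Because output_path appears in two sections, always read it through the section name ([pipeline] or [cells_seg_well]).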
We have made publicly available an example dataset containing the data for one well. The well is in row 3, column 7. The images have 3 channels: channel 1 = EGFP, channel 2 = mCherry and channel 3 = brightfield.
It can be downloaded from: https://figshare.com/articles/dataset/cellos_data_zip/21992234
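The row and column numbers correspond to the zero-padded well prefix of the image file names (r03c07 for this well). The following small sketch shows that mapping together with the channel assignment of this dataset. The names well_id and CHANNELS are illustrative, and the two-digit padding is inferred from the r03c07 prefix.

```python
from typing import Dict

# Channel assignment of the example dataset, as described above.
CHANNELS: Dict[int, str] = {1: "EGFP", 2: "mCherry", 3: "brightfield"}


def well_id(row: int, col: int) -> str:
    """Format a well as a zero-padded prefix, e.g. row 3, column 7 -> "r03c07"."""
    return "r{:02d}c{:02d}".format(row, col)


print(well_id(3, 7), CHANNELS[1])  # r03c07 EGFP
```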
On Linux, you can download it as follows:
wget https://figshare.com/ndownloader/files/39032216
Warning
This is a ~11 GB zip file.
On Linux, it needs to be unzipped using 7z:
7z x 39032216
This will extract a cellos_data folder containing the images (.tiff) and the Index.idx.xml (metadata) file.
To use the provided config.example.cfg and the script commands above, we recommend placing the cellos_data folder in the root of this repository.
You should obtain the following layout for the Cellos directory, where ... indicates abridged files:
├── config.example.cfg
├── ...
├── cellos_data
│   ├── Index.idx.xml
│   └── r03c07 ... .tiff
├── models
│   └── stardist
│       └── ...
├── output
│   └── ...
└── scripts
    ├── process_cells
    │   ├── cells_process_plate.sh
    │   ├── cells_seg_well.py
    │   └── cells_seg_well.sh
    └── process_organoids
        ├── process_plate.sh
        ├── stitch_well.py
        └── stitch_well.sh
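Before running the commands, you can check from the repository root that the files the example configuration relies on are in place. In this minimal sketch, the list of paths is taken from the layout above and missing_paths is an illustrative name.

```python
from pathlib import Path
from typing import List

# Paths (relative to the repository root) taken from the layout above.
EXPECTED = [
    "config.example.cfg",
    "cellos_data/Index.idx.xml",
    "models/stardist",
    "scripts/process_organoids/stitch_well.sh",
    "scripts/process_cells/cells_seg_well.sh",
]


def missing_paths(root: str) -> List[str]:
    """Return the expected paths that do not exist under the repository root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means everything was found.
    print(missing_paths("."))
```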
Important
We provide the expected results of running the pipeline on the sample data in the output folder at the root of the repository.
If you plan to run the pipeline on the sample data, we recommend backing up or renaming this folder so that you can compare your results with ours.
Alternatively, you can change the output_path parameters in the .cfg file.
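One way to compare your results with ours is to compare the CSV files in the two folders. This assumes you renamed the provided folder; the name output_expected below is only an example. This minimal sketch uses Python's filecmp module, and compare_csv_outputs is an illustrative name. Byte-for-byte comparison also flags harmless floating-point differences, so treat a differing file as a prompt for closer inspection rather than proof of an error.

```python
import filecmp
from pathlib import Path
from typing import Dict, List


def compare_csv_outputs(ours: str, expected: str) -> Dict[str, List[str]]:
    """Compare CSV files (matched by relative path) between two output folders.

    Returns files that are byte-for-byte identical, files that differ, and
    files present in only one of the folders. Zarr arrays are ignored.
    """
    ours_dir, exp_dir = Path(ours), Path(expected)
    ours_files = {p.relative_to(ours_dir).as_posix() for p in ours_dir.rglob("*.csv")}
    exp_files = {p.relative_to(exp_dir).as_posix() for p in exp_dir.rglob("*.csv")}
    result = {
        "identical": [],
        "different": [],
        "only_ours": sorted(ours_files - exp_files),
        "only_expected": sorted(exp_files - ours_files),
    }
    for rel in sorted(ours_files & exp_files):
        same = filecmp.cmp(str(ours_dir / rel), str(exp_dir / rel), shallow=False)
        result["identical" if same else "different"].append(rel)
    return result


if __name__ == "__main__" and Path("output_expected").is_dir():
    print(compare_csv_outputs("output", "output_expected"))
```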
Assuming the above layout, you can use the provided config.example.cfg and run the pipeline in two steps:
Important
- If you are using an interactive session, make sure you have enough memory!
- Make sure you have activated your virtual environment, e.g.:
conda activate organoid
- Organize images and segment organoids (this takes ~2 hours using the sbatch script).
From the Cellos directory (the root of the repository), cd into the relevant scripts directory:
cd scripts/process_organoids
Run the first step using bash (in an interactive session):
PYTHONPATH=$(which python) bash stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg
Alternatively, run the first step as a SLURM job using sbatch (requests 2 cores and 160G of memory):
PYTHONPATH=$(which python) sbatch stitch_well.sh -r 3 -c 7 -f ../../config.example.cfg
- Segment cells (this takes <20 min using the sbatch script).
From the Cellos directory (the root of the repository), cd into the relevant scripts directory:
cd scripts/process_cells
Run the second step using bash (in an interactive session):
PYTHONPATH=$(which python) bash cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg
Alternatively, run the second step as a SLURM job using sbatch (requests 8 cores and 10G of memory):
PYTHONPATH=$(which python) sbatch cells_seg_well.sh -r 3 -c 7 -f ../../config.example.cfg