These guidance notes provide an overview of the folder structure used in this handbook.

- All projects live in an S3 bucket
- The directory structure is always `<bucket-name>/projects/<project_name>`
- The `<project_name>` folder contains subdirectories
  - The first subdirectory is `workspace`
  - The other subdirectories are batches of data
- The batches of data are labeled by date and include `images` and `illum` folders
  - The `images` folder contains the different plates, which store the raw image data
  - The `illum` folder is identical to the `images` folder in terms of structure
    - `illum` is an output of the first stage of the CellProfiler pipeline; it stores a function used to adjust the plates in `images`
- The `workspace` folder also has subdirectories
  - `analysis` - includes subfolders mirroring the `batch` nesting
    - Within each `batch` folder, the CellProfiler results are stored in a `plate_id` folder
      - Within each plate folder there is an `analysis` folder
        - Inside this `analysis` folder, each well and site has its own folder (e.g. `A01-1`)
          - `A` and `01` refer to the row and column of the plate; `1` refers to the site within the well
          - If the grouping was done by well instead of by site, this would be `A01`, without the `-1` suffix
          - Note that this `analysis` folder is customizable
          - There are typically 384 (# of wells) x 9 (# of sites per well) = 3456 subfolders
            - 384-well plate
            - 9 different pictures (sites) per well
        - Within the site folder (e.g. `A01-1`) there are five csv files
          - `Cells.csv` - each row holds the measurements of one cell
          - `Cytoplasm.csv` - another object file, similar to `Cells.csv`
          - `Nuclei.csv` - another object file, similar to `Cells.csv`
          - These three object files can be concatenated by column
          - `Objects.csv`
          - `Experiment.csv` - stores metadata for the CellProfiler run, including the CellProfiler pipeline itself
          - `Image.csv`
  - `backend` - also includes `batch` nesting
    - Within each `batch` folder there is `plate` nesting; each plate folder stores summaries of that plate (all .csv files also have .gct versions, for input into Morpheus)
      - `<plate_id>.sqlite` - inner join of all objects in a well, then stacked across wells (so all data for every well of a single plate)
      - `<plate_id>.csv` - per-well means for each well on the plate
      - `<plate_id>.augmented.csv` - same as the .csv, except that it includes the metadata
      - `<plate_id>._normalized.csv` - a z-scored version of the augmented file
      - `<plate_id>._normalized_variable_selected.csv` - the normalized file after feature selection across all the plates in the batch
        - Three feature selection steps
          - Variance threshold
          - Correlation threshold (decorrelate feature set)
          - Replicate correlation filter (>0.6)
  - `parameters` - same structure as `backend`, but with metadata results (e.g. the features selected in variable selection)
  - `software` - this is where the project's GitHub repository lives
    - The scripts in the handbook assume that the repository has the same name as the project folder. If you rename it, pay careful attention to paths when executing the commands in the handbook.
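
The well and site folder naming described in the list above can be sketched in a few lines of Python. This is only an illustration, not part of the handbook's scripts: the function names `parse_site_folder` and `expected_site_folders` are hypothetical, and the sketch assumes a 384-well plate laid out as 16 rows (A to P) by 24 columns, a single row letter followed by a two-digit column, and 9 sites per well.

```python
import re
import string

# Folder names look like "A01-1" (row, column, site),
# or "A01" when the grouping was done by well instead of by site.
# Assumption: one row letter and a two-digit column number.
_SITE_FOLDER = re.compile(r"^([A-Z])(\d{2})(?:-(\d+))?$")


def parse_site_folder(name):
    """Split a folder name such as 'A01-1' into (row, column, site).

    site is None when the folder has no '-<site>' suffix (e.g. 'A01').
    """
    match = _SITE_FOLDER.match(name)
    if match is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    row, column, site = match.groups()
    return row, int(column), int(site) if site is not None else None


def expected_site_folders(n_rows=16, n_columns=24, n_sites=9):
    """List the folder names expected for one plate.

    The defaults describe a 384-well plate (16 x 24) imaged at 9 sites per well.
    """
    rows = string.ascii_uppercase[:n_rows]
    return [
        f"{row}{column:02d}-{site}"
        for row in rows
        for column in range(1, n_columns + 1)
        for site in range(1, n_sites + 1)
    ]


folders = expected_site_folders()
print(len(folders))                # 3456, i.e. 384 wells x 9 sites
print(parse_site_folder("A01-1"))  # ('A', 1, 1)
print(parse_site_folder("A01"))    # ('A', 1, None)
```

A list like `folders` could be compared against the subfolders actually present in a plate's `analysis` folder to spot sites whose results are missing.
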
├── 2016_04_01_a549_48hr_batch1
│   ├── illum
│   │   └── SQ00015167
│   │       ├── SQ00015167_IllumAGP.npy
│   │       ├── SQ00015167_IllumDNA.npy
│   │       ├── SQ00015167_IllumER.npy
│   │       ├── SQ00015167_IllumMito.npy
│   │       └── SQ00015167_IllumRNA.npy
│   └── images
│       └── SQ00015167__2016-04-21T03_34_00-Measurement1
│           ├── Assaylayout
│           ├── FFC_Profile
│           └── Images
│               ├── r01c01f01p01-ch1sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch2sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch3sk1fk1fl1.tiff
│               ├── r01c01f01p01-ch4sk1fk1fl1.tiff
│               └── r01c01f01p01-ch5sk1fk1fl1.tiff
└── workspace
    ├── audit
    │   └── 2016_04_01_a549_48hr_batch1
    ├── analysis
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           └── analysis
    │               └── A01-1
    │                   ├── Cells.csv
    │                   ├── Cytoplasm.csv
    │                   ├── Experiment.csv
    │                   ├── Image.csv
    │                   ├── Nuclei.csv
    │                   └── outlines
    │                       └── SQ00015167
    │                           ├── A01_s1--cell_outlines.png
    │                           └── A01_s1--nuclei_outlines.png
    ├── backend
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           ├── SQ00015167.csv
    │           └── SQ00015167.sqlite
    ├── images
    │   └── 2016_04_01_a549_48hr_batch1 -> /home/ubuntu/bucket/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/2016_04_01_a549_48hr_batch1/images/
    ├── load_data_csv
    │   └── 2016_04_01_a549_48hr_batch1
    │       └── SQ00015167
    │           ├── load_data.csv
    │           └── load_data_with_illum.csv
    ├── log
    │   ├── create_csv_from_xml
    │   └── collate
    ├── metadata
    │   └── 2016_04_01_a549_48hr_batch1
    │       ├── barcode_platemap.csv
    │       └── platemap
    │           └── C-7161-01-LM6-006.txt
    ├── pipelines
    ├── status
    └── software
        ├── Distributed-CellProfiler
        └── pe2loaddata
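
Because every location in the tree above follows from the bucket mount point, project, batch and plate, the paths can be built programmatically rather than typed by hand. The following is a minimal sketch under stated assumptions: `workspace_paths` is a hypothetical helper (it is not part of the handbook's scripts), and the mount point and project name are taken from the symlink target shown in the tree.

```python
from pathlib import PurePosixPath


def workspace_paths(bucket_root, project, batch, plate):
    """Return the main per-batch and per-plate folders for one plate.

    Hypothetical helper: it only mirrors the example tree and does not
    check that the folders exist.
    """
    project_dir = PurePosixPath(bucket_root) / "projects" / project
    workspace = project_dir / "workspace"
    return {
        "images": project_dir / batch / "images",
        "illum": project_dir / batch / "illum" / plate,
        "analysis": workspace / "analysis" / batch / plate / "analysis",
        "backend": workspace / "backend" / batch / plate,
        "load_data_csv": workspace / "load_data_csv" / batch / plate,
        "metadata": workspace / "metadata" / batch,
    }


# Values copied from the example tree; the mount point is an assumption
# based on the symlink target shown there.
paths = workspace_paths(
    bucket_root="/home/ubuntu/bucket",
    project="2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad",
    batch="2016_04_01_a549_48hr_batch1",
    plate="SQ00015167",
)

print(paths["backend"] / "SQ00015167.sqlite")
print(paths["analysis"] / "A01-1" / "Cells.csv")
```

`PurePosixPath` is used so that the sketch only manipulates path strings and never touches the file system; swapping in `Path` would allow calls such as `exists()` on a machine where the bucket is mounted.
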