Collection of Nextflow tools for using CellRanger
Docs: Generating FASTQs with cellranger mkfastq
bcl_run_folder
: Folder containing the Illumina sequencer's base call files
samplesheet
: Sample sheet defining data structure, either an Illumina Experiment Manager sample sheet or a simple three-column CSV
output
: Path for output files
input_type
: Format of the sample sheet input, either "samplesheet" or "csv" (default: "samplesheet")
filter_dual_index
: Optional. Only demultiplex samples identified by i7/i5 dual indices (e.g., SI-TT-A6), ignoring single-index samples. Single-index samples will not be demultiplexed.
filter_single_index
: Optional. Only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples. Dual-indexed samples will not be demultiplexed.
lanes
: Optional. Comma-delimited series of lanes to demultiplex (e.g., 1,3). Use this if you have a sample sheet for an entire flow cell but only want to generate a few lanes for further 10x Genomics analysis.
use_bases_mask
: Same meaning as for bcl2fastq. Use to clip extra bases off a read if you ran extra cycles for QC.
delete_undetermined
: Delete the Undetermined FASTQs generated by bcl2fastq. Useful if you are demultiplexing a small number of samples from a large flow cell.
barcode_mismatches
: Same meaning as for bcl2fastq. Use this option to change the number of allowed mismatches per index adapter (0, 1, 2). Default: 1.
project
: Custom project name, to override the sample sheet or to use in conjunction with the `--csv` argument.
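When `input_type` is "csv", the sample sheet is the simple three-column CSV accepted by `cellranger mkfastq --csv`, with a `Lane,Sample,Index` header. The sketch below renders such a sheet; it is a minimal example assuming that header, and the sample names and index sets in `ROWS` are placeholders, not values from this repository.

```python
import csv
import io

# Placeholder rows: (lane, sample name, 10x sample index set).
# Replace these with the lanes, samples and index sets from your own run.
ROWS = [
    ("1", "sample_A", "SI-TT-A6"),
    ("1", "sample_B", "SI-TT-B6"),
]


def simple_samplesheet(rows):
    """Render a three-column (Lane,Sample,Index) sample sheet as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Lane", "Sample", "Index"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Save this output to a file and pass it as `samplesheet` with input_type "csv".
    print(simple_samplesheet(ROWS), end="")
```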
Docs: Single-Library Analysis with cellranger count
output
: Path for output files
sample_whitelist
: Optional text file used to specify the subset of samples to process, one per line (no header)
fastq_dir
: Directory containing all FASTQ files
transcriptome_dir
: Directory containing transcriptome reference files (see below)
The default transcriptome reference in the workflow is:
/shared/biodata/reference/10x/refdata-gex-GRCh38-2020-A
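The `sample_whitelist` file is a plain list of sample names, one per line. The sketch below shows one way such a list could be applied to the contents of `fastq_dir`. It assumes the FASTQs follow the bcl2fastq naming convention (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`) and that whitelist entries match the `<sample>` prefix; the workflow's own matching logic is not described here and may differ.

```python
import re
from pathlib import Path

# Assumption: bcl2fastq-style names, e.g. sample_S1_L001_R1_001.fastq.gz
# (reads R1/R2 and index reads I1/I2). Other naming schemes are not handled.
FASTQ_RE = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_[RI]\d_001\.fastq\.gz$")


def read_whitelist(path):
    """Return the set of sample names listed one per line (no header)."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def select_fastqs(fastq_names, whitelist=None):
    """Group FASTQ file names by sample, keeping only whitelisted samples."""
    by_sample = {}
    for name in fastq_names:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # not a bcl2fastq-style FASTQ name; ignore it
        sample = match.group("sample")
        if whitelist is not None and sample not in whitelist:
            continue  # sample not listed in the whitelist
        by_sample.setdefault(sample, []).append(name)
    return {sample: sorted(files) for sample, files in by_sample.items()}
```

Passing `whitelist=None` keeps every sample, which mirrors the behavior when no `sample_whitelist` is given.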
Docs: Analysis of V(D)J data
output
: Path for output files
sample_whitelist
: Optional text file used to specify the subset of samples to process, one per line (no header)
fastq_dir
: Directory containing all FASTQ files
vdj_dir
: Directory containing VDJ reference files (see below)
The default VDJ reference in the workflow is:
/shared/biodata/reference/10x/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0
To analyze samples which have been prepared with multiple complementary methodologies, the flexible `cellranger multi` analysis module is used.
To account for a wide variety of experimental designs, the `multi` workflow in this repository uses a simple input format which lists each of the different libraries in a single table alongside the methodology which was used to prepare it.
For example, a single sample (`sc5p_v2_hs_B_1k`) may have been prepared with two parallel methods: 5' gene expression (`sc5p_v2_hs_B_1k_5gex`) and V(D)J (`sc5p_v2_hs_B_1k_b`).
The sample grouping table describing this experimental design would be:
sample | grouping | feature_types |
---|---|---|
sc5p_v2_hs_B_1k_5gex | sc5p_v2_hs_B_1k | Gene Expression |
sc5p_v2_hs_B_1k_b | sc5p_v2_hs_B_1k | VDJ |
Allowed values for `feature_types` are (ref):
- Gene Expression
- VDJ
- VDJ-T
- VDJ-B
- Antibody Capture (see below)
- CRISPR Guide Capture (see below)
- Multiplexing Capture (see below)
Note that the sample grouping table must be provided in CSV format.
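As an illustration of how the grouping table is structured, the sketch below parses the example table above (written as CSV with the same column names) and collects the libraries that share a `grouping` value, rejecting any `feature_types` value not in the allowed list. It is a hypothetical pre-flight check, not the parser used by the workflow, and it assumes that each `grouping` value corresponds to one combined `cellranger multi` analysis.

```python
import csv
import io
from collections import defaultdict

# Allowed feature_types values, as listed above.
ALLOWED_FEATURE_TYPES = {
    "Gene Expression", "VDJ", "VDJ-T", "VDJ-B",
    "Antibody Capture", "CRISPR Guide Capture", "Multiplexing Capture",
}

# The example sample grouping table above, written as CSV.
EXAMPLE_CSV = """sample,grouping,feature_types
sc5p_v2_hs_B_1k_5gex,sc5p_v2_hs_B_1k,Gene Expression
sc5p_v2_hs_B_1k_b,sc5p_v2_hs_B_1k,VDJ
"""


def libraries_by_group(csv_text):
    """Map each grouping to its (library, feature_types) pairs, validating types."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ftype = row["feature_types"].strip()
        if ftype not in ALLOWED_FEATURE_TYPES:
            raise ValueError(f"unsupported feature_types {ftype!r} for {row['sample']!r}")
        groups[row["grouping"].strip()].append((row["sample"].strip(), ftype))
    return dict(groups)
```

For the example table this yields a single group, `sc5p_v2_hs_B_1k`, containing the Gene Expression and VDJ libraries.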
When analyzing a sample using antibody capture or CRISPR guide capture, a feature reference CSV must be provided using the format described here.
Note that either antibody capture or CRISPR guide capture may be analyzed, but not both at the same time.
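The two feature-reference rules above can be expressed as a small check over the `feature_types` values in the grouping table. The function below is a hypothetical sketch of such a check (its name and arguments are illustrative, not part of the workflow); it raises an error when both capture types are present, or when either is present without a feature reference CSV.

```python
def check_feature_reference(feature_types, feature_csv=None):
    """Enforce the feature reference rules for a sample grouping table.

    feature_types: iterable of feature_types values from the grouping table.
    feature_csv: path to the Feature Reference CSV, or None if not provided.
    """
    needs_ref = set(feature_types) & {"Antibody Capture", "CRISPR Guide Capture"}
    if len(needs_ref) == 2:
        # Only one of the two capture types may be analyzed at a time.
        raise ValueError("Antibody Capture and CRISPR Guide Capture cannot be analyzed together")
    if needs_ref and feature_csv is None:
        # Either capture type requires a feature reference CSV.
        raise ValueError(f"{next(iter(needs_ref))} requires a feature reference CSV")
```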
Optionally, if cell multiplexing oligos (CMOs) were used to multiplex samples in a single GEM well, the `sample` column is omitted from the sample grouping table and a second table is provided, mapping each sample to the CMO IDs used in this library.
If multiple CMOs were used for a sample, separate the IDs with a pipe (e.g., `CMO301|CMO302`).
An example CMO mapping table would look like:
sample_id | cmo_ids |
---|---|
Jurkat | CMO301 |
Raji | CMO302 |
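To make the pipe convention concrete, the sketch below reads a CMO mapping table in CSV form (column names taken from the example above) and splits each `cmo_ids` entry on `|`. It is an illustrative parser under those assumptions, not the code used by the workflow.

```python
import csv
import io

# The example CMO mapping table above, written as CSV.
EXAMPLE_CMO_CSV = """sample_id,cmo_ids
Jurkat,CMO301
Raji,CMO302
"""


def parse_cmo_table(csv_text):
    """Map each sample_id to its list of CMO IDs (pipe-separated in the file)."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cmos = [c.strip() for c in row["cmo_ids"].split("|") if c.strip()]
        if not cmos:
            raise ValueError(f"no CMO IDs given for sample {row['sample_id']!r}")
        mapping[row["sample_id"].strip()] = cmos
    return mapping
```

A sample tagged with two CMOs, such as `Pool,CMO301|CMO302`, maps to `["CMO301", "CMO302"]`.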
When using multiplexing capture, the sample grouping table must contain a library with `feature_types` annotated as `Multiplexing Capture`, e.g.:
library | feature_types |
---|---|
sc5p_v2_hs_B_1k_5gex | Gene Expression |
sc5p_v2_hs_B_1k_mux | Multiplexing Capture |
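The requirement above can be checked before launching a run. The sketch below assumes the grouping CSV uses the `library` and `feature_types` columns shown in the example table, and returns the libraries annotated as `Multiplexing Capture`, raising an error if there are none. It is a hypothetical helper, not part of the workflow.

```python
import csv
import io

# The multiplexing example table above, written as CSV.
EXAMPLE_MUX_CSV = """library,feature_types
sc5p_v2_hs_B_1k_5gex,Gene Expression
sc5p_v2_hs_B_1k_mux,Multiplexing Capture
"""


def multiplexing_libraries(csv_text):
    """Return libraries annotated as Multiplexing Capture; error if there are none."""
    rows = csv.DictReader(io.StringIO(csv_text))
    mux = [row["library"].strip() for row in rows
           if row["feature_types"].strip() == "Multiplexing Capture"]
    if not mux:
        raise ValueError("multiplexing requires a library annotated as 'Multiplexing Capture'")
    return mux
```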
output
: Path for output files
grouping
: Path to sample grouping CSV
fastq_dir
: Directory containing all FASTQ files
transcriptome_dir
: Directory containing transcriptome reference files
vdj_dir
: Directory containing VDJ reference files
multiplexing
: Path to multiplexing capture table (optional)
feature_csv
: Feature Reference CSV used for either Antibody Capture or CRISPR Guide Capture (optional)
Note that both `transcriptome_dir` and `vdj_dir` must always be specified, although their contents will only be accessed if the corresponding sample type is provided.
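Nextflow passes workflow parameters on the command line as `--<name> <value>`. The sketch below assembles a `nextflow run` argument list from the parameters listed above, refusing to build a command when a required one (including `transcriptome_dir` and `vdj_dir`) is missing. The script name `multi.nf` and all paths in the usage example are hypothetical placeholders; substitute the actual entry point of the `multi` workflow in this repository.

```python
import shlex

REQUIRED = ("output", "grouping", "fastq_dir", "transcriptome_dir", "vdj_dir")
OPTIONAL = ("multiplexing", "feature_csv")


def multi_command(params, script="multi.nf"):
    """Build a `nextflow run` argument list from a dict of workflow parameters.

    `script` is a hypothetical placeholder for the multi workflow's entry point.
    """
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    args = ["nextflow", "run", script]
    for key in REQUIRED + OPTIONAL:
        if params.get(key):  # optional parameters are only passed when set
            args += [f"--{key}", str(params[key])]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(multi_command({
        "output": "results/",
        "grouping": "grouping.csv",
        "fastq_dir": "fastqs/",
        "transcriptome_dir": "refdata-gex-GRCh38-2020-A/",
        "vdj_dir": "refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0/",
    })))
```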
The number of CPUs and amount of memory available to each task can be customized with the parameters `-process.cpus` (default: 16) and `-process.memory` (default: `64.GB`).
The workflows in this repository may be tested by downloading example datasets hosted by 10x Genomics and running the appropriate analyses locally.
To download the example datasets and all necessary reference data, navigate to `test/` and run `download_inputs.sh`.
Before running the tests, make sure that Nextflow is installed on your host system.
NOTE: By default, the CellRanger utility is sourced from an EasyBuild module which is assumed to be available on the host system (using `beforeScript = "ml CellRanger/6.1.1"`).
If CellRanger is available from another source, it can be loaded for the testing suite by adding an appropriate configuration file (`nextflow.config`) to the working directory used for testing.
To run tests, navigate to `test/` and run `bash run.sh`.