nf-cellranger-tools

A collection of Nextflow tools for running CellRanger.

Tools

mkfastq

Docs: Generating FASTQs with cellranger mkfastq

Parameters

  • bcl_run_folder: Folder containing the Illumina sequencer's base call files
  • samplesheet: Sample sheet defining the data structure; either an Illumina Experiment Manager sample sheet or a simple three-column CSV
  • output: Path for output files
  • input_type: Format of the samplesheet input, either "samplesheet" or "csv" (default: "samplesheet")
  • filter_dual_index: Optional. Only demultiplex samples identified by i7/i5 dual indices (e.g., SI-TT-A6); single-index samples will not be demultiplexed
  • filter_single_index: Optional. Only demultiplex samples identified by an i7-only sample index; dual-indexed samples will not be demultiplexed
  • lanes: Optional. Comma-delimited series of lanes to demultiplex (e.g., 1,3). Use this if you have a sample sheet for an entire flow cell but only want to generate FASTQs for a few lanes for further 10x Genomics analysis
  • use_bases_mask: Same meaning as for bcl2fastq. Use to clip extra bases off a read if you ran extra cycles for QC
  • delete_undetermined: Delete the Undetermined FASTQs generated by bcl2fastq. Useful if you are demultiplexing a small number of samples from a large flow cell
  • barcode_mismatches: Same meaning as for bcl2fastq. Use this option to change the number of allowed mismatches per index adapter (0, 1, 2). Default: 1
  • project: Custom project name, to override the sample sheet or to use in conjunction with the --csv argument
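When input_type is "csv", the samplesheet is the simple three-column CSV accepted by cellranger mkfastq, with the columns Lane, Sample and Index. A minimal Python sketch that renders such a file is below; the sample names and the second index ID are placeholders, not values used by this repository.

```python
import csv
import io


def simple_samplesheet(rows):
    """Render (lane, sample, index) tuples as a three-column mkfastq CSV.

    The header follows the 10x Genomics simple CSV layout (Lane,Sample,Index);
    the row values supplied by the caller are placeholders.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Lane", "Sample", "Index"])
    for lane, sample, index in rows:
        writer.writerow([lane, sample, index])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical samples, one dual-index (SI-TT-*) sample on each of two lanes.
    sheet = simple_samplesheet([
        (1, "sample_a", "SI-TT-A6"),
        (3, "sample_b", "SI-TT-B1"),
    ])
    print(sheet, end="")
```

Saved to a file, this output would be passed as samplesheet with input_type set to "csv".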

count

Docs: Single-Library Analysis with cellranger count

Parameters

  • output: Path for output files
  • sample_whitelist: Optional text file used to specify the subset of samples to process, one per line (no header)
  • fastq_dir: Directory containing all FASTQ files
  • transcriptome_dir: Directory containing transcriptome reference files (see below)
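The sample_whitelist file is plain text with one sample name per line and no header. As an illustration only, the sketch below reads such a list and keeps the FASTQ files whose names follow the bcl2fastq convention {sample}_S{n}_L{lane}_R{read}_001.fastq.gz. How the workflow itself matches whitelisted names to the files in fastq_dir is an assumption here, and the file names are hypothetical.

```python
import re

# bcl2fastq-style FASTQ name, e.g. sample_a_S1_L001_R1_001.fastq.gz.
# The lane field is optional so files without lane splitting also match.
FASTQ_RE = re.compile(r"^(?P<sample>.+)_S\d+(?:_L\d{3})?_[RI]\d_001\.fastq\.gz$")


def parse_whitelist(text):
    """Return the set of sample names listed one per line (blank lines ignored)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def select_fastqs(filenames, whitelist=None):
    """Group FASTQ names by sample; keep every sample if no whitelist is given."""
    selected = {}
    for name in sorted(filenames):
        match = FASTQ_RE.match(name)
        if not match:
            continue  # not a recognizable FASTQ file name
        sample = match.group("sample")
        if whitelist is None or sample in whitelist:
            selected.setdefault(sample, []).append(name)
    return selected


if __name__ == "__main__":
    whitelist = parse_whitelist("sample_a\nsample_c\n")
    files = [
        "sample_a_S1_L001_R1_001.fastq.gz",
        "sample_a_S1_L001_R2_001.fastq.gz",
        "sample_b_S2_L001_R1_001.fastq.gz",
        "Undetermined_S0_L001_R1_001.fastq.gz",
    ]
    print(select_fastqs(files, whitelist))
```

Treating a missing whitelist as "process everything" mirrors the parameter being optional.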

References

The default transcriptome reference in the workflow is:

  • /shared/biodata/reference/10x/refdata-gex-GRCh38-2020-A

VDJ

Docs: Analysis of V(D)J data

Parameters

  • output: Path for output files
  • sample_whitelist: Optional text file used to specify the subset of samples to process, one per line (no header)
  • fastq_dir: Directory containing all FASTQ files
  • vdj_dir: Directory containing VDJ reference files (see below)

References

The default VDJ reference in the workflow is:

  • /shared/biodata/reference/10x/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0

multi

Samples prepared with multiple complementary methodologies are analyzed with the flexible cellranger multi pipeline.

Sample Grouping

To accommodate a wide variety of experimental designs, the multi workflow in this repository uses a simple input format: a single table that lists each library alongside the methodology used to prepare it.

For example, a single sample (sc5p_v2_hs_B_1k) may have been prepared with two parallel methods: 5' gene expression (sc5p_v2_hs_B_1k_5gex) and V(D)J (sc5p_v2_hs_B_1k_b). The sample grouping table describing this experimental design would be:

sample,grouping,feature_types
sc5p_v2_hs_B_1k_5gex,sc5p_v2_hs_B_1k,Gene Expression
sc5p_v2_hs_B_1k_b,sc5p_v2_hs_B_1k,VDJ

Allowed values for feature_types are (ref):

  • Gene Expression
  • VDJ
  • VDJ-T
  • VDJ-B
  • Antibody Capture (see below)
  • CRISPR Guide Capture (see below)
  • Multiplexing Capture (see below)

Note that the sample grouping table must be provided in CSV format.
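Because each row names a library and the sample it belongs to, the grouping table can be read as a mapping from each grouping value to its libraries and their feature types. The sketch below applies Python's csv module to the example table for sc5p_v2_hs_B_1k; it illustrates the table's structure and is not the workflow's own parser.

```python
import csv
import io
from collections import defaultdict

# The example grouping table for sc5p_v2_hs_B_1k, in CSV form.
GROUPING_CSV = """sample,grouping,feature_types
sc5p_v2_hs_B_1k_5gex,sc5p_v2_hs_B_1k,Gene Expression
sc5p_v2_hs_B_1k_b,sc5p_v2_hs_B_1k,VDJ
"""


def group_libraries(text):
    """Map each grouping value to a list of (library, feature_types) pairs."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["grouping"]].append((row["sample"], row["feature_types"]))
    return dict(groups)


if __name__ == "__main__":
    for group, libraries in group_libraries(GROUPING_CSV).items():
        print(group)
        for library, feature_type in libraries:
            print(f"  {library}: {feature_type}")
```

Each key of the resulting mapping corresponds to one sample analyzed jointly across its libraries.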

Antibody or CRISPR Guide Capture

When analyzing a sample using antibody capture or CRISPR guide capture, a feature reference CSV must be provided in the format described here.

Note that either antibody capture or CRISPR guide capture may be analyzed, but not both at the same time.
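The rules stated so far (feature_types drawn from the allowed list, a feature reference CSV whenever antibody or CRISPR guide capture is present, and never both of those in one run) can be checked before launching the workflow. The following is a minimal sketch of such a pre-flight check; the function name and messages are hypothetical, and the workflow may enforce these rules differently.

```python
# Allowed values for feature_types, as listed above.
ALLOWED_FEATURE_TYPES = {
    "Gene Expression",
    "VDJ",
    "VDJ-T",
    "VDJ-B",
    "Antibody Capture",
    "CRISPR Guide Capture",
    "Multiplexing Capture",
}


def check_feature_types(feature_types, feature_csv=None):
    """Return a list of problems with the feature_types column (empty if none)."""
    problems = []
    used = set(feature_types)
    unknown = sorted(used - ALLOWED_FEATURE_TYPES)
    if unknown:
        problems.append(f"unknown feature_types: {', '.join(unknown)}")
    # Antibody and CRISPR guide capture are mutually exclusive in one analysis.
    captures = used & {"Antibody Capture", "CRISPR Guide Capture"}
    if len(captures) > 1:
        problems.append("Antibody Capture and CRISPR Guide Capture cannot be combined")
    # Either capture type needs a feature reference CSV.
    if captures and feature_csv is None:
        problems.append("feature_csv is required for antibody or CRISPR guide capture")
    return problems


if __name__ == "__main__":
    print(check_feature_types(["Gene Expression", "Antibody Capture"]))
```

Returning a list of problems rather than stopping at the first lets every issue in a grouping table be reported at once.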

Multiplexing Capture

Optionally, if Cell Multiplexing Oligos (CMOs) were used to multiplex samples in a single GEM well, the sample column is omitted from the sample grouping table and a second table is provided that maps each sample to the CMO IDs used for it in this library. If multiple CMOs were used for a sample, separate the IDs with a pipe (e.g., CMO301|CMO302).

An example CMO mapping table would look like:

sample_id,cmo_ids
Jurkat,CMO301
Raji,CMO302
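Because multiple CMO IDs for one sample are joined with a pipe, the cmo_ids column can be split into a list per sample. A minimal sketch with Python's csv module, assuming the mapping table is stored as CSV with the sample_id and cmo_ids headers of the example (the Jurkat and Raji rows come from that example; the pipe-joined "Pooled" row is hypothetical):

```python
import csv
import io

# The example CMO mapping table, assumed to be stored as CSV.
CMO_CSV = """sample_id,cmo_ids
Jurkat,CMO301
Raji,CMO302
"""


def parse_cmo_table(text):
    """Map each sample_id to its list of CMO IDs (pipe-separated in the table)."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        ids = [cmo.strip() for cmo in row["cmo_ids"].split("|") if cmo.strip()]
        mapping[row["sample_id"]] = ids
    return mapping


if __name__ == "__main__":
    print(parse_cmo_table(CMO_CSV))
    # A sample tagged with two CMOs (hypothetical row).
    print(parse_cmo_table("sample_id,cmo_ids\nPooled,CMO301|CMO302\n"))
```

The pipe is used instead of a comma so that several IDs fit in one CSV field without quoting.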

When using multiplexing capture, the sample grouping table must include a library whose feature_types value is Multiplexing Capture, for example:

library,feature_types
sc5p_v2_hs_B_1k_5gex,Gene Expression
sc5p_v2_hs_B_1k_mux,Multiplexing Capture
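The requirement above amounts to a simple check on the grouping table: if a multiplexing table is supplied, at least one library must be annotated as Multiplexing Capture. The sketch below assumes the library and feature_types columns of the example; the helper names are hypothetical and not part of the workflow.

```python
import csv
import io

# The example grouping table for a multiplexed run, in CSV form.
MUX_GROUPING_CSV = """library,feature_types
sc5p_v2_hs_B_1k_5gex,Gene Expression
sc5p_v2_hs_B_1k_mux,Multiplexing Capture
"""


def has_multiplexing_library(text):
    """True if any row of the grouping table is a Multiplexing Capture library."""
    rows = csv.DictReader(io.StringIO(text))
    return any(row["feature_types"] == "Multiplexing Capture" for row in rows)


def check_multiplexing(grouping_text, multiplexing_provided):
    """Raise ValueError if a CMO table is given without a Multiplexing Capture library."""
    if multiplexing_provided and not has_multiplexing_library(grouping_text):
        raise ValueError("multiplexing table given, but no Multiplexing Capture library")


if __name__ == "__main__":
    check_multiplexing(MUX_GROUPING_CSV, multiplexing_provided=True)
    print("grouping table is consistent with the multiplexing table")
```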

Parameters

  • output: Path for output files
  • grouping: Path to sample grouping CSV
  • fastq_dir: Directory containing all FASTQ files
  • transcriptome_dir: Directory containing transcriptome reference files
  • vdj_dir: Directory containing VDJ reference files
  • multiplexing: Path to multiplexing capture table (optional)
  • feature_csv: Feature Reference CSV used for either Antibody Capture or CRISPR Guide Capture (optional)

Note that both transcriptome_dir and vdj_dir must always be specified, although their contents are only accessed if libraries of the corresponding feature type are provided.

Resource Allocation

The number of CPUs and the amount of memory available to each task can be customized with the parameters -process.cpus (default: 16) and -process.memory (default: 64.GB).

Testing

The workflows in this repository can be tested by downloading example datasets hosted by 10x Genomics and running the appropriate analyses locally.

To download the example datasets and all necessary reference data, navigate to test/ and run download_inputs.sh.

Before running the tests, make sure that Nextflow is installed on your host system.

NOTE: By default, CellRanger is loaded from an EasyBuild module that is assumed to be available on the host system (using beforeScript = "ml CellRanger/6.1.1"). If CellRanger is available from another source, it can be loaded for the test suite by adding an appropriate configuration file (nextflow.config) to the working directory used for testing.

To run tests, navigate to test/ and run bash run.sh.