Simple sciCUT&Tag demultiplexing.
Installation within a conda environment or virtualenv is recommended. In an active environment with python3 installed, clone the git repository and install with pip:
pip install .
This will install dependencies such as Biopython and put the sciCTextract command on your path.
Demultiplexing requires four input Fastq files following Illumina naming conventions, including read headers of the form:
@VH00319:342:AACKYJMM5:1:1101:31410:1000 1:N:0:0
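To check that input reads carry headers of this form before starting a run, something like the following could be used. It is a minimal sketch, not part of sciCTextract: the function name `parse_illumina_header` and the field names are illustrative assumptions based on the standard Illumina header layout (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control bits, and index).

```python
import re

# Illustrative pattern for headers such as
# "@VH00319:342:AACKYJMM5:1:1101:31410:1000 1:N:0:0".
# Field names are assumptions for this sketch, not sciCTextract identifiers.
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>\d+):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S+)$"
)


def parse_illumina_header(line):
    """Return the header fields as a dict of strings, or raise ValueError."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unexpected read header: {line!r}")
    return match.groupdict()


fields = parse_illumina_header("@VH00319:342:AACKYJMM5:1:1101:31410:1000 1:N:0:0")
print(fields["flowcell"], fields["lane"], fields["index"])
```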
On multi-lane flowcells, reads should not be split by lane unless lanes need to be processed independently (to support XP workflows, for example). A demultiplexing run requires exactly four Fastq files: paired sequence reads (_R1 & _R2) and paired index reads (_I1 & _I2).
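A quick pre-flight check that exactly one file of each kind is present might look like the sketch below. The helper `classify_fastqs` is hypothetical, and the file name pattern it accepts (an `_R1`, `_R2`, `_I1`, or `_I2` tag, an optional numeric chunk, then `.fastq` or `.fastq.gz`) is an assumption about Illumina-style names rather than a rule enforced by sciCTextract.

```python
import re

REQUIRED_TAGS = ("R1", "R2", "I1", "I2")

# Assumed naming pattern, e.g. Undetermined_S0_R1_001.fastq.gz
TAG_RE = re.compile(r"_(R1|R2|I1|I2)(?:_\d+)?\.fastq(?:\.gz)?$")


def classify_fastqs(paths):
    """Map each of R1, R2, I1, I2 to exactly one path, or raise ValueError."""
    found = {}
    for path in paths:
        match = TAG_RE.search(path)
        if match is None:
            raise ValueError(f"Cannot tell read type from name: {path}")
        tag = match.group(1)
        if tag in found:
            raise ValueError(f"More than one {tag} file: {found[tag]}, {path}")
        found[tag] = path
    missing = [tag for tag in REQUIRED_TAGS if tag not in found]
    if missing:
        raise ValueError(f"Missing Fastq files for: {', '.join(missing)}")
    return found


found = classify_fastqs([
    "Undetermined_S0_R1_001.fastq.gz",
    "Undetermined_S0_R2_001.fastq.gz",
    "Undetermined_S0_I1_001.fastq.gz",
    "Undetermined_S0_I2_001.fastq.gz",
])
print(found["I2"])
```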
The process also requires two barcode tables in comma-separated value format: one for the Tn5 barcodes, which define the prefixes of the sample names, and one for the Primer barcodes, which define the suffixes.
The Tn5 barcode table should have this form at minimum:
Sample Name | Tn5_s7 | Tn5_s7_seq | Tn5_s5 | Tn5_s5_seq |
---|---|---|---|---|
Hs_H3K27ac | P7_i7_1 | ATTACTCG | P5_i5_1 | TATAGCCT |
Hs_H3K27ac | P7_i7_1 | ATTACTCG | P5_i5_2 | ATAGAGGC |
The Primer barcode table should have this form at minimum:
i7_index_seq | i5_index_seq | i7_index_id | i5_index_id | ID |
---|---|---|---|---|
GGACTCCT | TAGATCGC | P7_5 | P5_1 | 10pM |
GGACTCCT | CTCTCTAT | P7_5 | P5_2 | 10pM |
Note that barcodes should always be specified in forward strand ("Workflow A") orientation. This allows the same barcode tables to be used with different types of Illumina instruments. All barcodes are currently required to be 8nt.
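Since every barcode must be 8 nt, it can be worth checking a table before launching a run. The following is a rough sketch using only the standard library. The function `check_barcode_table` is hypothetical, the column names are taken from the Tn5 table above, and restricting barcodes to A, C, G, and T is an assumption. The second data row of the in-memory example is deliberately truncated to show a failure.

```python
import csv
import io

BARCODE_LENGTH = 8  # all barcodes are currently required to be 8 nt


def check_barcode_table(handle, seq_columns):
    """Return (line_number, column, value) for every barcode that is not
    exactly BARCODE_LENGTH characters of A/C/G/T (alphabet is an assumption)."""
    problems = []
    reader = csv.DictReader(handle, skipinitialspace=True)
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in seq_columns:
            seq = (row.get(column) or "").strip().upper()
            if len(seq) != BARCODE_LENGTH or set(seq) - set("ACGT"):
                problems.append((line_no, column, seq))
    return problems


# In-memory stand-in for a Tn5 barcode CSV; the last barcode is
# intentionally one base short.
example = io.StringIO(
    "Sample Name,Tn5_s7,Tn5_s7_seq,Tn5_s5,Tn5_s5_seq\n"
    "Hs_H3K27ac,P7_i7_1,ATTACTCG,P5_i5_1,TATAGCCT\n"
    "Hs_H3K27ac,P7_i7_1,ATTACTCG,P5_i5_2,ATAGAGG\n"
)
problems = check_barcode_table(example, ["Tn5_s7_seq", "Tn5_s5_seq"])
print(problems)
```

For the Primer table, the same check would apply with the `i7_index_seq` and `i5_index_seq` columns.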
With four Fastq files in hand and two barcode tables defined, create an output directory (e.g., mkdir fastq_out) and launch demultiplexing, for example:
sciCTextract \
--outdir fastq_out \
--Tn5_Barcode Tn5_Barcode_Annotation.csv \
--Primer_Barcode Primer_Barcode_Annotation.csv \
Undetermined_S0_R1_001.fastq.gz \
Undetermined_S0_R2_001.fastq.gz \
Undetermined_S0_I1_001.fastq.gz \
Undetermined_S0_I2_001.fastq.gz
Note that our current typical use is to run on the Illumina NextSeq 2000. Default settings should work for the NextSeq 1000/2000 and NovaSeq 6000 (v1.5 or more recent). For instruments that use forward-strand workflows (MiSeq, HiSeq, MiniSeq Rapid, etc.) we provide the --forward-mode option to override the default reverse-complementing of the i5 barcode reads.
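To illustrate what the orientation difference means for an i5 barcode, here is a small sketch. The helper `i5_to_forward` and its `forward_mode` argument are illustrative names only and do not describe how sciCTextract is implemented. By default the observed i5 read is reverse-complemented so it can be compared with the forward-strand sequence in the barcode table; in forward mode it is used unchanged.

```python
# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def i5_to_forward(i5_read, forward_mode=False):
    """Bring an observed i5 index read into forward-strand orientation.

    forward_mode=False mimics the default described above (reverse-complement
    the read); forward_mode=True leaves the read unchanged.
    """
    return i5_read.upper() if forward_mode else reverse_complement(i5_read)


# TATAGCCT is the first Tn5 s5 barcode from the table above.
print(i5_to_forward("AGGCTATA"))                     # TATAGCCT
print(i5_to_forward("TATAGCCT", forward_mode=True))  # TATAGCCT
```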
Output consists of one pair of gzip-compressed Fastq files per sample. The read headers are rewritten to include the error-corrected barcode sequences and to be compact while retaining enough information to unambiguously identify each source read. For example:
@HMH53BCX3:1:1105:11433:2512_GCGTTAAA_GTGTATCG_AGCGATAG_CAGGACGT 1:N:0:0
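Downstream tools may need to pull the barcodes back out of these headers. The sketch below splits the example header on underscores and colons. The function name `parse_output_header` and the field names are assumptions for illustration, and the four barcodes are returned in the order they appear, without guessing which is which.

```python
def parse_output_header(line):
    """Split a rewritten read header into read coordinates and barcodes."""
    name, _, comment = line.strip().lstrip("@").partition(" ")
    # The read identifier comes first, followed by underscore-separated barcodes.
    read_id, *barcodes = name.split("_")
    flowcell, lane, tile, x, y = read_id.split(":")
    return {
        "flowcell": flowcell,
        "lane": lane,
        "tile": tile,
        "x": x,
        "y": y,
        "barcodes": barcodes,
        "comment": comment,
    }


parsed = parse_output_header(
    "@HMH53BCX3:1:1105:11433:2512_GCGTTAAA_GTGTATCG_AGCGATAG_CAGGACGT 1:N:0:0"
)
print(parsed["flowcell"], parsed["barcodes"])
```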