sciCTextract

Simple sciCUT&Tag demultiplexing.

Installation

Installation within a conda environment or virtualenv is recommended. In an active environment with Python 3 installed, clone the git repository and, from the repository root, install with pip:

pip install .

This will install dependencies such as Biopython and put the sciCTextract command on your path.

Input Files

Demultiplexing requires four input Fastq files following Illumina naming conventions, including read headers of the form:

@VH00319:342:AACKYJMM5:1:1101:31410:1000 1:N:0:0

On multi-lane flowcells, reads should not be split by lane unless the lanes need to be processed independently (for example, to support XP workflows). A demultiplexing run requires exactly four Fastq files: paired sequence reads (_R1 and _R2) and paired index reads (_I1 and _I2).
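As a pre-flight check, the four-file requirement can be verified before launching a run. The sketch below is not part of sciCTextract; the helper name classify_fastqs and the filename pattern are assumptions based on the Illumina naming shown in this README.

```python
import re
from pathlib import Path

# Read-type tags taken from the naming convention described above.
REQUIRED_TAGS = ("R1", "R2", "I1", "I2")

# Assumed pattern: <prefix>_<tag>[_<chunk>].fastq[.gz]
TAG_PATTERN = re.compile(r"_(R1|R2|I1|I2)(?:_\d+)?\.fastq(?:\.gz)?$")


def classify_fastqs(paths):
    """Map each read-type tag to its Fastq path.

    Raises ValueError if a name is unrecognized, a tag is duplicated,
    or any of the four read types is missing.
    """
    found = {}
    for path in paths:
        name = Path(path).name
        match = TAG_PATTERN.search(name)
        if match is None:
            raise ValueError(f"unrecognized Fastq name: {name}")
        tag = match.group(1)
        if tag in found:
            raise ValueError(f"duplicate {tag} file: {name}")
        found[tag] = path
    missing = [tag for tag in REQUIRED_TAGS if tag not in found]
    if missing:
        raise ValueError(f"missing read types: {', '.join(missing)}")
    return found
```

A duplicate tag usually means the reads were split by lane, which is the case the paragraph above advises against unless lanes are processed independently.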

The process also requires two barcode tables in comma-separated value (CSV) format: one for the Tn5 barcodes, which define the prefixes of the sample names, and one for the Primer barcodes, which define the suffixes.

The Tn5 barcode table should have this form at minimum:

Sample Name	Tn5_s7	Tn5_s7_seq	Tn5_s5	Tn5_s5_seq
Hs_H3K27ac	P7_i7_1	ATTACTCG	P5_i5_1	TATAGCCT
Hs_H3K27ac	P7_i7_1	ATTACTCG	P5_i5_2	ATAGAGGC

The Primer barcode table should have this form at minimum:

i7_index_seq	i5_index_seq	i7_index_id	i5_index_id	ID
GGACTCCT	TAGATCGC	P7_5	P5_1	10pM
GGACTCCT	CTCTCTAT	P7_5	P5_2	10pM

Note that barcodes should always be specified in forward-strand ("Workflow A") orientation. This allows the same barcode tables to be used with different types of Illumina instruments. All barcodes are currently required to be 8 nt long.
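The 8 nt requirement is easy to check before a run. The following is a minimal sketch, not the tool's own validation: the function name check_barcode_table is hypothetical, the column names are taken from the example tables above, and the input is assumed to be comma-delimited with a header row.

```python
import csv
import io

VALID_BASES = set("ACGT")
BARCODE_LENGTH = 8  # the README states all barcodes must currently be 8 nt


def check_barcode_table(handle, seq_columns):
    """Parse a CSV barcode table and return its rows.

    Raises ValueError if any listed sequence column holds a value
    that is not exactly 8 characters of A, C, G, or T.
    """
    rows = list(csv.DictReader(handle))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column in seq_columns:
            seq = (row.get(column) or "").strip().upper()
            if len(seq) != BARCODE_LENGTH or not set(seq) <= VALID_BASES:
                raise ValueError(
                    f"line {line_no}, column {column}: bad barcode {seq!r}"
                )
    return rows


# Example using the Tn5 rows shown above, written out as CSV text.
tn5_csv = io.StringIO(
    "Sample Name,Tn5_s7,Tn5_s7_seq,Tn5_s5,Tn5_s5_seq\n"
    "Hs_H3K27ac,P7_i7_1,ATTACTCG,P5_i5_1,TATAGCCT\n"
    "Hs_H3K27ac,P7_i7_1,ATTACTCG,P5_i5_2,ATAGAGGC\n"
)
tn5_rows = check_barcode_table(tn5_csv, ["Tn5_s7_seq", "Tn5_s5_seq"])
print(len(tn5_rows), "Tn5 rows passed")
```

For the Primer table, the same function would be called with the i7_index_seq and i5_index_seq columns.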

Running

With four Fastq files in hand and two barcode tables defined, create an output directory (e.g., mkdir fastq_out) and launch demultiplexing, for example:

sciCTextract \
    --outdir fastq_out \
    --Tn5_Barcode Tn5_Barcode_Annotation.csv \
    --Primer_Barcode Primer_Barcode_Annotation.csv \
    Undetermined_S0_R1_001.fastq.gz \
    Undetermined_S0_R2_001.fastq.gz \
    Undetermined_S0_I1_001.fastq.gz \
    Undetermined_S0_I2_001.fastq.gz

Our current typical use is on the Illumina NextSeq 2000. Default settings should work for the NextSeq 1000/2000 and the NovaSeq 6000 (v1.5 or more recent). For instruments that use forward-strand workflows (MiSeq, HiSeq, MiniSeq Rapid, etc.), the --forward-mode option overrides the default reverse-complementing of the i5 barcode reads.
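To illustrate what reverse-complementing the i5 read means in practice, here is a small sketch. It is not the tool's implementation; the function names and the forward_mode parameter are hypothetical stand-ins that mirror the behavior described above.

```python
# Translation table for the four bases plus N (ambiguous base calls).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orient_i5(i5_read, forward_mode=False):
    """Bring an i5 index read into the forward-strand orientation of the tables.

    By default the read is reverse-complemented; with forward_mode=True
    (mirroring the --forward-mode option) it is returned unchanged.
    """
    return i5_read if forward_mode else reverse_complement(i5_read)


print(reverse_complement("TATAGCCT"))  # prints AGGCTATA
```

Because the barcode tables are always written in forward-strand orientation, only the handling of the read changes between instrument types, not the tables themselves.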

Output

Output consists of one pair of gzip-compressed Fastq files per sample. The read headers are rewritten to include the error-corrected barcode sequences and to be compact while retaining enough information to unambiguously identify each source read. For example:

@HMH53BCX3:1:1105:11433:2512_GCGTTAAA_GTGTATCG_AGCGATAG_CAGGACGT 1:N:0:0
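Downstream scripts may want to recover the barcodes from a rewritten header. The sketch below splits a header of the form shown above into its parts. The function name parse_output_header is hypothetical, and the sketch assumes only what the example shows: a colon-delimited read location followed by four underscore-separated barcode fields. It does not assume which field corresponds to which barcode.

```python
def parse_output_header(header):
    """Split a rewritten read header into location, barcodes, and comment."""
    if not header.startswith("@"):
        raise ValueError("Fastq header must start with '@'")
    # The part before the first space is the read name; the rest is the comment.
    name, _, comment = header[1:].partition(" ")
    location, *barcodes = name.split("_")
    if len(barcodes) != 4:
        raise ValueError(f"expected 4 barcode fields, found {len(barcodes)}")
    return {
        "location": location,
        "barcodes": tuple(barcodes),
        "comment": comment,
    }


example = "@HMH53BCX3:1:1105:11433:2512_GCGTTAAA_GTGTATCG_AGCGATAG_CAGGACGT 1:N:0:0"
parsed = parse_output_header(example)
print(parsed["barcodes"])
```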
