Quantum Barcoding: Ultra-High Throughput Single Cell Analysis of Proteins and RNAs by Split-pool Synthesis
Quantum Barcoding (QBC) is a method that enables simultaneous, ultra-high throughput barcoding of millions of single cells for targeted single-cell analysis of proteins and RNAs. The method circumvents the need to isolate single cells by building cell-specific oligo barcodes dynamically within each cell. The cell-specific code is added to every tagged molecule within that cell.
This analysis workflow reads paired-end fastq files from a sequenced library built using QBC_v1.0 barcodes, generates a merged fasta file using FLASH and seqtk, processes it through QBC-parse_v1.0, and then filters and normalizes the result to produce a final output in FCS 4.0 standard Flow Cytometry format.
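To illustrate the fastq-to-fasta step that the workflow delegates to seqtk, here is a minimal Python sketch. It is not part of the pipeline. The function name `fastq_to_fasta` is hypothetical, and the sketch assumes plain four-line fastq records (header, sequence, `+`, quality), which is what merged reads typically look like.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each 4-line fastq record as a 2-line fasta record.

    Assumption: records are exactly four lines (no wrapped sequences).
    Returns the number of records converted.
    """
    converted = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a fastq header starting with '@'")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, dropped in fasta
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n")
        fasta_handle.write(sequence + "\n")
        converted += 1
    return converted


if __name__ == "__main__":
    source = io.StringIO("@read1\nACGT\n+\nIIII\n")
    target = io.StringIO()
    fastq_to_fasta(source, target)
    print(target.getvalue(), end="")
```

In the real workflow this conversion is done by seqtk on the FLASH output; the sketch only shows what information is kept (read name and sequence) and what is dropped (quality scores).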
QBC-parse_v1.0: Deduplication based on unique molecular identifiers (UMIs) is performed by QBC-parse_v1.0, which allows sequences to align with one mismatch.
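The idea behind UMI-based deduplication can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the QBC-parse_v1.0 implementation: the function name `count_unique_molecules` and the `(cell, marker, umi)` tuple layout are hypothetical, and UMIs are compared by exact match only, so the one-mismatch tolerance mentioned above is not modeled here.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Collapse reads that share cell barcode, marker and UMI.

    reads: iterable of (cell_barcode, marker, umi) tuples (assumed layout).
    Returns {cell_barcode: {marker: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for cell, marker, umi in reads:
        # PCR duplicates carry the same UMI, so a set keeps one copy.
        umis_seen[(cell, marker)].add(umi)

    table = defaultdict(dict)
    for (cell, marker), umis in umis_seen.items():
        table[cell][marker] = len(umis)
    return dict(table)


if __name__ == "__main__":
    example = [
        ("cellA", "CD3", "AAAC"),
        ("cellA", "CD3", "AAAC"),  # duplicate of the read above
        ("cellA", "CD3", "GGTT"),
        ("cellB", "CD4", "AAAC"),
    ]
    print(count_unique_molecules(example))
```

Counting distinct UMIs per cell and marker, rather than raw reads, is what keeps amplification bias out of the final marker counts.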
The QBC algorithm sequentially: a) detects barcodes via alignment, b) corrects barcodes by efficiently comparing them to a whitelist, c) deduplicates based on UMI, d) evaluates reads for chimera filtering (checking for evidence of PCR-based crossover), e) filters out reads from underrepresented or artificially created cells, and f) transforms the sequences into a table of cells and markers.
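Step b) can be pictured with the following sketch of whitelist-based barcode correction. It is an assumption-laden illustration rather than the parser's actual code: `correct_barcode` and `max_mismatches` are hypothetical names, mismatches are counted as substitutions only (Hamming distance), and the whitelist is scanned linearly, whereas the real parser is described as doing this efficiently.

```python
def correct_barcode(observed, whitelist, max_mismatches=1):
    """Map an observed barcode to a whitelist entry, or return None.

    An exact match wins. Otherwise the barcode is corrected only when
    exactly one whitelist entry lies within max_mismatches substitutions,
    so ambiguous barcodes are rejected rather than guessed.
    """
    if observed in whitelist:
        return observed
    candidates = [
        known
        for known in whitelist
        if len(known) == len(observed)
        and sum(a != b for a, b in zip(known, observed)) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    known_barcodes = {"ACGT", "TTTT"}
    for barcode in ("ACGT", "ACGA", "AAAA"):
        print(barcode, "->", correct_barcode(barcode, known_barcodes))
```

Rejecting ambiguous matches is a design choice of this sketch: it trades a few lost reads for fewer reads assigned to the wrong cell.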
The data are further normalized as follows: the expression value E(i,j) for marker i in cell j is calculated by dividing the unique read count for marker i in cell j by the sum of all marker counts in cell j, which normalizes for differences in coverage. The output is a corrected expression matrix, which is then used as input for fcs conversion.
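The normalization described above amounts to dividing each marker count by the cell's total. A minimal sketch follows; the function name `normalize_counts` and the nested-dictionary layout are assumptions made for illustration, and the handling of cells with zero total counts (returning 0.0) is a choice of this sketch, not something the pipeline documents.

```python
def normalize_counts(counts):
    """Normalize unique read counts per cell.

    counts: {cell: {marker: unique_read_count}} (assumed layout).
    Returns {cell: {marker: count / sum_of_counts_in_that_cell}}.
    """
    normalized = {}
    for cell, markers in counts.items():
        total = sum(markers.values())
        if total == 0:
            # Assumption: a cell without any counts gets zeros.
            normalized[cell] = {marker: 0.0 for marker in markers}
        else:
            normalized[cell] = {
                marker: count / total for marker, count in markers.items()
            }
    return normalized


if __name__ == "__main__":
    raw = {"cell1": {"CD3": 3, "CD4": 1}, "cell2": {"CD3": 10, "CD4": 10}}
    print(normalize_counts(raw))
```

After this step every cell's values sum to 1, so cells sequenced at different depths become directly comparable.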
For more information contact us at [email protected]
This is a driver bash script to process the fasta files generated by the QBC assay. The output of the script is an fcs file, which can be uploaded to standard flow-cytometry analysis software for further analysis.
Linux.
- Java: Install java version "1.8.0_92"
- Miniconda: Install Miniconda3
- Python3: Install Python 3.6.0
git clone https://github.com/bioinform/QBC_Single_Cell_Analysis_NGS.git
cd QBC_Single_Cell_Analysis_NGS
cd apps/parser/
bash linux_build.sh #This command builds the parser.
cd ../../
Now, using your favorite editor, update the custom_job.txt file with the locations of the read1 and read2 fastq files, the qdata folder, and the output folder.
Parameter | Explanation |
---|---|
read1 | Folder location for read1 fastq file. |
read2 | Folder location for read2 fastq file. |
qdata_folder | Folder location for qdata folder containing oligos.txt, AHCA_Codes.txt, SC_Codes.txt, Singlet_settings.txt files. Please refer to the example/qdata folder in this repo for the format of these files. |
output_folder | Folder location for storing output files. |
To generate the processed output from the raw fastq files, type the following command at your terminal and press enter.
nohup bash submit_bash.sh custom_job.txt &
The bash script generates an output folder with the following top-level folder structure:
out_folder/
└── Exp180G_S1_L001
    ├── flash_output
    │   ├── Exp180G_S1_L001.extendedFrags.fastq
    │   ├── Exp180G_S1_L001.hist
    │   ├── Exp180G_S1_L001.histogram
    │   ├── Exp180G_S1_L001.notCombined_1.fastq
    │   └── Exp180G_S1_L001.notCombined_2.fastq
    ├── seqtk_output
    │   └── Exp180G_S1_L001.extendedFrags.fasta
    ├── parser_output
    │   ├── Exp180G_S1_L001_LStr_RC(-).bad
    │   ├── Exp180G_S1_L001_LStr_RC(-).byFCS
    │   ├── Exp180G_S1_L001_LStr_RC(-).crossoverFilter
    │   ├── Exp180G_S1_L001_LStr_RC(-).junk
    │   ├── Exp180G_S1_L001_LStr_RC(-).metrics
    │   └── Exp180G_S1_L001_LStr.Statistics
    ├── rmsinglets_output
    │   ├── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.JITTERED_forFCS
    │   ├── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.tab_delimited
    │   ├── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.UNJITTERED_forFCS
    │   └── Summary.txt
    ├── fcs_output
    │   ├── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_JITTERED.fcs
    │   └── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_UNJITTERED.fcs
    └── norm_fcs_output
        └── Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_normalized_filtered_Chi2Pval_1.0_jittered0.5.fcs
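Because the job runs in the background under nohup, it can be handy to confirm afterwards that every stage produced its folder. The following sketch checks a sample directory for the six subfolders shown in the tree above. The helper name `missing_outputs` is hypothetical and the script is not part of this repo; it only checks that the folders exist, not that their contents are valid.

```python
from pathlib import Path

# Subfolder names taken from the output tree shown above.
EXPECTED_SUBFOLDERS = (
    "flash_output",
    "seqtk_output",
    "parser_output",
    "rmsinglets_output",
    "fcs_output",
    "norm_fcs_output",
)


def missing_outputs(sample_dir):
    """Return the expected subfolders that are absent from sample_dir."""
    sample_dir = Path(sample_dir)
    return [
        name for name in EXPECTED_SUBFOLDERS
        if not (sample_dir / name).is_dir()
    ]


if __name__ == "__main__":
    # Example path built from the tree above; adjust to your output_folder.
    missing = missing_outputs("out_folder/Exp180G_S1_L001")
    if missing:
        print("Missing stage outputs:", ", ".join(missing))
    else:
        print("All stage output folders are present.")
```

A missing folder points to the stage where the run stopped, which narrows down where to look in nohup.out.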
- Exp180G_S1_L001.extendedFrags.fastq: File with merged reads in fastq format.
- Exp180G_S1_L001.hist: Numeric histogram of merged read lengths.
- Exp180G_S1_L001.histogram: Visual histogram of merged read lengths.
- Exp180G_S1_L001.notCombined_1.fastq: Read 1 of mate pairs that were not merged.
- Exp180G_S1_L001.notCombined_2.fastq: Read 2 of mate pairs that were not merged.
- Exp180G_S1_L001.extendedFrags.fasta: File with merged reads in fasta format.
- Exp180G_S1_L001_LStr_RC(-).bad: File containing reads where the parser encountered an error in at least one sequence element (cell barcodes, anchors, or the AHCA sequence).
- Exp180G_S1_L001_LStr_RC(-).byFCS: A tab-delimited file with marker counts for each of the cells.
- Exp180G_S1_L001_LStr_RC(-).crossoverFilter: File containing reads discarded by the PCR-crossover filter.
- Exp180G_S1_L001_LStr_RC(-).junk: File containing reads where the parser failed to detect all 7 barcode anchors.
- Exp180G_S1_L001_LStr_RC(-).metrics: Metrics file containing the read counts filtered by each step in the parsing algorithm.
- Exp180G_S1_L001_LStr.Statistics: A summary file containing statistics for cell barcodes.
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.tab_delimited: A tab-delimited file containing cells and their corresponding marker counts.
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.UNJITTERED_forFCS: A tab-delimited file containing marker counts only (the cell-identifier column is discarded).
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1.JITTERED_forFCS: A tab-delimited file containing jittered marker counts (the cell-identifier column is discarded).
- Summary.txt: A summary file generated by the remove-singlets script.
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_UNJITTERED.fcs: Un-normalized fcs file containing marker counts for each of the cells.
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_JITTERED.fcs: Un-normalized fcs file containing jittered marker counts for each of the cells.
- Exp180G_S1_L001_LStr_RC(-)_10_5_5_1_forFCS_normalized_filtered_Chi2Pval_1.0_jittered0.5.fcs: Normalized and jittered fcs file.
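
This document does not specify how the jittered outputs listed above are produced. As a rough illustration only, the sketch below assumes that jitter means adding small uniform random noise to each count (a common way to spread discrete values apart in flow-cytometry-style plots) and that the 0.5 in the file name is the noise amplitude. Both are assumptions, and the function name `jitter_counts` and its parameters are hypothetical rather than taken from the QBC code.

```python
import random


def jitter_counts(counts, amplitude=0.5, seed=None):
    """Add uniform noise in [-amplitude, amplitude] to each count.

    Assumption: this mimics, but is not known to match, the jitter
    applied by the pipeline. A seed makes the result reproducible.
    """
    rng = random.Random(seed)
    return [count + rng.uniform(-amplitude, amplitude) for count in counts]


if __name__ == "__main__":
    marker_counts = [0, 3, 3, 10]
    print(jitter_counts(marker_counts, seed=42))
```

Jitter of this kind only affects how points are displayed; for quantitative comparisons, the unjittered files are the safer input.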