[Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer #5

dnk8n · 2018-12-05T09:12:37Z

Terminology:

Since I am new to bioinformatics, I am going to go over the terminology as I understand it.

example (dummy data), forward primer:

GATAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
*****------------------------------

where * is the barcode (differentiates samples)
where - is the gene-specific region (selects which region gets amplified)

Currently error is calculated over the entire primer (barcode + gene-specific region).

It would be nice to be able to have the option to enter forward/reverse primers seperated into regions, e.g. instead of:

forwardPrimers = [
    "GATAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
    "TCTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
    "CATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
    "TGCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"
]
reversePrimers = [
    "CTGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "CTGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "CTGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "CTGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
]

we could pass in:

forwardPrimers = [
    ["GATAG", "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"],
    ["TCTAC", "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"],
    ["CATCA", "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"],
    ["TGCAT", "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"]
]

reversePrimers = [
    ["CTGTC", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],
    ["CTGTC", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],
    ["CTGTC", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],
    ["CTGTC", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]
]

forwardNames = ["1", "3", "4", "5"]

reverseNames = ["A", "A", "A", "A"]

sequences = [
    "TGCATTTTTTTTTTTTTTTTGGGGGGTTTTTTTTTAAAAATTGGGGGGTTTTTTTTTTTTTTTTTTTTTTGACAG",
    "CTGTCAAAAAAAAAAAAAAAAAAAAAACCCCCCAATTTTTAAAAAAAAACCCCCCAAAAAAAAAAAAAAAATGCA"
]

and get back an output like this:

[
    ["5_A", "TGCATTTTTTTTTTTTTTTTGGGGGGTTTTTTTTTAAAAATTGGGGGGTTTTTTTTTTTTTTTTTTTTTTGACAG", false, [0, -6.0]],
    ["5_A", "TGCATTTTTTTTTTTTTTTTGGGGGGTTTTTTTTTAAAAATTGGGGGGTTTTTTTTTTTTTTTTTTTTTTGACAG", true, [0, -6.0]]
]

This way we can accept a higher error in the gene-specific region, but can still demultiplex according to the barcode error (with a lower tolerance for error).

One step further, it might be nice to just return all error and let the user make decisions based on it, e.g.:

[
    <forward barcode error>,
    <forward gene-specific region error>,
    <reverse barcode error>,
    <reverse gene-specific region error>
]

As I said, pretty new to this. Something I think would be useful. Happy to have a conversation and be put onto alternate solutions.

I am happy to dig into some Julia and work on this if it is decided that this would be a useful feature.

The text was updated successfully, but these errors were encountered:

murrellb · 2018-12-05T13:24:05Z

I agree that allowing different thresholds for barcode vs rest-of-primer is useful, because it lets you tolerate errors in the long barcode region. Note that how I currently work around this issue is I don't include all of the non-barcode part of the primer - I include just enough bases to distinguish FWD and REV, which also lowers the computational cost of the demux step, because it requires shorter alignments.

There are other benefits of matching the entire primer though. The main one is that we often want to trim off the primer itself, which is something I haven't done robustly. I'd recommend considering this together with the multiple-region primer match score, because you need info from the primer match to get the trimming right.

Please jump in. You'll likely want to build it around the IUPAC_nuc_nw_align in demux.jl. - you could do a single alignment, and then compute the scores for the different bits based on that alignment. You'll have to make an arbitrary decision about where to place an insertion that happens between the separated segments of the primer.

dnk8n changed the title ~~[Feature Request]: Seperate error by gene specific region of primer and barcode region of primer~~ [Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer Dec 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer #5

[Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer #5

dnk8n commented Dec 5, 2018

murrellb commented Dec 5, 2018

[Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer #5

[Feature Request]: NextGenSeqUtils.deMux(): Seperate error by gene specific region of primer and barcode region of primer #5

Comments

dnk8n commented Dec 5, 2018

murrellb commented Dec 5, 2018