Add support for targeted sequencing #28

iqbal-lab · 2018-10-17T07:44:54Z

This has been sitting around as a possibility for some time; we should just do it.
I propose:

- Chuck out species id
- Update genome size to be lengths of genes in our targets (allow user specification)
- Otherwise should run as normal

mbhall88 · 2022-08-26T06:20:18Z

Do we actually use the genome size? I can't seem to find any reference to it in the code.

iqbal-lab · 2022-08-26T07:51:44Z

Only to estimate depth somewhere

iqbal-lab · 2022-08-26T07:53:44Z

This issue should be closed; Mykrobe already supports amplicon sequencing. There's a force option to skip species id, and it works on amplicons afaik

iqbal-lab · 2022-08-26T07:59:35Z

I'll update properly on Tue

mbhall88 · 2022-08-27T23:15:34Z

Okay, awesome.

@martinghunt do you know where in the code genome size impacts the depth estimation? I can't seem to see it anywhere...

mbhall88 · 2022-09-27T04:23:48Z

Okay, have realised targeted/amplicon sequencing isn't really supported.

I have been running mykrobe on some amplicon data where expected median depth should be in the thousands, but am getting estimated median depth of significantly less than that (single digits or low hundreds).

I'm trying to parse the current method for estimating depth, but it is pretty convoluted...

I'll keep digging away to see if I can find the best place to handle this.

My thoughts are to have an option like --amplicon which optionally takes a fasta reference. If no fasta reference is passed, we use the size of the genes in mykrobe's panel, otherwise we use the sum of sequence lengths in the provided fasta.
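To make the proposal concrete, here is a minimal sketch of the size calculation only. It is not mykrobe code: `total_fasta_length`, `effective_target_size`, and the `panel_gene_lengths` argument are hypothetical names, and `--amplicon` is the proposed option, not an existing one.

```python
from typing import Iterable, Optional


def total_fasta_length(path: str) -> int:
    """Sum the lengths of all sequences in a plain-text FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and header lines; everything else is sequence.
            if not line or line.startswith(">"):
                continue
            total += len(line)
    return total


def effective_target_size(
    amplicon_fasta: Optional[str], panel_gene_lengths: Iterable[int]
) -> int:
    """Pick the size to use in place of genome size (hypothetical helper).

    If a FASTA was given to the proposed --amplicon option, use the sum of
    its sequence lengths; otherwise fall back to the panel gene lengths.
    """
    if amplicon_fasta is not None:
        return total_fasta_length(amplicon_fasta)
    return sum(panel_gene_lengths)
```

How the panel gene lengths would be obtained is left open here; the sketch just takes them as a list of integers.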

Thoughts?

iqbal-lab · 2022-09-27T20:59:54Z

Sorry @mbhall88 was busy today, will think and reply tomorrow

mbhall88 · 2022-10-03T06:15:52Z

I've been playing with this locally. It's tricky. It's also extra tricky because the amplicon data I'm working with isn't amplifying entire genes. So lots of the variants have depth of like 0 or 1 and the rest have like 100,000. This ends up skewing the median depth calculation...

Any other thoughts on how to better estimate the depth? I mean, a plot of kmer_counts and cutting out the counts of the first peak and then taking the median of the rest sounds reasonable, but not sure how to do this.
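One rough way to sketch the "cut out the first peak" idea, assuming the k-mer counts are available as a plain list of integers: bin the counts on a log scale, walk over the first peak to the valley that follows it, and take the median of everything above that valley. The function name and the `bins_per_decade` parameter are made up for illustration, and the valley search is naive (a noisy histogram could stop it early), so treat this as a starting point rather than a tested method.

```python
import math
from collections import Counter
from statistics import median
from typing import Sequence


def median_depth_excluding_low_peak(
    counts: Sequence[int], bins_per_decade: int = 4
) -> float:
    """Median of counts after dropping the first (low-coverage) peak."""
    if not counts:
        raise ValueError("counts must not be empty")

    def bin_of(count: int) -> int:
        # Log-scale bins so that 0/1 and ~100,000 fall far apart.
        return int(math.log10(count + 1) * bins_per_decade)

    binned = Counter(bin_of(c) for c in counts)
    hist = [binned.get(i, 0) for i in range(max(binned) + 1)]

    i = 0
    # Climb to the top of the first peak.
    while i + 1 < len(hist) and hist[i + 1] >= hist[i]:
        i += 1
    # Descend to the last bin of the valley after that peak.
    while i + 1 < len(hist) and hist[i + 1] <= hist[i]:
        i += 1

    # No second peak found: nothing to cut, use all counts.
    if i == len(hist) - 1:
        return median(counts)

    # Keep only counts that fall beyond the valley.
    return median(c for c in counts if bin_of(c) > i)
```

For example, with counts `[0, 1, 0, 1, 1, 100000, 90000, 110000, 95000]` the low values form the first peak and are dropped, and the median is taken over the four high values.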

iqbal-lab added the enhancement label Oct 17, 2018

mbhall88 changed the title ~~Add support for targetted sequencing~~ Add support for targeted sequencing Oct 3, 2022

mbhall88 linked a pull request Oct 7, 2022 that will close this issue

Add support for targeted sequencing #163

Draft
