Using MotifSeq coordinates within single reads to segment fast5 file at those positions. #44

Open
gallardo-seq opened this issue Jan 18, 2021 · 5 comments
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments

@gallardo-seq

gallardo-seq commented Jan 18, 2021

Thanks for developing this great tool. This is an enhancement/question type issue. We have some CCS-type reads that contain a repetitive unit that we can search for using MotifSeq. My question is whether SquiggleKit can output the positions of the repetitive unit within single reads, and then use these coordinates to segment each fast5 file (kind of like Porechop, but at the signal level). Our ultimate goal is to do pair consensus decoding with Bonito specifically, or to facilitate multidimensional basecalling in general. Does SquiggleKit already have similar functionality that I'm perhaps missing in the documentation?

@Psy-Fer
Owner

Psy-Fer commented Jan 18, 2021

Hey,

So you want the base positions of what you have found in the signal? Or the signal positions of what you found in the basecall?

If either of those is what you want, you are in luck as we have an upgrade being worked on at the moment for doing this with a new library I built along with Hasindu in our lab.

Happy to make this one of the use case examples.

I'll talk to the people involved this week and see if I can get it moved forward.

Currently MotifSeq will give you the signal positions where it finds something. Then you would have to cut the signal at those sites from the array yourself. Probably not as useful as the new method we have.
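A minimal sketch of that manual cutting step, assuming the raw signal has already been loaded into an array and the hit positions have been parsed into (start, end) sample indices. The function name `segment_signal` and the example values are hypothetical and are not part of SquiggleKit:

```python
def segment_signal(signal, hits):
    """Cut a raw signal at (start, end) sample positions.

    `signal` is any sliceable sequence (a list or a NumPy array).
    `hits` is an iterable of (start, end) pairs, end-exclusive.
    Positions are clamped to the signal bounds; empty spans are skipped.
    """
    segments = []
    for start, end in sorted(hits):
        start = max(0, start)
        end = min(len(signal), end)
        if start < end:
            segments.append(signal[start:end])
    return segments


# Toy signal standing in for the raw samples of one read (assumed values).
signal = list(range(100))
# Hypothetical hit positions, e.g. parsed from the motif search output.
hits = [(10, 30), (45, 70)]

segments = segment_signal(signal, hits)
for seg in segments:
    print(len(seg), seg[0], seg[-1])
```

Each segment could then be written out as its own read, which is the part the upcoming tooling is meant to make easier.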

James

@Psy-Fer Psy-Fer self-assigned this Jan 18, 2021
@Psy-Fer Psy-Fer added enhancement New feature or request question Further information is requested labels Jan 18, 2021
@gallardo-seq
Author

gallardo-seq commented Jan 19, 2021

Thanks for getting back to me so quickly. To clarify, I have a list of positions for each basecall/fastq file that I want to use to segment the originating signal/fast5. Specifically, each read is a concatemer containing a single sequence that is repeated over and over (like CCS in PacBio). We already use these repetitions to obtain error-corrected reads in the base space, but with the release of a pair-consensus decoding option for bonito (and multi-dimensional basecalling in the works for ONT in general), I think using these repetitive units for signal space error correction would be a timely development.
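As a rough illustration of what this mapping involves, here is a sketch that converts repeat-unit boundaries in base space into signal-space spans. It assumes a per-base table of signal start indices is available from the basecaller or an alignment step; that table, the name `base_to_signal_spans`, and all values are hypothetical:

```python
def base_to_signal_spans(base_spans, base_starts):
    """Map (start, end) base coordinates to (start, end) signal coordinates.

    `base_starts[i]` is the signal index where base i begins, with one
    extra trailing entry marking the end of the last base, so that
    end-exclusive base coordinates can be looked up directly.
    """
    spans = []
    for b_start, b_end in base_spans:
        if not 0 <= b_start < b_end < len(base_starts):
            raise ValueError(f"base span out of range: {(b_start, b_end)}")
        spans.append((base_starts[b_start], base_starts[b_end]))
    return spans


# Assumed table for a 6-base read; the last entry is the end of the signal.
base_starts = [0, 8, 15, 27, 33, 41, 50]
# Two repeat units of 3 bases each, in base coordinates (end-exclusive).
base_spans = [(0, 3), (3, 6)]

signal_spans = base_to_signal_spans(base_spans, base_starts)
print(signal_spans)
```

The resulting spans can then be used to slice the raw signal of the originating read, one slice per repeat unit.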

That is quite exciting about the upgrade that you have in the works. From your description, it sounds like it'd be highly complementary to what we are looking for. I have some test fast5 and fastq files that I'd be happy to share as a use case example or for development purposes (though I'd be happy to coordinate over e-mail or another more suitable medium). Let me know if the people involved in your collaboration are interested in moving forward on this angle.

@Psy-Fer
Owner

Psy-Fer commented Jan 19, 2021

Hey,

That actually sounds rad.

Wanna send me an email at j.ferguson[at]garvan.org.au ?

I think this would be worth including in the development to ensure we deliver in a way that would make something like this work.
What you need sounds exactly like what we have made, so I think this could work out well.

Talk soon.

James

@gallardo-seq
Author

Hi, I sent you an e-mail to get the ball rolling. Talk soon. CG

@Psy-Fer
Owner

Psy-Fer commented Jan 21, 2021

Hey,

Yep, I got it. I have been talking with the relevant people. Looks like we will be going ahead. I'll be in touch soon.
