Using MotifSeq coordinates within single reads to segment fast5 file at those positions. #44
Comments
Hey, so you want the base positions of what you have found in the signal? Or the signal positions of what you found in the basecall? If either of those is what you want, you are in luck: we are currently working on an upgrade that does this, using a new library I built along with Hasindu in our lab. Happy to make this one of the use-case examples. I'll talk to the people involved this week and see if I can get it moved forward. Currently MotifSeq gives you the signal positions where it finds something; you would then have to cut the signal at those sites from the array. That is probably not as useful as the new method we have. James
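For anyone following along, the "cut the signal at those sites from the array" step could look roughly like the sketch below. It assumes the raw signal is already loaded as a 1-D NumPy array and that the hits are available as (start, end) sample indices with an exclusive end. The function name `cut_signal` and the toy data are hypothetical, not part of SquiggleKit or MotifSeq, and parsing the actual MotifSeq output is not shown.

```python
import numpy as np


def cut_signal(signal, positions):
    """Return one sub-array per (start, end) pair; end is exclusive.

    `signal` is a 1-D array of raw samples; `positions` holds sample
    indices (hypothetical format, not the MotifSeq output format).
    """
    signal = np.asarray(signal)
    segments = []
    for start, end in positions:
        # Reject coordinates that fall outside the read or are reversed.
        if not 0 <= start < end <= len(signal):
            raise ValueError(f"invalid segment ({start}, {end}) for signal of length {len(signal)}")
        segments.append(signal[start:end])
    return segments


# Toy data standing in for a raw read and two motif hits.
signal = np.arange(100)
hits = [(10, 20), (50, 65)]
segments = cut_signal(signal, hits)
```

Each element of `segments` is a view into the original array, so writing the pieces back out as separate reads would still need a fast5 writer of your choice.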
Thanks for getting back to me so quickly. To clarify, I have a list of positions for each basecall/fastq file that I want to use to segment the originating signal/fast5. Specifically, each read is a concatemer containing a single sequence that is repeated over and over (like CCS in PacBio). We already use these repetitions to obtain error-corrected reads in base space, but with the release of a pair-consensus decoding option for Bonito (and multi-dimensional basecalling in the works for ONT in general), I think using these repetitive units for signal-space error correction would be a timely development. It is quite exciting to hear about the upgrade that you have in the works. From your description it sounds like it would be highly complementary to what we are looking for. I have some test fast5 and fastq files that I'd be happy to share as a use-case example or for development purposes (though I'd be happy to coordinate over e-mail or another more suitable medium). Let me know if the people involved in your collaboration are interested in moving forward on this angle.
Hey, that actually sounds rad. Wanna send me an email at j.ferguson[at]garvan.org.au? I think this would be worth including in the development to ensure we deliver in a way that makes something like this work. Talk soon. James
Hi, I sent you an e-mail to get the ball rolling. Talk soon. CG
Hey, yep, I got it. I have been talking with the relevant people. Looks like we will be going ahead. I'll be in touch soon.
Thanks for developing this great tool. This is an enhancement/question type issue. We have some CCS-type reads that contain a repetitive unit that we can search for using MotifSeq. My question is whether SquiggleKit can output the positions of the repetitive unit within single reads, and then use these coordinates to segment each fast5 file (kind of like Porechop, but at the signal level). Our ultimate goal is to do pair consensus decoding with Bonito specifically, or to facilitate multidimensional basecalling in general. Does SquiggleKit already have similar functionality that I'm perhaps missing in the documentation?