-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seqtk telo -m works partially with 8mer wasp telomere #201
Comments
At least for human, (TTAGGG)n only occurs at the 3'-end of a chromosome. I assume |
Just checked the paper. They were using short reads. The inverted motif could be an assembly artifact. It would be more convincing if we have HiFi reads. |
We do have a HiFi assembly made with hifiasm using PacBio long reads. And, as you say, the test text was a very truncated view of the Chr, thanks for pointing out the folly of using it as a reliable test. I found the 8mer from the short read paper but confirmed its existence in our HiFi assembly using grep for the 8mer. Thank you. |
Where are TTATTGGG 8-mers? Are they located on the 3'-ends of contigs, or following (CCCAATAA)n on the 5'-end like the example you have shown? |
At the bottom or very near bottom. I made a 8Mbp test file named tst2 with the CCCAATAA at top TTATTGGG at bottom and revcomp version tst2_rc to flip them and these were the results:
I used seqtk to fold at 120 chars for the file, so first TTATTGGG hit is at 8,402,280 bases in. I'm sure you have more important things, I just thought having tried in this non-standard case that I would pass it along. I can always use a squishy regex with grep to find as I have been doing with the vertebrate 6mer. As an aside, I do hope that telomere awareness can be used in assemblers and scaffolders as an option. We run a telomere script after every assembly or use of yahs and show the records and their telos as TOP, TOP_NEAR, BOTTOM, BOTTOM_NEAR or MIDDLE. Thanks for taking the time. |
Thank you for adding the telo command, it is much faster than the grep_telomere script I had put together.
I was hoping to use this with a wasp assembly to see its telomeres. As this pub mentions https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8977481 wasp telomeres can be from 5 to 9mers. Our assembly contains the 8mer TTATTGGG. Setting this with -m does not find any telos, whereas using the revcomp -m CCCAATAA finds this set but not the TTATTGGG at the bottom of the test file. I suspect it is the 6mer hash data structure. Thought I would mention it, though not sure how much effort you think is worth extending the -m usage scenarios.
best,
Jim Henderson
Test text I have been using:
The text was updated successfully, but these errors were encountered: