[2024-12-06]: [Trim ends only] #60
Comments
Hi @vagkaratzas, Thanks for writing with your proposed enhancement. To make sure I understand the request correctly, can you please elaborate on what you want to be trimmed from the ends? Do you specifically want, for example, 50 sites trimmed from both ends, or is there a more quantitative approach you were thinking of? Best, Jacob
Ah, to clarify, I meant clipping the gaps at the ends. I mainly use clipkit in my pipeline, but for now I have added an alternative custom module in Python that does that. You can find it here: https://github.com/vagkaratzas/proteinfamilies/blob/dev/bin/clip_ends.py
Thanks for providing the additional information. So, are you proposing that only consecutive sites at the ends get removed, or is there a different definition of "ends"? For example, if using the gappy mode would result in sites 0, 2, 3, 4, 5, and 6 being trimmed, the gappy mode of trimming with the --ends_only parameter would only trim site 0, correct?
Exactly. And if the length is 70 and there are gappy sites at 50, 60, 68, 69, 70, then 68, 69, 70 should also be removed.
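The rule agreed on in the two comments above can be sketched in a few lines of Python. This is only an illustration of the idea, not ClipKIT's implementation: the function name `ends_only` and its `first`/`last` arguments are hypothetical, and it assumes the trimming mode has already produced a list of flagged site indices. The first example assumes a 10-site, 0-based alignment (the thread does not give a length); the second uses 1-based sites 1 to 70, as in the comment above.

```python
def ends_only(flagged, first, last):
    """Return only the flagged sites that form an unbroken run from either end.

    flagged: site indices a trimming mode would remove (hypothetical input)
    first, last: indices of the first and last sites in the alignment
    """
    flagged = set(flagged)
    keep = set()

    # Walk inward from the left end for as long as sites are flagged.
    i = first
    while i <= last and i in flagged:
        keep.add(i)
        i += 1

    # Walk inward from the right end for as long as sites are flagged.
    j = last
    while j >= first and j in flagged:
        keep.add(j)
        j -= 1

    return sorted(keep)


# Sites 2-6 are flagged but not attached to an end, so only site 0 is trimmed.
print(ends_only([0, 2, 3, 4, 5, 6], first=0, last=9))      # [0]
# Sites 50 and 60 are internal; 68-70 form a run reaching the right end.
print(ends_only([50, 60, 68, 69, 70], first=1, last=70))   # [68, 69, 70]
```

Flagged sites in the middle of the alignment are left alone, which is what preserves the internal gaps.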
Hi @vagkaratzas, This is a cool idea and should be relatively straightforward to implement. I am currently traveling, and it may have to be a task I tackle in early 2025. Is this something you need immediately, or can it wait?
I can use mine until then, no worries at all. I can update my pipeline after your update! Thanks
Could you add a mode where the tool only trims the start and end of the alignment?
There is biological information in the gaps in the middle of the sequences that we would like to keep.