[2024-12-06]: [Trim ends only] #60

Open
vagkaratzas opened this issue Dec 6, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@vagkaratzas

Could you add a mode where the tool only trims the start and end of the alignment?
There is biological information in the gaps in the middle of the sequences that we would like to keep.

@vagkaratzas vagkaratzas added the enhancement New feature or request label Dec 6, 2024
@JLSteenwyk
Owner

Hi @vagkaratzas,

Thanks for writing with your proposed enhancement.

To make sure I understand the request correctly, can you please elaborate on what you want to be trimmed from the ends? Do you specifically want, for example, 50 sites trimmed from both ends, or is there a more quantitative approach you were thinking of?

best,

Jacob

@vagkaratzas
Author

Ah, to clarify, I meant clipping the gaps at the ends. I mainly use ClipKIT in my pipeline, but for now I have added a custom Python module that does this. You can find it here: https://github.com/vagkaratzas/proteinfamilies/blob/dev/bin/clip_ends.py
Would be nice to have all the options in one tool though ;)
For example, would be nice to combine an --ends_only parameter with -m gappy --gaps

@JLSteenwyk
Owner

Thanks for providing the additional information. So, are you proposing that only consecutive sites at the ends get removed, or is there a different definition of "ends"?

For example, if using the gappy mode would result in sites 0, 2, 3, 4, 5, and 6 being trimmed, the gappy mode of trimming with the --ends_only parameter would only trim site 0, correct?

@vagkaratzas
Author

Exactly. And if the length is 70 and there are gappy sites at 50, 60, 68, 69, and 70, then 68, 69, and 70 should also be removed.
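
The behaviour agreed on in the last two comments can be sketched as a small filter over the sites a trimming mode has already flagged. This is only an illustration of the idea, not ClipKIT code: the function name `ends_only` and its arguments are hypothetical, indices are assumed to be 0-based, and the alignment length of 10 in the first example is an assumption, since the thread does not state it.

```python
def ends_only(flagged_sites, alignment_length):
    """Keep only flagged sites that form an unbroken run touching either end.

    flagged_sites: 0-based column indices a trimming mode would remove.
    alignment_length: total number of columns in the alignment.
    Both names are hypothetical and chosen for this sketch.
    """
    flagged = set(flagged_sites)

    # Walk inward from the first column while columns are flagged.
    leading = []
    left = 0
    while left < alignment_length and left in flagged:
        leading.append(left)
        left += 1

    # Walk inward from the last column, stopping before the leading run
    # so that no column is reported twice.
    trailing = []
    right = alignment_length - 1
    while right >= left and right in flagged:
        trailing.append(right)
        right -= 1

    return leading + trailing[::-1]


# First example from the thread (alignment length of 10 is assumed).
print(ends_only([0, 2, 3, 4, 5, 6], 10))

# Second example: the thread counts sites from 1, so 50, 60, 68, 69, 70
# become 49, 59, 67, 68, 69 in 0-based indexing for a length of 70.
print(ends_only([49, 59, 67, 68, 69], 70))
```

Under these assumptions the first call keeps only site 0 and the second keeps the last three columns, so the internal gappy sites are left in the alignment.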

@JLSteenwyk
Owner

Hi @vagkaratzas,

This is a cool idea and should be relatively straightforward to implement.

I am currently traveling, and it may have to be a task I tackle in early 2025. Is this something you need immediately, or can it wait?

@vagkaratzas
Author

I can use mine until then, no worries at all. I can update my pipeline after your update! Thanks

2 participants