Inquiry on protein annotation #54
Comments
Hi @songmj86, yes. You can use these 2 commands
This will be equivalent to running
George
Thanks!
I missed another question. I used protein FASTA files as input to run Pharokka. Are the same protein FASTA files not a suitable input format for Phold? Thanks!
Hi @songmj86, I am not sure what you mean by this question. Phold will accept amino acid FASTA (aka .faa) files using
George
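A quick way to confirm that a file really is amino acid FASTA before handing it to Phold is to look at its residue composition. The sketch below is only a heuristic, not part of Phold or Pharokka: the function name `looks_like_protein_fasta` and the 90% threshold are assumptions made for illustration.

```python
def looks_like_protein_fasta(text, sample=10000):
    """Heuristically decide whether FASTA text holds amino acid sequences.

    Hypothetical helper: it samples the first `sample` residues and checks
    that they are not almost entirely nucleotide letters.
    """
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        chunks.append(line.upper())
    seq = "".join(chunks)[:sample]
    if not seq:
        return False
    # Letters that dominate nucleotide sequences (DNA, RNA, ambiguous N).
    nucleotide_like = sum(seq.count(c) for c in "ACGTUN")
    # Assumed cutoff: 90% or more nucleotide letters suggests DNA/RNA.
    return nucleotide_like / len(seq) < 0.9
```

To use it on a file, read the text first, for example `looks_like_protein_fasta(open("proteins.faa").read())`, where `proteins.faa` is a placeholder path.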
George,
Hi @shiraz-shah, if you theoretically run the same set of proteins via run or proteins-compare, the foldseek step should be identical, so this is surprising to me. To confirm it is the foldseek step, do you have the foldseek logs?
George
If Foldseek is using only 1 core, that would certainly explain why it is taking a while. Nothing seems wrong with foldseek per se: it is running, and for a massive job like yours with 630k proteins, a 78G and counting prefilter file is not unexpected. Are you intentionally only using 1 core with
George
Note: if you want to reduce the file size generated by the prefilter, change
Honestly, in this case, I'd just wait for it to finish. My guess is that it is only a few hours away.
George
Thanks for this input, George. It's very useful and appreciated. I'll wait and see! The 1 core was not intentional. I think it's the default behavior. But maybe for
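Since a single core can end up being used unintentionally, one option is to compute the thread count explicitly and pass it to the tool. The sketch below assumes a SLURM cluster, where `SLURM_CPUS_PER_TASK` holds the allocated cores, and falls back to the machine's core count otherwise. `pick_threads` is a hypothetical helper, and how the value is passed to Phold should be checked against its help output.

```python
import os


def pick_threads(env=None):
    """Return a thread count from the SLURM allocation, else from the host."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    # Use the scheduler's allocation when it is a positive integer.
    if value and value.isdigit() and int(value) > 0:
        return int(value)
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return os.cpu_count() or 1
```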
Hi. I have a small, fundamental question. What is the reason to run both "phold proteins-predict" and "phold proteins-compare"? Is it because the two commands annotate against different databases? Thanks
It is because predict requires GPU compute and compare requires CPU (ideally with many cores). In a cluster environment, it is much more resource efficient to split phold into 2 commands, sending predict to the GPU partition and compare to the CPU partition.
George
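The two-step split can be sketched as a pair of command lines, one per partition. The subcommand names `proteins-predict` and `proteins-compare` come from this thread. The flags (`-i`, `-o`, `--predictions_dir`, `-t`) and all paths are assumptions for illustration and should be confirmed with `phold proteins-predict -h` and `phold proteins-compare -h`.

```python
import shlex


def build_phold_commands(faa, predict_dir, compare_dir, threads=8):
    """Return argv lists for the GPU (predict) and CPU (compare) steps.

    Hypothetical helper: flag names are assumed, not taken from this thread.
    """
    # Step 1, GPU partition: predict structural tokens for each protein.
    predict = ["phold", "proteins-predict", "-i", str(faa), "-o", str(predict_dir)]
    # Step 2, CPU partition: compare against the database using many threads.
    compare = [
        "phold", "proteins-compare",
        "-i", str(faa),
        "--predictions_dir", str(predict_dir),
        "-o", str(compare_dir),
        "-t", str(threads),
    ]
    return predict, compare


if __name__ == "__main__":
    # Placeholder paths; print the commands so they can go into job scripts.
    for cmd in build_phold_commands("proteins.faa", "phold_predict", "phold_compare", 16):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without Phold installed. Each line can then go into its own job script for the matching partition.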
Description
Hi. I am trying to annotate viral proteins.
I do not have the GenBank file that Phold takes as input, because my Pharokka output was generated with the command "pharokka_proteins.py".
Is there any way to use the Pharokka output obtained from "pharokka_proteins.py" as input for Phold?
Thanks!