Contig with no annotation result but still be identified as virus #93

Open
quliping opened this issue Apr 11, 2024 · 2 comments

Comments

@quliping
quliping commented Apr 11, 2024

Hello, VIBRANT is good software and very useful in my work. I know VIBRANT identifies viruses with an annotation-based method. However, I found that many potential viruses identified by VIBRANT have no annotation information. How are they recognized as viruses? For example, the contig 'GT1_16407' has only 4 hypothetical proteins but is still assigned as a virus. By the way, I noticed that the database versions used by VIBRANT are very old. Is there any plan to update the VIBRANT databases? Or how can we update them ourselves?

[two attached images]

Sincerely,

Liping Qu

@ZihengWu
ZihengWu commented Jun 29, 2024

Hello Liping, long time no see.

For your information, I tried updating the databases used by VIBRANT myself. Specifically, I updated the VOG database from v94 to v224, the KEGG database to the 2024-04-01 release, and the Pfam database to v37. Comparing the results before and after the update, I noticed a significant difference in virus annotation: the number of contigs identified as viruses decreased.
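A before/after comparison like the one described above can be sketched in a few lines of Python. This is only an illustration: the contig IDs are made up, and `compare_calls` is a hypothetical helper, not part of VIBRANT. It assumes the contig IDs called viral in each run have already been collected into two lists.

```python
def compare_calls(before, after):
    """Compare two collections of contig IDs called viral in two runs."""
    before, after = set(before), set(after)
    return {
        "lost": sorted(before - after),    # viral only with the old databases
        "gained": sorted(after - before),  # viral only with the new databases
        "shared": sorted(before & after),  # viral in both runs
    }


# Made-up contig IDs, for illustration only
old_calls = ["contig_1", "contig_2", "contig_3"]
new_calls = ["contig_2", "contig_4"]

result = compare_calls(old_calls, new_calls)
for label, contigs in result.items():
    print(label, len(contigs), contigs)
```

Listing the "lost" contigs, rather than only counting them, makes it easier to check which calls depended on annotations that no longer match after the update.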

I’m also curious whether the authors plan to update the databases, and whether results obtained with updated databases would be equally reliable.

Best wishes.
Ziheng Wu

@KrisKieft
Member

Hi,

There are no plans to update the databases. This was a slight oversight in the development of the tool, which relies on standardization (exact matching) of the database HMM or protein cluster IDs. A big issue is VOG, which assigns completely different IDs to its protein clusters in each release, which is why you'd end up with drastically different results with a different VOG version. The same applies to Pfam because of the version suffix on each ID, but it is maybe not as big of an issue for KEGG.
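To illustrate the exact-matching problem with version suffixes, here is a minimal Python sketch. It is not VIBRANT's code: the `scores` table, its values, and the specific accession versions are hypothetical, chosen only to show how a lookup keyed on a full Pfam-style accession such as `PF00069.25` misses the same family once a newer release reports it as `PF00069.28`.

```python
def strip_version(accession: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. 'PF00069.28' -> 'PF00069'."""
    base, sep, suffix = accession.rpartition(".")
    return base if sep and suffix.isdigit() else accession


# Hypothetical per-HMM table keyed by the exact IDs of an older release
scores = {"PF00069.25": 0.1}

# The same family as reported by a newer release (illustrative version number)
new_hits = ["PF00069.28"]

# Exact matching: the newer ID is not a key in the old table
exact = [hit for hit in new_hits if hit in scores]

# Matching on the accession without its version suffix
scores_by_base = {strip_version(key): value for key, value in scores.items()}
relaxed = [hit for hit in new_hits if strip_version(hit) in scores_by_base]

print("exact matches:", exact)
print("suffix-insensitive matches:", relaxed)
```

Stripping the suffix only addresses the Pfam-style case. It would not help with VOG, where, as noted above, a cluster receives a completely different ID in each release, so no string transformation can recover the correspondence.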

It is very curious that your GT1_16407 was identified as a virus. Are all 4 proteins completely unannotated, or are they annotated as "hypothetical"? If it is the latter, my guess is that hypothetical annotations from VOG can still be classified as virus-like according to v-score and lead to a designation as virus. If all 4 proteins are completely unannotated, then I do not have an explanation for that.
