Hits annotation #214

YiJessePi · 2023-05-24T06:31:23Z

Hello,
I am missing something about the annotation process after a hit is found, and I hope you can help me close this gap.
I understand that the hits are then processed by the "expert annotation systems", but I would love to better understand the process.
As an example, I annotated a genome, and this is what the annotation of one of the proteins looked like:
Transposase SO:0001217, UniRef:UniRef50_A0A7M1RUZ9, UniRef:UniRef90_A0A7M1RUZ9
Looking up A0A7M1RUZ9 in UniProt points to a hypothetical protein, so how was the protein actually annotated as a transposase?

Thank you so much!
Bakta is really fast, provides a lot of information, and is very helpful!

Answered by oschwengers

oschwengers · 2023-05-25T07:42:10Z

Maintainer

Hi @YiJessePi, thanks for reaching out and asking.
Yes, the annotation workflow of Bakta has become quite complex and far from trivial, as there are several steps incorporating different sequence and HMM resources as well as annotation data. The entire workflow is described here: https://github.com/oschwengers/bakta#coding-sequences

After this workflow, Bakta has tons of information, which is then used for the final annotation. In this process, the so-called "expert annotation systems" have the highest rank (internally, they comprise different sources, each with a distinct rank). So if there is a hit from an expert system, that annotation data is preferred.

If this is not the case, then Bakta uses information from its own internal database. This DB comprises highly integrated and merged information from a vast number of high-quality annotation sources (https://github.com/oschwengers/bakta#database). During the compilation of this DB, the annotation information for each unique protein sequence and sequence cluster is superseded several times, with the most specific annotation information applied last, e.g. transposases from ISfinder and NCBI. Therefore, annotations from Bakta often differ from the information UniProt provides. Nevertheless, we still annotate such sequences with dbxrefs to UniProt and other DBs to provide extra information, and UniProt has a lot of it.
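To make the precedence idea concrete, here is a minimal sketch of rank-based annotation selection. It is not Bakta's actual code: the `Annotation` class, the `SOURCE_RANK` values, the function names, and the example data are all hypothetical and only illustrate "expert hit first, otherwise the most specific DB source is applied last".

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ranks for illustration only; a higher value means "more specific".
SOURCE_RANK = {
    "uniref50": 10,
    "uniref90": 20,
    "ncbi": 80,
    "isfinder": 90,
}


@dataclass(frozen=True)
class Annotation:
    product: str
    source: str


def compile_db_annotation(candidates: Iterable[Annotation]) -> Optional[Annotation]:
    """Apply candidates from least to most specific; the last one applied wins."""
    final = None
    for candidate in sorted(candidates, key=lambda a: SOURCE_RANK[a.source]):
        final = candidate  # a more specific source supersedes the earlier ones
    return final


def annotate(expert_hit: Optional[Annotation],
             db_candidates: Iterable[Annotation]) -> Optional[Annotation]:
    """Prefer an expert-system hit; otherwise fall back to the compiled DB entry."""
    if expert_hit is not None:
        return expert_hit
    return compile_db_annotation(db_candidates)


# Made-up example data, not the real provenance of the protein discussed above.
candidates = [
    Annotation("hypothetical protein", "uniref90"),
    Annotation("Transposase", "isfinder"),
]
result = annotate(None, candidates)
print(result.product)  # prints "Transposase"
```

In this toy setup the UniRef-derived product is overwritten by the more specific source, while the UniRef identifiers could still be kept as dbxrefs alongside the final product.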

I hope this clarifies it a bit. Just in case, please do not hesitate to keep asking.
