
Long processing time during DRAM annotate, step "Getting forward best hits from kegg" #390

Open
ganiatgithub opened this issue Nov 21, 2024 · 0 comments


Hi developers,

I was wondering whether you could share some insight into something I've noticed when running DRAM annotate. Occasionally, the "Getting forward best hits from kegg" step takes days to finish. Below is a log excerpt in which this step took about 5 days for a single genome.

2024-11-12 09:41:11,811 - 10 FASTAs found
2024-11-12 09:41:11,945 - Starting the Annotation of Bins with database configuration: 
 

KEGG db:
    Description_Db_Updated: 05/13/2023, 12:52:43
    Citation:  M. Kanehisa, M. Furumichi, Y. Sato, M. Ishiguro-Watanabe, and M. Tanabe, "Kegg: integrating viruses and cellular organisms," Nucleic acids research, vol. 49, no. D1, pp. D545–D551, 2021.
KOfam db:
...
...
... # removed text for simplicity
Genome summary form:
    Branch: master
    Download Time: 05/13/2023, 11:40:10
    Origin: Downloaded by DRAM
Module step form:
    Branch: master
    Download Time: 05/13/2023, 11:40:11
    Origin: Downloaded by DRAM
ETC module database:
    Branch: master
    Download Time: 05/13/2023, 11:40:12
    Origin: Downloaded by DRAM
Function heatmap form:
    Branch: master
    Download Time: 05/13/2023, 11:40:11
    Origin: Downloaded by DRAM
AMG database:
    Branch: master
    Download Time: 05/13/2023, 11:40:11
    Origin: Downloaded by DRAM
2024-11-12 09:41:11,946 - Retrieved database locations and descriptions
2024-11-12 09:41:11,947 - Annotating 11_GN_calf_175_refined_bin.225
2024-11-12 09:41:19,741 - Turning genes from prodigal to mmseqs2 db
2024-11-12 09:41:21,690 - Getting forward best hits from kegg
2024-11-17 12:45:10,947 - Getting reverse best hits from kegg
2024-11-17 12:45:44,746 - Getting descriptions of hits from kegg
2024-11-17 12:47:14,421 - Getting forward best hits from peptidase
2024-11-17 12:47:23,164 - Getting reverse best hits from peptidase
2024-11-17 12:47:25,652 - Getting descriptions of hits from peptidase
2024-11-17 12:47:30,950 - Getting hits from pfam
2024-11-17 12:47:42,104 - Getting hits from dbCAN
2024-11-17 12:47:49,922 - Merging ORF annotations
2024-11-17 12:48:10,139 - Annotating 11_GN_calf_148_refined_bin.201
2024-11-17 12:48:12,825 - Turning genes from prodigal to mmseqs2 db
2024-11-17 12:48:14,402 - Getting forward best hits from kegg
2024-11-17 13:25:12,910 - Getting reverse best hits from kegg
2024-11-17 13:25:52,678 - Getting descriptions of hits from kegg
2024-11-17 13:27:08,258 - Getting forward best hits from peptidase
2024-11-17 13:27:13,865 - Getting reverse best hits from peptidase
2024-11-17 13:27:15,080 - Getting descriptions of hits from peptidase
2024-11-17 13:27:15,568 - Getting hits from pfam
2024-11-17 13:27:24,055 - Getting hits from dbCAN
2024-11-17 13:27:27,325 - Merging ORF annotations
2024-11-17 13:27:32,511 - No tRNAs were detected, no trnas.tsv file will be created.
2024-11-17 13:27:33,868 - Annotating 00_GN_BF177_ref_50_10_bin.54
 ...# removed other genome annotation log
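To see which step dominates, per-step timings can be pulled out of annotate.log. The following is a minimal Python sketch, not part of DRAM; `parse_log` and `step_durations` are hypothetical helper names. It assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS,mmm - ` timestamp and that DRAM logs each message when the corresponding step starts, which is what the timings above suggest. Lines without a timestamp (the database configuration block, `...` elisions) are skipped.

```python
import re
from datetime import datetime, timedelta

# Matches DRAM log lines such as "2024-11-12 09:41:21,690 - Getting forward best hits from kegg"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (.*)$")


def parse_log(text):
    """Return (timestamp, message) pairs, skipping lines without a timestamp."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((stamp, match.group(2)))
    return entries


def step_durations(text):
    """Assumption: a step runs from its own message until the next timestamped message.
    The last message has no end time, so it is not reported."""
    entries = parse_log(text)
    return [
        (message, end - start)
        for (start, message), (end, _) in zip(entries, entries[1:])
    ]


excerpt = """\
2024-11-12 09:41:19,741 - Turning genes from prodigal to mmseqs2 db
2024-11-12 09:41:21,690 - Getting forward best hits from kegg
2024-11-17 12:45:10,947 - Getting reverse best hits from kegg
2024-11-17 12:45:44,746 - Getting descriptions of hits from kegg
"""

# Print the slowest steps first.
for message, elapsed in sorted(step_durations(excerpt), key=lambda item: item[1], reverse=True):
    print(f"{elapsed}  {message}")
```

On this excerpt the forward KEGG search (a little over 5 days) is ranked far ahead of everything else; the same function can be run on the full annotate.log of any chunk.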

For other jobs that I ran on another subset of the genomes, the run completed normally, with all 10 genomes annotated in under six hours:

2024-11-12 09:41:11,811 - The log file is created at /home/gnii0001/075_succession/result/DRAM/normal_queue_additional1/chunk_124/annotation/annotate.log.
2024-11-12 09:41:11,812 - 10 FASTAs found
2024-11-12 09:41:11,945 - Starting the Annotation of Bins with database configuration: 
 
...
...
... # removed text for simplicity
2024-11-12 09:41:11,947 - Retrieved database locations and descriptions
2024-11-12 09:41:11,947 - Annotating 11_GN_calf_155_refined_bin.146
2024-11-12 09:41:17,007 - Turning genes from prodigal to mmseqs2 db
2024-11-12 09:41:18,697 - Getting forward best hits from kegg
2024-11-12 11:20:04,036 - Getting reverse best hits from kegg
2024-11-12 11:20:25,303 - Getting descriptions of hits from kegg
2024-11-12 11:21:09,581 - Getting forward best hits from peptidase
2024-11-12 11:21:18,976 - Getting reverse best hits from peptidase
2024-11-12 11:21:21,365 - Getting descriptions of hits from peptidase
2024-11-12 11:21:25,825 - Getting hits from pfam
2024-11-12 11:24:27,187 - Getting hits from dbCAN
2024-11-12 11:24:34,783 - Merging ORF annotations
2024-11-12 11:24:44,422 - Annotating 11_GN_calf_171_refined_bin.81
...
...
... # removed text for simplicity
2024-11-12 15:20:04,400 - Completed annotations
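Comparing total wall-clock time per genome shows whether a single genome is the outlier or the whole chunk is slow. This is a minimal Python sketch, not part of DRAM; the helper name `genome_wall_times` and the 12-hour cutoff are hypothetical choices for illustration. It assumes each genome's work runs from its `Annotating <name>` line to the next `Annotating` line (or to `Completed annotations`), so a total that spans a `...` elision in a trimmed excerpt is not meaningful.

```python
import re
from datetime import datetime, timedelta

# Timestamped DRAM log line: "<date> <time>,<ms> - <message>"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (.*)$")
# Message that marks the start of a new genome, e.g. "Annotating 11_GN_calf_155_refined_bin.146"
GENOME_START = re.compile(r"^Annotating (\S+)$")


def genome_wall_times(text):
    """Map each genome to the time between its 'Annotating <name>' line and the next
    'Annotating' or 'Completed annotations' line. A genome still running when the
    text ends is left out."""
    totals = {}
    current, started = None, None
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # config lines, blank lines, "..." elisions
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        genome = GENOME_START.match(message)
        if genome or message == "Completed annotations":
            if current is not None:
                totals[current] = stamp - started
            current = genome.group(1) if genome else None
            started = stamp
    return totals


slow_chunk = """\
2024-11-12 09:41:11,947 - Annotating 11_GN_calf_175_refined_bin.225
2024-11-17 12:48:10,139 - Annotating 11_GN_calf_148_refined_bin.201
2024-11-17 13:27:33,868 - Annotating 00_GN_BF177_ref_50_10_bin.54
"""

# Hypothetical cutoff for flagging an outlier genome.
THRESHOLD = timedelta(hours=12)
for name, elapsed in genome_wall_times(slow_chunk).items():
    flag = "  <-- slow" if elapsed > THRESHOLD else ""
    print(f"{name}: {elapsed}{flag}")
```

In the slow chunk, only 11_GN_calf_175_refined_bin.225 exceeds the cutoff, while the next genome in the same job took well under an hour, which points at that one genome rather than at the whole job.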

I'm using DRAM version 1.4.6, installed with mamba, and these are the commands I run:

DRAM.py annotate -i "$MAG_DIR/*" -o "$DRAM_OUT_DIR"/annotation --threads "$thread"
DRAM.py distill -i "$DRAM_OUT_DIR"/annotation/annotations.tsv -o "$DRAM_OUT_DIR"/genome_summaries --trna_path "$DRAM_OUT_DIR"/annotation/trnas.tsv --rrna_path "$DRAM_OUT_DIR"/annotation/rrnas.tsv

Any suggestions on where to look further would be much appreciated.

Best regards,
Gaofeng
