Foldseek clustering stuck for days with errors #390

YFeriel · 2024-12-02T15:40:47Z

Hello,

I am using a protein structure database with Foldseek for clustering, but I encountered an issue with the following command:
foldseek cluster /data/foldseek/concat_db /data/db_clusters /data/cluster_tmp_dir

I received this error:
structurerescorediagonal /data/foldseek/concat_db /data/foldseek/concat_db /data/cluster_tmp_dir/4804289747088079168/pref /data/cluster_tmp_dir/4804289747088079168/pref_rescore1 --exact-tmscore 0 --tmsc>[=
Can not write to data file p
Can not write to data file /data/cluster_tmp_dir/4804289747088079168/pref_rescore1.29
...
Error: Rescore with hamming distance step died

Initially, I suspected disk space or memory issues, so I added the --remove-tmp-files 1 option. This addressed the potential disk space concern, but the runtime increased significantly: the clustering process has now been running for over seven days and remains stuck at the same step:

structurerescorediagonal /data/foldseek/concat_db /data/foldseek/concat_db /data/cluster_tmp_dir/4804289747088079168/pref /data/cluster_tmp_dir/4804289747088079168/pref_rescore1 --exact-tmscore 0 --tmsc>[=
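
To help rule disk space in or out, the free space on the filesystem holding the temporary directory can be checked with something like the sketch below. It assumes a POSIX `df`; the `free_kib` helper is a hypothetical name and not part of Foldseek, and the check says nothing about quotas or memory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Foldseek): print the free space, in KiB,
# on the filesystem that holds the given directory.
free_kib() {
    # -P forces the portable one-line-per-filesystem layout,
    # -k reports sizes in 1024-byte blocks; column 4 is "Available".
    # Assumption: the filesystem name contains no spaces.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Temporary directory from the clustering command above;
# fall back to the current directory if it does not exist on this machine.
tmp_dir="/data/cluster_tmp_dir"
[ -d "$tmp_dir" ] || tmp_dir="."

echo "Free space under $tmp_dir: $(free_kib "$tmp_dir") KiB"
```

Running this before and during the clustering run shows whether the free space shrinks toward zero while the pref_rescore1 files are being written.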

Could this error be related to disk space or memory limitations, or might it indicate a different issue? Also, are there any optimizations or alternative approaches you would recommend to reduce the runtime and avoid prolonged processing times like this?

Thank you for your help!
