You will need to re-install the updated phold database for v0.2.0 using `phold install`.
You will also need to upgrade Foldseek to v9.427df8a.
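As a quick pre-flight check before running v0.2.0, the snippet below compares a reported Foldseek version against the one required here. It is a minimal sketch: the helper names are hypothetical, and it assumes that `foldseek version` prints the bare version string, which you should confirm against your own install.

```python
import shutil
import subprocess
from typing import Optional

# Foldseek version required by phold v0.2.0, as stated in these notes.
REQUIRED_FOLDSEEK = "9.427df8a"


def matches_required(version_output: str, required: str = REQUIRED_FOLDSEEK) -> bool:
    """Return True if the reported version equals the required one.

    Surrounding whitespace and a leading "v" are ignored on both sides.
    """
    return version_output.strip().lstrip("v") == required.strip().lstrip("v")


def installed_foldseek_version() -> Optional[str]:
    """Return the version reported by the local Foldseek, or None if not on PATH.

    Assumption: `foldseek version` writes the version string to stdout.
    """
    if shutil.which("foldseek") is None:
        return None
    result = subprocess.run(
        ["foldseek", "version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()
```

If `installed_foldseek_version()` returns `None` or a string that fails `matches_required`, upgrade Foldseek before re-running `phold install`.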
v0.2.0 is a very large update adding:
- Improved sensitivity and faster runtime for the `foldseek` search. This is achieved by clustering the Phold database at `--min-seq-id 0.3 -c 0.8` and creating a cluster db before running with `foldseek`, which significantly improves runtime.
  - Overall, just over 1.1M structures are clustered into around 372k clusters.
  - The `--cluster-search 1` parameter is added to `foldseek search` to search against the cluster representatives first and then within each cluster, which increases sensitivity and reduces resource usage compared to phold v0.1.4.
- Changed default `--max_seqs` from 1000 to 10000 to improve sensitivity at little resource usage cost.
- Phold database is expanded, adding:
  - Extremely conservative high confidence efam proteins with hits to PHROGs.
  - 95% dereplicated diversity-generating retroelements (DGRs) from Roux et al.
  - 7153 netflax toxin-antitoxin system proteins from Ernits et al.
- Adds `--ultra_sensitive` flag, which turns off Foldseek prefiltering for maximum sensitivity. Recommended for small datasets/single phages only.
  - This passes the `--exhaustive-search` parameter to `foldseek search`.
- Adds the ability to save ProstT5 embeddings with `--save_per_residue_embeddings` and `--save_per_protein_embeddings`.
- Adds `.cif` support (e.g. from Alphafold3 server) for structures, not just the `.pdb` file format, and changes the CLI to reflect this.
- Removes some experimental parameters from v0.1.4 (`--split` etc).
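The search-related changes in this list (cluster search, the larger `--max_seqs` default, and `--ultra_sensitive`) can be pictured as different arguments handed to `foldseek search`. The sketch below only assembles an argument list and runs nothing. The function name and its parameters are hypothetical and are not phold's internal code. Two things are assumptions: that Foldseek spells the option `--max-seqs`, and that `--exhaustive-search` takes a trailing `1` in the same way `--cluster-search 1` does.

```python
from typing import List


def build_foldseek_search_args(
    query_db: str,
    target_db: str,
    result_db: str,
    tmp_dir: str,
    max_seqs: int = 10000,
    cluster_search: bool = True,
    ultra_sensitive: bool = False,
) -> List[str]:
    """Assemble an illustrative `foldseek search` argument list (not executed)."""
    args = ["foldseek", "search", query_db, target_db, result_db, tmp_dir]

    # v0.2.0 raises the default from 1000 to 10000.
    # Assumption: Foldseek's own spelling of the option is --max-seqs.
    args += ["--max-seqs", str(max_seqs)]

    if cluster_search:
        # Search cluster representatives first, then within each cluster.
        args += ["--cluster-search", "1"]

    if ultra_sensitive:
        # Turns off prefiltering; the trailing "1" is an assumption.
        args += ["--exhaustive-search", "1"]

    return args
```

The two switches are kept independent here purely for illustration; these notes do not say how phold combines `--ultra_sensitive` with cluster search.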
Breaking CLI parameter changes
- `--pdb` has changed to `--structures`
- `--pdb_dir` has changed to `--structure_dir`
- `--filter_pdbs` has changed to `--filter_structures`
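If you have wrapper scripts that still use the v0.1.4 flag names, the renames above can be applied mechanically. The helper below is a minimal sketch of that idea; `migrate_args` and `RENAMED_FLAGS` are hypothetical names and not part of phold.

```python
from typing import Dict, List

# Old v0.1.4 flag -> new v0.2.0 flag, taken from the list above.
RENAMED_FLAGS: Dict[str, str] = {
    "--pdb": "--structures",
    "--pdb_dir": "--structure_dir",
    "--filter_pdbs": "--filter_structures",
}


def migrate_args(argv: List[str]) -> List[str]:
    """Return a copy of argv with renamed flags replaced.

    Handles both `--flag value` and `--flag=value` forms; every other
    token is passed through unchanged.
    """
    migrated = []
    for token in argv:
        flag, sep, value = token.partition("=")
        migrated.append(RENAMED_FLAGS.get(flag, flag) + sep + value)
    return migrated
```

For example, `migrate_args(["--pdb", "--pdb_dir", "structs/"])` yields the same list with `--structures` and `--structure_dir` substituted, leaving `structs/` untouched.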