
Releases: gbouras13/pharokka

v1.4.1

04 Sep 01:12
885fef2

1.4.1 (2023-09-04)

Pharokka v1.4.1 is a small patch release that fixes #286: when both --dnaapler and -m were specified, pharokka could not find the correct output file from dnaapler and crashed.

Thanks for spotting the bug @rdenise.

Full Changelog: v1.4.0...v1.4.1

v1.4.0

04 Sep 01:10
f00addd

Pharokka v1.4.0 is a large update implementing:

  • More sensitive searches of PHROGs using Hidden Markov Models (HMMs) via the amazing PyHMMER. Thanks to @althonos for this excellent, well-documented software.
  • By default, pharokka now runs searches using both MMseqs2 (PHROGs, CARD and VFDB) and HMMs (PHROGs). MMseqs2 is retained for PHROGs because, when it finds a hit, it provides more information than the HMM results (e.g. sequence alignment identity and the top-hit PHROG protein).
  • --fast or --hmm_only parameter, which runs only PyHMMER on PHROGs and does not run MMseqs2 at all (on PHROGs, CARD or VFDB). For phage isolates this is much faster than v1.3.2, but you will not get CARD or VFDB annotations. For metagenomes, however, it will be (much) slower!
  • Updated databases as of 23 August 2023. You will need to download the new pharokka v1.4.0 databases because these now contain PHROG HMM profiles. The VFDB database is now clustered at 50% sequence identity (which speeds up runtime).
  • Other changes in the codebase should make pharokka v1.4.0 run somewhat faster than v1.3.2, even when PyHMMER is not used (i.e. when --mmseqs2_only is specified).
  • The terminal output and log files are neater and more informative, using loguru. There is also a new logs directory containing a separate log file for each tool in the pipeline. This uses code adapted from @mbhall88's tbpore.
  • install_databases.py has been modified to be more robust and somewhat faster, thanks to ideas and code adapted from @oschwengers's bakta.
  • --mmseqs2_only, which essentially runs pharokka as in v1.3.2. This is the default in meta mode (-m or --meta).
  • pharokka_proteins.py, which takes an input FASTA file of protein (amino acid) sequences and runs MMseqs2 (PHROGs, CARD, VFDB) and PyHMMER (PHROGs). See the proteins documentation for more details. Thanks to Brady Cress for the idea.
  • --custom_hmm parameter, which allows for custom HMM profile databases to be used with pharokka. Thanks to @pck00 for the idea.
  • create_custom_hmm.py, which facilitates the creation of an HMM profile database from multiple sequence alignments. See the documentation for more details about how to create a compatible HMM profile database.
  • --dnaapler flag, which automatically detects and reorients your phage to start with the large terminase subunit. For more information, see dnaapler.
  • --genbank flag, which allows GenBank-format input with -i. Pharokka will take all (customised) CDS calls in the GenBank file, and PHANOTATE/pyrodigal will not be run. If you have manually curated your gene calls and want to functionally annotate those customised CDS, this option is recommended. Thanks to @pck00 for the idea.
  • Fixes to -c, which should now work properly with -g prodigal (thanks @alegione for the fixes).

v1.3.2

27 Apr 02:35
7a2c44e

Minor bug fix release

  • Fixes a bug where pharokka_plotter.py would crash if the phage had tmRNAs or CRISPRs.
  • Fixes a bug where forward-strand integration & excision CDS were not plotted in the correct colour.
  • Adds tmRNAs and CRISPRs to pharokka_plotter.py.

v1.3.1

21 Apr 05:20
3f155cd
  • Adds tRNAs to pharokka_plotter.py.
  • Adds the -s (split mode) option for use with metagenome mode (-m), which outputs separate FASTA, GFF and GenBank files for each contig. It is ideally used when you have many phage isolates that you want to annotate in one go.

v1.3.0

11 Apr 14:45
c822b4c
  • Adds pharokka_plotter.py to create plots with pyCirclize.
  • Fixes issue #243, where the VFDB and CARD counts in _cds_functions.tsv were 0 even if a virulence factor or AMR gene was detected. Thanks @sxh1136.
  • Adds better error checking for --threads.
  • Adds some other error checking.

v1.2.1

20 Feb 12:30
3c449e3
  • Minor update to pin Biopython to version <=1.80, because of a breaking change in 1.81 that affects another dependency, bcbio-gff, as reported in this issue. Thanks @magbphp and Rick Meinersmann for pointing this out.

v1.2.0

24 Jan 06:29
a58683a
  • Adds matching of each contig against the INPHARED database (https://github.com/RyanCook94/inphared) using mash. The top hit for each contig (below a maximum mash distance of 0.2) is kept.
  • The database now includes INPHARED. You will need to re-download the pharokka database to use v1.2.0.
  • Replaced prodigal with pyrodigal, as pyrodigal is actively maintained and is used by bakta.
  • Adds --citation.
  • Adds checks for dependencies.
  • Adds --terminase (terminase mode) to reorient a single-contig phage so that it begins at a chosen coordinate and strand (most commonly, the large terminase subunit). With this, you must also specify the strand of the terL gene with --terminase_strand and its start coordinate with --terminase_start.
  • All locus tags now end with a zero-padded 4-digit number, so that they work well with vConTACT2, and numbering starts at 1 rather than 0.
  • In meta mode, the locus tags now begin with the contig header, not a random string (or chosen prefix).
  • Cleans up the .tbl output so that it should be accepted automatically by NCBI BankIt.
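The terminase-mode reorientation and the zero-padded locus-tag numbering described above can be illustrated with a short, standard-library-only Python sketch. It is not pharokka's implementation: the function names, the '+'/'-' strand convention, the underscore separator in locus tags, and the assumption that a minus-strand start coordinate is the gene's start codon position in forward (1-based) coordinates are all hypothetical.

```python
"""Hypothetical sketch of terminase-style reorientation and locus-tag numbering.

Not pharokka's code: names, the '+'/'-' strand convention and the tag
separator are illustrative assumptions.
"""
from __future__ import annotations

# Case-preserving DNA complement table (N maps to N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient_circular(seq: str, start: int, strand: str) -> str:
    """Rotate a circular genome so the gene starting at `start` (1-based) is at position 1.

    Assumption: for a '-' strand gene, `start` is the start codon position in
    forward coordinates (the gene's higher end), so the genome is rotated to put
    that base last and then reverse complemented so the gene reads forwards.
    """
    if not 1 <= start <= len(seq):
        raise ValueError(f"start {start} is outside 1..{len(seq)}")
    if strand == "+":
        # Cut just before the start base and swap the two halves.
        return seq[start - 1:] + seq[:start - 1]
    if strand == "-":
        # Make the start base the final base, then flip to the other strand.
        return reverse_complement(seq[start:] + seq[:start])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def make_locus_tags(prefix: str, n_cds: int) -> list[str]:
    """Zero-padded, 1-based locus tags such as prefix_0001, prefix_0002, ..."""
    return [f"{prefix}_{i:04d}" for i in range(1, n_cds + 1)]


if __name__ == "__main__":
    genome = "AAACCCGGG"
    print(reorient_circular(genome, 4, "+"))  # CCCGGGAAA
    print(reorient_circular(genome, 3, "-"))  # TTTCCCGGG
    print(make_locus_tags("phage", 3))  # ['phage_0001', 'phage_0002', 'phage_0003']
```

Because a phage genome is treated as circular here, reorientation is just a rotation (plus a reverse complement for minus-strand genes), so no sequence is lost or duplicated; the 4-digit padding keeps tags the same width and sorting lexically in CDS order for up to 9999 CDS.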

v1.1.0

20 Oct 04:40
cc76f29
  • Enables multithreading for PHANOTATE and tRNAscan-SE in meta mode (flag -m), giving approximately a t-fold run-time improvement for large metavirome datasets, where t is the number of threads.
  • Renames the CDS output files to *.faa for amino acid and *.ffn for nucleotide sequences, to comply with standard conventions.
  • Implements a consistent CDS name (equal to the locus_tag) across all output files. The *.faa and *.ffn files also include functional annotations in the FASTA headers.
  • Creates terL.faa and terL.ffn, which contain the sequences of any identified terminase large subunit CDS.

v1.0.1

09 Oct 23:10
23208fe

Minor release to fix a bug where pharokka v1.0.0 would crash when certain VFDB virulence factors were detected.

No update to databases required.

v1.0.0

15 Sep 13:28
0fe0540
  • Overhauls install_databases.py and the structure of pharokka's databases to avoid installation issues.
  • Adds a pre-built pharokka database, available at https://zenodo.org/record/7081772.
  • Fixes errors (mostly related to string parsing) to improve robustness.
  • Makes the codebase cleaner and more consistent.
  • VFDB and CARD databases are now included in pharokka's databases automatically.