Releases: pdimens/harpy

harpy v0.8.0

21 Mar 20:15

Changes

Under the hood

  • snakemake updated to v8.x
  • all snakemake code updated for v8
  • conda python env is now 3.12 (snakemake dependency)
  • pysam removed from conda recipe, now exists in align runtime environment
  • conditional rules in snakefiles are cleaner and more idiomatic

Usage

  • option --print-only still exists, but is now hidden since it's only really used in development and kinda rarely at that
  • BIG FEATURE new --output-dir (-o) option for all main modules to specify an output directory
    • BREAKING CHANGE the output directories for all the modules are identical to the previous versions, except:
      • harpy snp defaults to SNP/method output directory
      • harpy sv defaults to SV/method output directory

Bugs fixed

  • Some help string language improvements
  • mpileup with --populations now correctly uses the population mode (it accidentally ignored it)

Full Changelog: 0.7.0...0.8.0

harpy v0.7.3

01 Mar 16:41

add missing imports in validations.py
Full Changelog: 0.7.2...0.7.3

harpy v0.7.2

01 Mar 15:52

Full Changelog: 0.7.1...0.7.2

harpy v0.7.1

01 Mar 15:49

Full Changelog: 0.7.0...0.7.1

harpy v0.7.0

29 Feb 16:04
291acb7

What's Changed

The --directory (-d) argument has been removed in favor of flexible inputs at the end of the command line call. Now, you can include an unlimited number of input files/folders:

harpy align bwa -t 15 -g genome.fasta files1/*.fastq files2 files3/pop1*.gz
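
As a rough illustration of what "flexible inputs" implies, the sketch below flattens a mix of files and folders into one de-duplicated list of FASTQ files. It is not Harpy's actual implementation: the function name `collect_inputs` and the accepted suffixes in `FASTQ_SUFFIXES` are assumptions made for this example. Shell globs such as files1/*.fastq are expanded by the shell before the program ever sees them, so only plain files and folders need handling.

```python
from pathlib import Path

# Hypothetical list of accepted suffixes; Harpy's real rules may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def collect_inputs(paths):
    """Flatten a mix of files and folders into a sorted list of FASTQ files."""
    found = set()
    for p in map(Path, paths):
        # A folder contributes the files directly inside it (no recursion);
        # anything else is treated as a single candidate file.
        candidates = p.iterdir() if p.is_dir() else [p]
        for f in candidates:
            if f.is_file() and f.name.lower().endswith(FASTQ_SUFFIXES):
                # resolve() lets the set drop inputs that were given twice
                found.add(f.resolve())
    return sorted(found)
```

Using a set keeps a file from being processed twice when it is passed both directly and through its parent folder.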

Full Changelog: 0.6.1...0.7.0

harpy v0.6.1

19 Feb 15:59
c29a78b

What's Changed

  • Remove --barcode-tag from EMA markduplicates workflow by @pdimens in #40

Full Changelog: 0.6.0...0.6.1

harpy v0.6.0

16 Feb 20:26
3a5b46f

Changed behavior

Instead of using sambamba markdups, harpy now uses samtools markdup -S --barcode-tag BX, which does two great things:

  1. uses linked-read barcode information in assessing PCR duplicates
  2. marks supplementary (chimeric) reads as duplicates if the primary alignment is marked as a duplicate
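
To make the first point concrete, here is a toy sketch of barcode-aware duplicate detection. It is a deliberate simplification and not how samtools markdup is implemented (the real tool works on read pairs and alignment details, among other things); the `Read` tuple and `mark_duplicates` function are hypothetical names for this example. The only idea it shows is that two reads at the same position count as duplicates only if they also share a barcode.

```python
from collections import namedtuple

# Minimal stand-in for an aligned read; "barcode" plays the role of the BX tag.
Read = namedtuple("Read", "name chrom pos strand barcode")


def mark_duplicates(reads, use_barcode=True):
    """Return the names of reads flagged as duplicates (all but the first per key)."""
    seen = set()
    duplicates = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        if use_barcode:
            # Reads must also share a barcode to be considered duplicates
            key += (r.barcode,)
        if key in seen:
            duplicates.append(r.name)
        else:
            seen.add(key)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", "A01C02B03D04"),
    Read("r2", "chr1", 100, "+", "A01C02B03D04"),  # same position, same barcode
    Read("r3", "chr1", 100, "+", "A05C06B07D08"),  # same position, different barcode
]
```

With these three reads, the barcode-aware call flags only r2, whereas ignoring barcodes would also flag r3, which comes from a different molecule.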

What's Changed

Full Changelog: 0.5.0...0.6.0

0.5.0

08 Feb 16:26
35af509

v0.5.0

New features

  • New common command line options: --print-only, -r and --skipreports
  • new option for impute module: --extra
  • onstart, onsuccess, and onerror messages
  • 10x support for ema
    • new option for align ema: --platform
  • workflow folder containing crucial workflow files, such as summary, the snakefile of the workflow, Rmd files, and a config.yaml

Improvements

  • wildcard constraints for certain modules to avoid recursion issues
    • align module now outputs bam files directly into the Align/xxx folder
  • harpy install now much more barebones and uses isolated conda environments at runtime
  • workflow summary now includes the invoked snakemake command
  • config for workflow moved to config file instead of command line <- better reproducibility

Bugfixes

  • align ema has correct directory creation for the ema count step
  • wildcard constraints reduce extra headaches downstream
  • reports for sv now only plot biggest 30 contigs with things present
  • skipping reports avoids errors when no variants detected while other fixes are being worked on to make reports more robust
  • snakemake extras (-s) now at the end of the snakemake invocation, which avoids certain errors

Breaking changes

  • align ema no longer has a --molecule-distance option

Full Changelog: 0.4.0...0.5.0

v0.4.0

27 Oct 15:41
1c997a3

New features

  • naibr variant calling now relies on whatshap to back-phase BAM files from a phased VCF for much, much better variant calling
  • harpy preflight to do a full suite of file format validations on bam/fastq files prior to running one of the main modules
  • lots more file-check validations and helpful error messages before harpy hands things over to snakemake
  • regular RMarkdown reports have been completely revised using R::Flexdashboard

Bugfixes

  • More file checks means fewer downstream errors
  • typos corrected

Paint and polish

  • error messages now use rich (which was always available) to provide really nice and organized output
  • rmarkdown html now uses flexdashboard

Breaking changes

TL;DR: This release pretty much breaks every aspect of the previous APIs. Pretend you've never used Harpy before.

  • no more --method flag; instead, methods have been submodule-arized:
    • harpy align --method bwa ... is now harpy align bwa ...
    • harpy align --method ema ... is now harpy align ema ...
    • harpy variants snp --method ... is now harpy snp method ... (method = freebayes or mpileup)
    • harpy variants sv --method ... is now harpy sv method ... (method = leviathan or naibr)
  • harpy extra has been made into submodules too, now making:
    • harpy popgroup to create the popgroup file
    • harpy stitchparams to create the STITCH parameter file
    • harpy hpc to create a SLURM hpc profile

v0.3.0

24 Aug 19:02
c333c85

New features

  • freebayes variant calling
  • naibr variant calling
  • bam filename/RG tag validations
  • unit testing (not relevant for end-user)
  • --vcf-samples options for impute and phase
  • file/vcf sample validations
  • STITCH parameter file validations
  • genome assembly bgzipping validation/handling
  • much more flexible regex for fastq filenames
  • STITCH param file column order no longer matters and is not case-sensitive
  • new demultiplex module for Generation I (Meier et al., 2022) haplotag beads

Breaking changes

  • snp and sv variant calling now split into separate submodules under variants
    • variants --method is now variants snp --method and variants sv --method
  • STITCH param file has extra column bxlimit
  • the validations listed above will force Harpy to terminate if errors are detected
  • folder structure for output of align slightly different

Non-breaking changes

  • added Snakemake --nolock and --rerun-incomplete flags to all workflows
  • all workflows now provide appropriate exit codes
  • snp variant calling now occurs by genomic interval rather than by contig (a lot more parallel, a lot faster)
  • EMA aligns all preproc files at once into single alignment file, rather than aligning each individually and merging after
  • samtools stats/flagstat reports combined into single report
  • phase reports reworked/rewritten
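
The switch to calling SNPs by genomic interval rather than by contig, noted above, can be pictured with the sketch below: each contig is cut into fixed-size windows that can then be processed independently. The function name `make_intervals` and the default window size are assumptions for illustration; the release notes do not state the interval size Harpy uses or how it builds the intervals.

```python
def make_intervals(contig_lengths, size=100_000):
    """Split each contig into consecutive 1-based, inclusive intervals."""
    intervals = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, size):
            # The last interval of a contig is truncated at the contig end
            end = min(start + size - 1, length)
            intervals.append((contig, start, end))
    return intervals


# Tiny made-up contigs and window size, chosen so the result is easy to read
example = make_intervals({"chr1": 250, "chr2": 90}, size=100)
```

One long contig then yields many small jobs instead of a single large one, which is where the extra parallelism comes from.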