Releases: pdimens/harpy
harpy v0.8.0
Changes
Under the hood
- snakemake updated to v8.x
- all snakemake code updated for v8
- conda python env is now 3.12 (snakemake dependency)
- pysam removed from conda recipe, now exists in the `align` runtime environment
- conditional rules in snakefiles are cleaner and more idiomatic
Usage
- option `--print-only` still exists, but is now hidden since it's only really used in development, and rarely at that
- BIG FEATURE: new `--output-dir` (`-o`) option for all main modules to specify an output directory
- BREAKING CHANGE: the output directories for all the modules are identical to the previous versions, except:
  - `harpy snp` defaults to the `SNP/method` output directory
  - `harpy sv` defaults to the `SV/method` output directory
Bugs fixed
- Some help string language improvements
- mpileup with `--populations` now correctly uses the population mode (it accidentally ignored it)
Full Changelog: 0.7.0...0.8.0
harpy v0.7.3
add missing imports in validations.py
Full Changelog: 0.7.2...0.7.3
harpy v0.7.2
Full Changelog: 0.7.1...0.7.2
harpy v0.7.1
Full Changelog: 0.7.0...0.7.1
harpy v0.7.0
What's Changed
The `--directory` (`-d`) argument has been removed in favor of flexible inputs at the end of the command line call. Now, you can include an unlimited number of input files/folders:
harpy align bwa -t 15 -g genome.fasta files1/*.fastq files2 files3/pop1*.gz
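As a rough illustration of what accepting a mix of files and folders involves, the sketch below flattens a list of positional inputs into a list of FASTQ files. The function name `collect_fastq` and the accepted suffixes are assumptions made for this example, not harpy's actual implementation. Note that the shell expands globs such as `files1/*.fastq` before the program ever sees them.

```python
from pathlib import Path

# Assumed set of FASTQ suffixes; harpy's real filename rules may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")

def collect_fastq(inputs):
    """Flatten a mix of files and folders into a sorted, de-duplicated list of FASTQ paths."""
    found = set()
    for item in map(Path, inputs):
        if item.is_dir():
            # A folder contributes every FASTQ file directly inside it (non-recursive).
            candidates = item.iterdir()
        else:
            candidates = [item]
        for path in candidates:
            if path.is_file() and path.name.lower().endswith(FASTQ_SUFFIXES):
                found.add(path.resolve())
    return sorted(found)
```

Collecting into a set means a file that is named twice (for example, once directly and once via its folder) is only processed once.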
Full Changelog: 0.6.1...0.7.0
harpy v0.6.1
What's Changed
Full Changelog: 0.6.0...0.6.1
harpy v0.6.0
Changed behavior
Instead of using `sambamba markdup`, harpy now uses `samtools markdup -S --barcode-tag BX`, which does two great things:
- uses linked-read barcode information in assessing PCR duplicates
- marks supplementary (chimeric) reads as duplicates if the primary alignment is marked as a duplicate
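To make the two points above concrete, here is a toy model of barcode-aware duplicate marking. It is a conceptual sketch, not how samtools implements `markdup`: the `Read` record and the `mark_duplicates` function are hypothetical, and real duplicate marking also considers mate positions and read quality. It only illustrates that reads at the same position count as duplicates when they also share a BX barcode, and that supplementary alignments follow their primary read.

```python
from dataclasses import dataclass

@dataclass
class Read:
    name: str
    chrom: str
    pos: int
    strand: str
    bx: str                      # linked-read barcode (BX tag)
    supplementary: bool = False
    duplicate: bool = False

def mark_duplicates(reads):
    """Flag duplicates in place; the first primary read seen for each key is kept."""
    seen = set()
    dup_names = set()
    for read in reads:
        if read.supplementary:
            continue
        # The barcode is part of the key, so identical coordinates with
        # different barcodes are treated as distinct molecules.
        key = (read.chrom, read.pos, read.strand, read.bx)
        if key in seen:
            read.duplicate = True
            dup_names.add(read.name)
        else:
            seen.add(key)
    # Supplementary alignments inherit the duplicate flag of their primary read.
    for read in reads:
        if read.supplementary and read.name in dup_names:
            read.duplicate = True
    return reads
```

Without the barcode in the key, two reads from different molecules that happen to share coordinates would be collapsed into one.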
What's Changed
Full Changelog: 0.5.0...0.6.0
v0.5.0
New features
- New common command line options: `--print-only`, `-r`, and `--skipreports`
- new option for `impute` module: `--extra`
- onstart, onsuccess, and onerror messages
- 10x support for ema
- new option for `align ema`: `--platform`
- new `workflow` folder containing crucial workflow files, such as the summary, the snakefile of the workflow, Rmd files, and a config.yaml
Improvements
- wildcard constraints for certain modules to avoid recursion issues
- `align` module now outputs bam files directly into the `Align/xxx` folder
- harpy install now much more barebones and uses isolated conda environments at runtime
- workflow summary now includes the invoked snakemake command
- config for workflow moved to config file instead of command line <- better reproducibility
Bugfixes
- `align ema` has correct directory creation for the `ema count` step
- wildcard constraints reduce extra headaches downstream
- reports for sv now only plot the 30 biggest contigs with things present
- skipping reports avoids errors when no variants are detected, while other fixes are being worked on to make reports more robust
- snakemake extras (`-s`) are now at the end of the snakemake invocation, which avoids certain errors
Breaking changes
- `align ema` no longer has a `--molecule-distance` option
Full Changelog: 0.4.0...0.5.0
v0.4.0
New features
- naibr variant calling now relies on `whatshap` to back-phase BAM files from a phased VCF for much, much better variant calling
- `harpy preflight` to do a full suite of file format validations on bam/fastq files prior to running one of the main modules
- lots more file-check validations and helpful error messages before harpy hands things over to snakemake
- regular RMarkdown reports have been completely revised using R::Flexdashboard
bugfixes
- More file checks means fewer downstream errors
- typos corrected
paint and polish
- error messages now use `rich` (which was always available) to provide really nice and organized output
- rmarkdown html now uses flexdashboard
breaking changes
TL;DR: This release pretty much breaks every aspect of the previous APIs. Pretend you've never used Harpy before.
- no more `--method` flag, instead methods have been submodule-arized:
  - `harpy align --method bwa ...` is now `harpy align bwa ...`
  - `harpy align --method ema ...` is now `harpy align ema ...`
  - `harpy variants snp --method ...` is now `harpy snp method ...` (method = `freebayes` or `mpileup`)
  - `harpy variants sv --method ...` is now `harpy sv method ...` (method = `leviathan` or `naibr`)
- `harpy extra` has been made into submodules too, now making:
  - `harpy popgroup` to create the popgroup file
  - `harpy stitchparams` to create the STITCH parameter file
  - `harpy hpc` to create a SLURM hpc profile
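For anyone updating old scripts, the `--method` renames listed above can be sketched as a small token rewrite. The `migrate` helper below is hypothetical and written only for illustration; it assumes the space-separated `--method VALUE` form, leaves every other option untouched, and does not attempt to translate `harpy extra`.

```python
def migrate(argv):
    """Translate a pre-0.4.0 harpy command (a list of tokens) to the 0.4.0 layout."""
    args = list(argv)
    if args[:1] != ["harpy"]:
        raise ValueError("expected a command starting with 'harpy'")
    # `harpy variants snp|sv ...` loses the `variants` level.
    if args[1:2] == ["variants"]:
        del args[1]
    if "--method" in args:
        i = args.index("--method")
        if i + 1 >= len(args):
            raise ValueError("--method needs a value")
        method = args[i + 1]
        del args[i:i + 2]
        # The method becomes a subcommand directly after the module name.
        args.insert(2, method)
    return args
```

For example, `migrate("harpy variants snp --method mpileup".split())` yields the tokens for `harpy snp mpileup`.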
v0.3.0
New features
- freebayes variant calling
- naibr variant calling
- bam filename/RG tag validations
- unit testing (not relevant for end-user)
- `--vcf-samples` options for `impute` and `phase`
- file/vcf sample validations
- STITCH parameter file validations
- genome assembly bgzipping validation/handling
- much more flexible regex for fastq filenames
- STITCH param file column order no longer matters and is not case-sensitive
- new `demultiplex` module for Generation I (Meier et al., 2022) haplotag beads
Breaking changes
- snp and sv variant calling now split into separate submodules under `variants`
  - `variants --method` is now `variants snp --method` and `variants sv --method`
- STITCH param file has extra column `bxlimit`
- the validations listed above will force Harpy to terminate if errors are detected
- folder structure for output of `align` slightly different
Non-breaking changes
- added Snakemake `--nolock` and `--rerun-incomplete` flags to all workflows
- all workflows now provide appropriate exit codes
- snp variant calling now occurs by genomic interval rather than by contig (a lot more parallel, a lot faster)
- EMA aligns all preproc files at once into single alignment file, rather than aligning each individually and merging after
- samtools stats/flagstat reports combined into single report
- `phase` reports reworked/rewritten
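The switch from per-contig to per-interval SNP calling noted in the list above can be pictured with a short sketch. The `make_intervals` function and the 50,000 bp default are assumptions for illustration only; the interval size harpy actually uses is not stated here.

```python
def make_intervals(contig_lengths, size=50_000):
    """Yield (contig, start, end) tuples, 1-based and inclusive, covering each contig."""
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, size):
            yield contig, start, min(start + size - 1, length)
```

Each tuple can be handed to the variant caller as an independent job, so one long contig no longer forces a single long-running task.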