Releases: pdimens/harpy

1.14.3

20 Dec 15:35
f3ab27c

Bugs fixed

  • restored the haplotag barcode script that went missing after squashing commits, which broke demuxing

Changed

  • added rule priority for some workflows so they prioritize creating the output files over calculating metrics and writing reports
    • this means that, for example, align bwa will prioritize creating all the output BAM files, rather than running a single sample through every step before starting the next

Full Changelog: 1.14.2...1.14.3

1.14.2

13 Dec 15:32

Changed

This is a bugfix for #176: downsample now has a better help string, which states that the BX:Z tag has to be the last tag in the record

Added

  • bx_to_end.py to help preprocess FASTQ/BAM files for downsample
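
To illustrate what "terminal" means here, the following is a minimal sketch, not the bx_to_end.py script itself. It assumes the tags are available as a tab-separated string of SAM-style fields; the function name move_bx_to_end is hypothetical.

```python
def move_bx_to_end(tags: str) -> str:
    """Reorder tab-separated SAM-style tags so any BX:Z tag comes last.

    Hypothetical helper for illustration; not part of Harpy.
    """
    fields = tags.split("\t")
    bx = [f for f in fields if f.startswith("BX:Z:")]
    rest = [f for f in fields if not f.startswith("BX:Z:")]
    # Tags other than BX:Z keep their relative order; BX:Z goes to the end.
    return "\t".join(rest + bx)


example = "BX:Z:A01C02B03D04\tRX:Z:ACGT\tQX:Z:FFFF"
print(move_bx_to_end(example).split("\t"))
# ['RX:Z:ACGT', 'QX:Z:FFFF', 'BX:Z:A01C02B03D04']
```

A record without a BX:Z tag passes through unchanged.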

1.14.1

11 Dec 20:11
2e8c1b0

Never too proud to admit I was wrong. I didn't want downsample to be a Snakemake workflow, but as what I wanted it to do grew more complex, I found myself writing an increasingly complex Python script that was essentially doing everything Snakemake already does. So:

New

  • Introduced a command-line utility for extracting barcodes from SAM/BAM files
  • Enhanced phasing statistics reporting with new metrics (N50, N75, N90)
  • LRez is now part of the main Harpy installation and accessible to the user
  • adapter removal in the qc module accepts an argument now, one of:
    • auto for automatic adapter detection
    • a FASTA file of adapters
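
For readers unfamiliar with the N50/N75/N90 metrics mentioned above, here is a minimal sketch of how an Nx value is commonly computed. It assumes the metric is taken over a list of block lengths (for phasing, presumably phase block lengths, which these notes do not state); the function name nx is hypothetical and this is not Harpy's reporting code.

```python
def nx(lengths, percent):
    """Return the Nx value: the largest length L such that blocks of
    length >= L together cover at least `percent` of the total length.

    Hypothetical helper for illustration; not part of Harpy.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


blocks = [10, 20, 30, 40]
print(nx(blocks, 50), nx(blocks, 75), nx(blocks, 90))
# 30 20 20
```

Higher percentages can only give equal or smaller values, so N90 <= N75 <= N50 for the same set of lengths.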

Changed

  • Downsampling is now a snakemake workflow
  • downsample handles invalid barcodes in a much more intuitive (and sensible) way

Full Changelog: 1.14...1.14.1

1.14

06 Dec 21:42
e1867c2

New

  • added a convenience script separate_singletons to split a BAM file into singletons and non-singletons
  • harpy downsample module to downsample FASTQ/BAM by barcodes

Breaking changes

  • singletons are now calculated such that both reads of a paired-end read pair count as only "one read" for a barcode
    • which means unpaired reads now contribute properly to this value
    • overall, this is a more accurate way of calculating this metric
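
The counting rule above can be sketched as follows. This is a minimal illustration, not Harpy's implementation: it assumes each alignment record is reduced to a (read name, barcode) tuple, that both mates of a pair share a read name, and that a singleton is a barcode seen on exactly one read name. The function name count_singletons is hypothetical.

```python
from collections import defaultdict


def count_singletons(records):
    """Count barcodes that are attached to exactly one read name.

    `records` is an iterable of (read_name, barcode) tuples, one per
    alignment record. Mates of a pair share a read name, so a pair
    counts once; an unpaired read also counts once.
    Hypothetical helper for illustration; not part of Harpy.
    """
    names_by_barcode = defaultdict(set)
    for read_name, barcode in records:
        names_by_barcode[barcode].add(read_name)
    return sum(1 for names in names_by_barcode.values() if len(names) == 1)


records = [
    ("r1", "A01C01B01D01"),  # forward read of a pair
    ("r1", "A01C01B01D01"),  # reverse read of the same pair
    ("r2", "A02C02B02D02"),
    ("r3", "A02C02B02D02"),  # second read name on the same barcode
    ("r4", "A03C03B03D03"),  # unpaired read
]
print(count_singletons(records))
# 2
```

Under a per-record count, the first barcode would have looked like it had two reads; grouping by read name treats the pair and the unpaired read the same way.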

Fixes

  • separate_validbx has a breaking usage change; however, this script is not used by any of the workflows, so there should be no appreciable difference
  • alignment reports have text that clarifies which math is for non-singletons
  • multiplex reads (aka reads that aren't linked-read singletons) are now just referred to as non-singletons

1.13

26 Nov 15:39

New Features

  • new view command to view workflow log, snakefile, or configuration file.
  • conda environment recipes are now stored in outdir/workflow/envs for more self-contained workflow directories
    • also improves workflow-specific troubleshooting

Breaking Changes

  • stitchparams has been renamed imputeparams

Internal

  • improved handling of conda environments across various commands, allowing for better configuration and dependency management.
  • Updated environment directory paths for better organization and clarity across all workflows
  • local simuG replaced with conda installation
    • Removed dependency on the simuG.pl script for several simulation workflows, streamlining the execution process
    • rename rules and better directory structure for simulate variants

Bug Fixes

  • Improved regular expression handling in file processing to enhance clarity and prevent issues.
  • Corrected typos in align_stats.Rmd and routines for handling no valid barcodes

Full Changelog: 1.12...1.13

1.12

19 Nov 16:56
b56bf1c

What's new (and important)

simulate linkedreads now supports and defaults to haplotagging barcodes

  • 84 million barcode options instead of 14m
  • support for barcodes of any length, not just the 10X 16bp
  • barcode sequencing error has been removed because you're ultimately interested in the linked read data, not the sequencing nuances
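
The "84 million" figure is consistent with the haplotagging design of four barcode segments with 96 options each. The sketch below assumes the usual AxxCxxBxxDxx naming with segment numbers 01 to 96; it only enumerates those codes and is not the haplotag_barcodes.py script, which may work differently. The function name haplotag_codes is hypothetical.

```python
from itertools import islice, product

SEGMENT_OPTIONS = 96  # assumed number of options per segment


def haplotag_codes():
    """Yield haplotag-style codes in AxxCxxBxxDxx form.

    Hypothetical generator for illustration; not part of Harpy.
    """
    ids = [f"{i:02d}" for i in range(1, SEGMENT_OPTIONS + 1)]
    for a, c, b, d in product(ids, repeat=4):
        yield f"A{a}C{c}B{b}D{d}"


print(SEGMENT_OPTIONS ** 4)
# 84934656
print(list(islice(haplotag_codes(), 3)))
# ['A01C01B01D01', 'A01C01B01D02', 'A01C01B01D03']
```

A generator is used so the full set of roughly 85 million codes never has to be held in memory at once.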

Internal

  • HaploSim.pl (formerly LRSIM_harpy.pl) focuses solely on creating linked reads from provided haplotypes
  • output names for simulate linkedreads are more flexible now
  • leveraged parameters better in HaploSim.pl
  • Added haplotag_barcodes.py to auto-generate haplotag barcodes
  • inline-to-haplotagging barcode conversion uses a memory-efficient in-memory sqlite3 database
  • barcode validations for align ema and simulate linkedreads

Bugs fixed

  • [simulate linkedreads] the barcode key is now generated as a fixed keymap, ensuring barcodes have the same haplotag code across different haplotypes

Full Changelog: 1.11...1.12

1.11

08 Nov 21:46
ac9a13c

New Features

  • [sv leviathan] now also makes BX tags unique when concatenating population groups
    • provided as --bx option to concatenate_bam.py
  • new standalone script deconvolve_alignments.py that does the same thing as assign_mi.py, but also deconvolves the BX tag into hyphenated form

Fixes

  • R logic for properly parsing new --contigs option #160

Improvements

  • LOTS more guardrails with respect to validations and error handling
    • Simplified logic for file type validation and tag management in scripts
    • Enhanced error handling for missing input files across multiple scripts
    • Updated argument parser configurations for improved user guidance and error handling
    • Streamlined output methods across multiple scripts for consistency

Full Changelog: 1.10.1...1.11

1.10.1

05 Nov 15:57
31c905d

This release was a big internal refactor, and there didn't seem to be enough visible changes to release it as 1.11, so it's named 1.10.1 instead.

Internal

  • some of the simpler file validations moved to the command-line parsing part of harpy #159
  • [hpc] has much less redundant code

New Features

  • All workflows with an --extra-params option now have some program-specific argument validation #158
  • --snakemake now has validations
  • --hpc now has validations
  • [align ema] read fragment density optimization is now off by default and is exposed as a command-line argument to toggle it on

Other changes

  • [hpc] now checks if the executor plugin is installed and only prints the notice if it isn't
  • [stitchparams] and [popgroup] have slightly nicer printing

Full Changelog: 1.10...1.10.1

1.10

30 Oct 20:04

New Features

  • assembly and metassembly workflows
  • --contigs option for align ... sv ... and phase workflows
  • calculations for molecular coverage too
  • non-singleton metrics added to alignment reports

Internal

  • removed the pandas dependency because Paramspace() is no longer used
    • impute parameter file now gets transcribed into the config.yaml
  • new and better validations
  • some validations have progressbar and are parallelized
    • progressbar respects --quiet
  • config.yaml files are written using the yaml stdlib for consistency
  • workflow summaries have more robust logic

Breaking changes

  • parameter file for impute has a new name column that will name relevant outputs for a given parameter set
    • this affects the output directories now, which are named according to this name value
  • config.yaml file restructured a bit, mainly some params have better snake-cased names
    • skipreports is now skip under a new reports hierarchy

Full Changelog: 1.9...1.10

1.9

04 Oct 16:38

Breaking changes

  • --prefix for variant simulations is now just sim to reduce redundancy of defaults
  • --heterozygosity for variant simulations now outputs a diploid genome by default
    • previous behavior can be achieved with --only-vcf option
    • see tutorial here