Releases: algbio/themisto

Themisto-v2.0.0 (19 November 2021)

19 Nov 18:43

This is a major release that breaks index compatibility and introduces changes to the interface.

  • All functions of the software have been unified under a single binary file themisto.
  • Index construction is now done with the command themisto build.
    • The old flag --auto-colors is now default behaviour.
    • The flag --no-colors builds the graph without the colors.
    • The flag --index-dir is now --index-prefix. The index now consists of only two files: [prefix].tdbg and [prefix].tcolors.
    • K-mers containing characters outside of the nucleotide alphabet are now deleted instead of being replaced with random nucleotides. The old behavior is accessible with the flag --randomize-non-ACGT.
  • Pseudoalignment is now done with the command themisto pseudoalign.
    • The flag --index-dir is now --index-prefix.
  • New functionality: the command themisto extract-unitigs dumps the maximal unitigs and their colors to disk.
  • New functionality: the command themisto stats reports some statistics on the nodes, edges and colors of the graph.
  • The user must now give the colors as integers. This eliminates the layer of indirection that mapped color names to integers and simplifies the interpretation of the pseudoalignment output.
  • FASTA and FASTQ parsing is now properly buffered, which speeds up the code in many places.
  • The unit tests have been expanded and moved to the GoogleTest framework.
  • More informative error messages in many places.
  • Multithreading has been enabled in the initial sorting of the k-mers.
  • The repository now contains small example input files to test the tools quickly, and a quick start guide.
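
The new default handling of non-ACGT characters during index construction can be illustrated with a minimal Python sketch. This is not Themisto's code: the function name valid_kmers is hypothetical, and the sketch assumes uppercase input. It only shows the idea that every k-mer overlapping a character outside A, C, G, T is dropped.

```python
def valid_kmers(seq, k):
    """Yield every k-mer of seq that consists only of A, C, G and T.

    Illustrative only; assumes uppercase input (an assumption, not
    something the release notes specify).
    """
    last_bad = -1  # index of the most recent non-ACGT character seen so far
    for i, ch in enumerate(seq):
        if ch not in "ACGT":
            last_bad = i
        start = i - k + 1
        # The window [start, i] is clean if the last bad character lies before it.
        if start >= 0 and last_bad < start:
            yield seq[start:i + 1]


print(list(valid_kmers("ACGNTTAC", 3)))  # ['ACG', 'TTA', 'TAC']
```

The three k-mers that overlap the N (CGN, GNT, NTT) are skipped, so a single ambiguous character removes up to k k-mers from the input.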

Themisto-v1.2.0 (19 October 2021)

19 Oct 18:20
  • Fixed a bug in construction which caused Themisto to always sort everything in memory instead of on disk (bug introduced in release 1.0.0)
  • Added command line tools to dump all unitigs and all color information to disk
  • Fixed two bugs that caused program crashes

Themisto-v1.1.0 (2 July 2021)

02 Jul 14:46

Fixed two bugs.

  • KMC database was not deleted after use
  • Pseudoalignment crashed on queries shorter than k

Themisto-v1.0.0 (1 July 2021)

01 Jul 09:53

This is a major update to the software.

  • The index format has been changed, which breaks compatibility with previous versions. The update addresses a space efficiency problem in indexing read sets (#6).

  • The behaviour of the pseudoalignment with respect to reverse complements has now been improved. Previously, the program computed the intersection of non-empty color sets for both the input sequence and its reverse complement separately, and reported the union of the two. Now, the program merges the color sets of a k-mer with the color set of its reverse complement during pseudoalignment, and takes the intersection of the merged color sets. We find this is a more principled approach to handling reverse complements. The new method gave slightly better results in our metagenomics pipeline.
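
The difference between the two reverse-complement strategies described above can be sketched in Python. This is a toy model, not Themisto's implementation: the index is a plain dict from k-mer to color set, all function names are hypothetical, and the sketch assumes that k-mers with an empty color set are skipped in both variants (the text states this only for the old method).

```python
def revcomp(s):
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def intersect_nonempty(color_sets):
    """Intersect all non-empty sets; return an empty set if there are none."""
    result = None
    for s in color_sets:
        if s:
            result = set(s) if result is None else result & s
    return result or set()


def pseudoalign_old(query, index, k):
    # Old: intersect along each strand separately, then take the union.
    fwd = intersect_nonempty(index.get(x, set()) for x in kmers(query, k))
    rev = intersect_nonempty(index.get(x, set()) for x in kmers(revcomp(query), k))
    return fwd | rev


def pseudoalign_new(query, index, k):
    # New: merge each k-mer's colors with those of its reverse complement,
    # then intersect the merged sets.
    merged = (index.get(x, set()) | index.get(revcomp(x), set())
              for x in kmers(query, k))
    return intersect_nonempty(merged)


# Toy index: "ACG" has colors {0, 1}; "TCG" (reverse complement of "CGA") has {1, 2}.
index = {"ACG": {0, 1}, "TCG": {1, 2}}
print(pseudoalign_old("ACGA", index, 3))  # {0, 1, 2}
print(pseudoalign_new("ACGA", index, 3))  # {1}
```

In this toy case the old method reports every color seen on either strand, while the new method reports only the color that is consistent with both k-mers of the query.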

Smaller changes in this release:

  • Some inconsistencies in the command line interface have been fixed.
    • --outfile is now called --out-file, to match the hyphenation convention of other options
    • --k is now -k, to match the Unix convention that single-character options start with a single dash. There is now also a long-form option called --node-length for k.
    • The old formats are still recognized to avoid breaking old scripts, but this is undocumented.
  • The maximum allowed value for k is now specified at compile time by passing -DMAX_KMER_LENGTH=k to cmake, which allows for more efficient index construction.
  • The build process has also been simplified by eliminating the dependency on BD_BWT_index.
  • There is now a command-line option --colorset-pointer-tradeoff which can be used to control a time-space trade-off in the index
  • Error reporting in command line parsing has been improved by using the cxxopts library
  • The index directory and the temporary directory are now created if they do not exist. This is done using the
    <filesystem> header of the C++17 standard library, so a C++17-compliant compiler is now required to build the software.
  • The program now replaces non-ACGT characters more carefully. For example, the character R, which encodes purine, is
    now replaced randomly with either A or G.
  • The readme of the repository now explains all command line parameters
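
The more careful replacement of non-ACGT characters mentioned above can be sketched as follows. The mapping is the standard IUPAC nucleotide ambiguity code; the function name replace_non_acgt and the fallback for characters outside the IUPAC alphabet (any of A, C, G, T) are assumptions made for illustration, not Themisto's documented behaviour.

```python
import random

# Standard IUPAC ambiguity codes and the nucleotides each one can stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def replace_non_acgt(seq, rng=random):
    """Replace each non-ACGT character with a random compatible nucleotide."""
    out = []
    for ch in seq:
        if ch in "ACGT":
            out.append(ch)
        else:
            # Assumption: characters outside the IUPAC table may become any nucleotide.
            out.append(rng.choice(IUPAC.get(ch, "ACGT")))
    return "".join(out)


print(replace_non_acgt("ACRT"))  # the R becomes either A or G
```

Passing a seeded random.Random instance as rng makes the replacement reproducible.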

Bug fixes:

  • A bug in parsing fastq files for index construction introduced in 872fce0 has been fixed
  • Commit 872fce0 also introduced an issue where a temporary file named output_file was saved in the current directory instead of the
    designated temp-directory and was not deleted afterwards. This has been fixed.
  • The file extension .fa for fasta files is now recognized

Themisto-v0.2.0 (2 April 2020)

02 Apr 15:12

Issue solving.

  • Print pseudoalignments in unsorted format by default.
  • Add the option '--sort-output' to sort the pseudoalignments before printing.
  • Use the zstr library to handle read and write operations for files compressed with zlib.
  • Add the option '--gzip-output' to compress the pseudoalignments with zlib.

Themisto-v0.1.1 (20 January 2020)

02 Apr 15:00

Bug hunting and tidying up the build process.

  • Read files compressed with zlib are now properly supported.
  • Changes to the build process w.r.t. Debug vs Release modes.
  • build_index and pseudoalign now use the same syntax for multithreading.

themisto-v0.1.0 (5 December 2019)

02 Apr 14:59

First release of Themisto with basic functionality; including