Releases: algbio/themisto
Themisto-v2.0.0 (19 November 2021)
This is a major release that breaks index compatibility and introduces changes to the interface.
- All functions of the software have been unified under a single binary file, themisto.
- Index construction is now done with the command themisto build.
  - The old flag --auto-colors is now the default behaviour.
  - The flag --no-colors builds the graph without the colors.
  - The flag --index-dir is now --index-prefix. The index now consists of only two files: [prefix].tdbg and [prefix].tcolors.
  - K-mers containing characters outside of the nucleotide alphabet are now deleted instead of being replaced with random nucleotides. The old behaviour is accessible with the flag --randomize-non-ACGT.
- Pseudoalignment is now done with the command themisto pseudoalign.
  - The flag --index-dir is now --index-prefix.
- New functionality: the command themisto extract-unitigs dumps the maximal unitigs and their colors to disk.
- New functionality: the command themisto stats reports some statistics on the nodes, edges and colors of the graph.
- The user must now give the colors as integers. This eliminates the layer of indirection that mapped color names to integers and simplifies the interpretation of the pseudoalignment output.
- FASTA and FASTQ parsing is now properly buffered, which speeds up the code in many places.
- The unit tests have been expanded and moved to the GoogleTest framework.
- More informative error messages in many places.
- Multithreading has been enabled in the initial sorting of the k-mers.
- The repository now contains small example input files to test the tools quickly, and a quick start guide.
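To illustrate the new default for non-ACGT characters listed above, here is a minimal Python sketch that enumerates the k-mers of a sequence and drops every k-mer containing a character other than A, C, G or T. It is not taken from Themisto's source: the function name valid_kmers is hypothetical, and the sketch assumes upper-case input (lower-case bases are treated as non-ACGT here).

```python
# Minimal sketch (not Themisto's code): keep only the k-mers of a sequence
# that consist entirely of A, C, G and T. Assumes upper-case input.
ACGT = frozenset("ACGT")


def valid_kmers(seq: str, k: int) -> list[str]:
    """Return the k-mers of seq, skipping any k-mer that overlaps a non-ACGT character."""
    kmers = []
    run_start = 0  # start index of the current run of ACGT characters
    for i, c in enumerate(seq):
        if c not in ACGT:
            # Every k-mer overlapping position i is dropped, so restart the run.
            run_start = i + 1
            continue
        # Emit the k-mer ending at i once the current run is at least k long.
        if i - run_start + 1 >= k:
            kmers.append(seq[i - k + 1 : i + 1])
    return kmers
```

For example, valid_kmers("ACGNTTGA", 3) keeps ACG, TTG and TGA and drops the three k-mers that overlap the N. A query shorter than k simply yields an empty list, which is the kind of edge case behind the v1.1.0 crash fix further down.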
Themisto-v1.2.0 (19 October 2021)
- Fixed a bug in construction which caused Themisto to always sort everything in memory instead of on disk (bug introduced in release 1.0.0)
- Added command line tools to dump all unitigs and all color information to disk
- Fixed two bugs that caused program crashes
Themisto-v1.1.0 (2 July 2021)
Fixed two bugs.
- KMC database was not deleted after use
- Pseudoalignment crashed on queries shorter than k
Themisto-v1.0.0 (1 July 2021)
This is a major update to the software.
- The index format has been changed, which breaks compatibility with previous versions. The update addresses a space efficiency problem in indexing read sets (#6).
- The behaviour of pseudoalignment with respect to reverse complements has been improved. Previously, the program computed the intersection of the non-empty color sets separately for the input sequence and for its reverse complement, and reported the union of the two. Now, the program merges the color set of each k-mer with the color set of its reverse complement during pseudoalignment, and takes the intersection of the merged color sets. We find this to be a more principled approach to handling reverse complements, and the new method gave slightly better results in our metagenomics pipeline.
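The difference between the old and new reverse-complement handling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Themisto's implementation: the index is modelled as a hypothetical dict from k-mer to a set of integer color ids, all function names are hypothetical, and the new method is assumed to skip empty merged sets in the same way the old one skipped empty color sets.

```python
# Minimal sketch (not Themisto's code) of the two reverse-complement strategies.
# The index is a hypothetical dict mapping each k-mer to its set of color ids.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list[str]:
    """All k-mers of seq from left to right (empty if len(seq) < k)."""
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def intersect_nonempty(color_sets) -> set[int]:
    """Intersect the non-empty color sets; empty sets are ignored."""
    nonempty = [s for s in color_sets if s]
    return set.intersection(*nonempty) if nonempty else set()


def old_pseudoalign(query: str, index: dict[str, set[int]], k: int) -> set[int]:
    """Pre-1.0.0 style: intersect each strand separately, then take the union."""
    forward = intersect_nonempty(index.get(x, set()) for x in kmers(query, k))
    reverse = intersect_nonempty(index.get(x, set()) for x in kmers(revcomp(query), k))
    return forward | reverse


def new_pseudoalign(query: str, index: dict[str, set[int]], k: int) -> set[int]:
    """1.0.0 style: merge each k-mer's colors with those of its reverse
    complement, then intersect the merged sets (assumed to skip empty ones)."""
    merged = (index.get(x, set()) | index.get(revcomp(x), set()) for x in kmers(query, k))
    return intersect_nonempty(merged)
```

With index = {"AAC": {1}, "CGT": {2}}, the query AACG and k = 3, old_pseudoalign returns {1, 2} even though neither color contains both k-mers of the query in either orientation, while new_pseudoalign returns an empty set. This is one way to see why merging per k-mer before intersecting is the stricter rule.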
Smaller changes in this release:
- Some inconsistencies in the command line interface have been fixed:
  - --outfile is now called --out-file, to match the hyphenation convention of other options.
  - --k is now -k, to match the Unix convention that single-character options always start with a single dash. There is now also a long-form option called --node-length for k.
  - The old formats are still recognized to avoid breaking old scripts, but this is undocumented.
- The maximum allowed value for k is now specified at compile time by passing -DMAX_KMER_LENGTH=k to cmake, which allows for more efficient index construction.
- The build process has also been simplified by eliminating the dependency on BD_BWT_index.
- There is now a command-line option --colorset-pointer-tradeoff which can be used to control a time-space trade-off in the index
- Error reporting in command line parsing has been improved by using the cxxopts library
- The index directory and the temporary directory are now created if they do not exist. This is done using the header of the C++17 standard library, so a C++17-compliant compiler is now required to build the software.
- The program now replaces non-ACGT characters more carefully. For example, the character R, which encodes a purine, is now replaced randomly with either A or G.
- The readme of the repository now explains all command line parameters.
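The careful replacement of non-ACGT characters mentioned in the list above can be sketched as a lookup in the standard IUPAC nucleotide ambiguity table followed by a random choice. This is a hedged illustration, not Themisto's code: the table below is the standard IUPAC one, the function name replace_ambiguous is hypothetical, and replacing unrecognized characters with any base is an assumption made for this sketch. A seeded random.Random is passed in so the output is reproducible.

```python
import random

# Standard IUPAC nucleotide ambiguity codes (upper-case only).
# Assumption: Themisto's own table and edge-case handling may differ.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def replace_ambiguous(seq: str, rng: random.Random) -> str:
    """Replace each ambiguity code with a random base it can stand for."""
    out = []
    for c in seq:
        if c in "ACGT":
            out.append(c)  # unambiguous bases are kept as-is
        elif c in IUPAC:
            out.append(rng.choice(IUPAC[c]))  # e.g. R (purine) -> A or G
        else:
            out.append(rng.choice("ACGT"))  # assumption: unknown characters -> any base
    return "".join(out)
```

For instance, replace_ambiguous("ACRGT", random.Random(0)) keeps A, C, G and T unchanged and turns the R into either A or G. Note that version 2.0.0 later made deletion of such k-mers the default and moved random replacement behind --randomize-non-ACGT.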
Bug fixes:
- A bug in parsing FASTQ files for index construction, introduced in 872fce0, has been fixed.
- Commit 872fce0 also introduced an issue where a temporary file named output_file was saved in the current directory instead of the designated temp directory, and the program did not delete that file afterwards. This has been fixed.
- The file extension .fa for FASTA files is now recognized.
Themisto-v0.2.0 (2 April 2020)
Issue solving.
- Print pseudoalignments in unsorted format by default.
- Add the option '--sort-output' to sort the pseudoalignments before printing.
- Use the zstr library to handle read and write operations for files compressed with zlib.
- Add the option '--gzip-output' to compress the pseudoalignments with zlib.
Themisto-v0.1.1 (20 January 2020)
Bug hunting and sanifying the build process.
- Read files compressed with zlib are now properly supported.
- Changes to the build process w.r.t. Debug vs Release modes.
- build_index and pseudoalign now use the same syntax for multithreading.
themisto-v0.1.0 (5 December 2019)
First release of Themisto with basic functionality, including:
- Pseudoalignment on pangenomes.
- Support for input files in .gz format.