Releases: torognes/vsearch
VSEARCH 2.29.1
VSEARCH 2.29.0
vsearch 2.29.0 fixes seven bugs (see changelog below), adds initial support for RISC-V architectures, and improves code quality and code testing (1,210 new tests).
Changelog:
- add: experimental support for RISCV64 and other 64-bit little-endian architectures, thanks to Michael R. Crusoe and his fellow Debian developers (PR #566),
- add: official support for clang-19 and gcc 14,
- add: beta support for clang-20,
- remove: unused --output option for command --fastq_stats (issue #572),
- fix: bug in --sintax when selecting the best lineage (only low confidence values below 0.5 were affected) (issue #573),
- fix: out-of-bounds error in --fastq_stats when processing empty reads (issue #571),
- fix: bug in --cut, patterns with multiple cutting sites were not detected (commit 4c4f9fa),
- fix: memory error (segmentation fault) when using --derep_id and --strand (issue #565),
- fix: --fastq_join now obeys the --quiet and --log options (commit 87f968b),
- fix: --fastq_join quality padding is now also set to Q40 when quality offset is 64 (commit be0bf9b),
- fix: (partial) --fastq_join's handling of abundance annotations (commit f2bbcb4),
- improve: additional safeguards to validate input values and to make sure that they are within acceptable limits. Changes concern options --abskew (commit a530dd8) and --fastq_maxdiffs (commit 4b254db),
- improve: code quality (1.3k+ commits, 6k+ clang-tidy warnings eliminated),
- improve: documentation and help messages (issue #568),
- improve: complete refactoring and modernization of a subset of commands (--sortbylength, --sortbysize, --shuffle, --rereplicate, --cut, --fastq_join, --fasta2fastq, --fastq_chars),
- improve: code-coverage of our test-suite for the above-mentioned commands (1,210 new tests, 4,753 in total).
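To illustrate the --fastq_join quality padding fix listed above, here is a minimal Python sketch of how a Q40 padding string depends on the quality offset. It is not vsearch's code: the function names and the padding length of 8 are assumptions made for this example. It relies only on the standard FASTQ convention that a Phred score q is stored as the ASCII character with code q + offset.

```python
def quality_char(q: int, offset: int) -> str:
    """Return the FASTQ character encoding Phred score q at the given ASCII offset."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return chr(q + offset)


def padding_quality(pad_length: int, offset: int, q: int = 40) -> str:
    """Build a quality string of pad_length symbols, all encoding score q.

    pad_length is a hypothetical parameter chosen for this illustration.
    """
    return quality_char(q, offset) * pad_length


print(padding_quality(8, 33))  # IIIIIIII  (Q40 at offset 33)
print(padding_quality(8, 64))  # hhhhhhhh  (Q40 at offset 64)
```

The same Phred score maps to a different character under each offset, so a padding string fixed to the offset-33 encoding would not represent Q40 in an offset-64 file.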
VSEARCH 2.28.1
The sintax command has been improved in several ways in this version of vsearch. Please note that several details of this algorithm are not clearly described in the preprint, and the implementation in vsearch differs from that in usearch.
The former vsearch version did not always choose the most common taxonomic entity over the 100 bootstraps among the database sequences with the highest amount of word similarity to the query. Instead, if several sequences had an equal similarity with the query, the sequence encountered in the earliest bootstrap was chosen. The confidence level was calculated based on this sequence compared to the selected sequences from the other 99 bootstraps. This could lead to a suboptimal choice with a low confidence. In the new version, the most common of the sequences with the highest amount of word similarity across the 100 bootstraps will be selected, and ties will be broken randomly.
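The new selection rule described above can be sketched in Python as follows. This is a simplified illustration, not the vsearch implementation: it assumes each bootstrap has already been reduced to the name of its best-matching reference, and the function name and the "support" value (fraction of bootstraps agreeing with the winner) are hypothetical simplifications.

```python
import random
from collections import Counter


def most_common_top_hit(bootstrap_hits, rng):
    """Pick the reference chosen most often across bootstraps.

    Ties between equally frequent references are broken randomly using rng.
    Returns (winner, fraction of bootstraps that selected the winner).
    """
    if not bootstrap_hits:
        raise ValueError("need at least one bootstrap result")
    counts = Counter(bootstrap_hits)
    best = max(counts.values())
    # Sort the tied names so the random choice depends only on the seed.
    tied = sorted(name for name, c in counts.items() if c == best)
    winner = rng.choice(tied)
    return winner, best / len(bootstrap_hits)


# 100 bootstraps: refB happens to come first, but refA is the most common.
hits = ["refB"] + ["refA"] * 60 + ["refB"] * 39
winner, support = most_common_top_hit(hits, random.Random(1))
print(winner, support)  # refA 0.6
```

In this example, a rule that keeps the hit from the earliest bootstrap would report refB, whereas the majority rule reports refA.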
Another problem with the old implementation was that if several sequences had the same amount of word similarity, the shortest one in the reference database would be chosen, and if they were equally long, the earliest in the database file would be chosen. A new option called sintax_random has now been introduced. This option will randomly select one of the sequences with the highest number of shared words with the query, without considering their length or position. This avoids a bias towards shorter reference sequences. This option is strongly recommended and will probably soon be the default.
Furthermore, a ninth taxonomic rank, strain (letter t), is now recognized. The speed of the sintax command has also been significantly improved, at least in some cases. Run vsearch with the randseed option and 1 thread to ensure reproducibility of the random choices in the algorithm.
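The difference between the old tie-breaking rule and the random selection enabled by sintax_random, and the role of a fixed seed, can be illustrated with a short Python sketch. The `Ref` type, the function names and the example data are assumptions made for this illustration and do not reflect vsearch internals.

```python
import random
from typing import NamedTuple


class Ref(NamedTuple):
    name: str
    length: int
    shared_words: int  # words shared with the query


def best_refs(refs):
    """Return all references sharing the maximal number of words with the query."""
    top = max(r.shared_words for r in refs)
    return [r for r in refs if r.shared_words == top]


def pick_shortest_first(refs):
    """Old rule: shortest reference wins; equal lengths fall back to database order."""
    # min() returns the first minimal element, i.e. the earliest in the list.
    return min(best_refs(refs), key=lambda r: r.length)


def pick_random(refs, rng):
    """Random rule: any top-scoring reference, regardless of length or position."""
    return rng.choice(best_refs(refs))


refs = [
    Ref("long_ref", 1500, 30),
    Ref("short_ref", 400, 30),
    Ref("weak_ref", 300, 12),
]

print(pick_shortest_first(refs).name)  # short_ref

# The same seed gives the same choice, which is why a fixed seed
# (and a single thread) makes runs reproducible.
a = pick_random(refs, random.Random(42)).name
b = pick_random(refs, random.Random(42)).name
print(a == b)  # True
```

Under the old rule, short_ref is always chosen over long_ref even though both share the same number of words with the query. Under the random rule, either may be chosen.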
VSEARCH 2.27.1
This version fixes the weak_id option and makes searches report weak hits in some cases. It also updates the names of the compression libraries to libz.so.1 and libbz2.so.1 on Linux to make them work on common Linux distributions without installing additional packages. README.md has been updated with information about compression libraries on Windows.
VSEARCH 2.27.0
The usearch_global and search_exact commands now support FASTQ files as well as FASTA files as input. This version of vsearch includes clarifications and updates to the manual. Some code has been refactored. Generic Dockerfiles for major Linux distributions have been included. Some warnings from compilers and other tools have been eliminated. The release for Windows will also include DLLs for the two compression libraries.
VSEARCH 2.26.1
Enable the maxseqlength and minseqlength options for the chimera detection commands. When the usearch_global or search_exact commands are used, OTU tables will include samples and OTUs with no matches.
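As a rough illustration of what including samples and OTUs with no matches means for the resulting table, the following Python sketch builds a count table from a list of matches while keeping all-zero rows and columns. The function name, the input representation and the tab-separated layout are assumptions made for this example, not vsearch's actual code or output format.

```python
from collections import Counter


def build_otu_table(samples, otus, matches):
    """Count (sample, otu) matches, keeping every sample and OTU.

    samples and otus list all known names; matches is an iterable of
    (sample, otu) pairs. Names without any match get zero counts
    instead of being dropped.
    """
    counts = Counter(matches)  # missing keys count as 0
    return {otu: {s: counts[(s, otu)] for s in samples} for otu in otus}


samples = ["s1", "s2", "s3"]        # s3 has no matching reads
otus = ["otu1", "otu2", "otu3"]     # otu3 is never matched
matches = [("s1", "otu1"), ("s1", "otu1"), ("s2", "otu2")]

table = build_otu_table(samples, otus, matches)
print("\t".join(["OTU"] + samples))
for otu in otus:
    print("\t".join([otu] + [str(table[otu][s]) for s in samples]))
```

Keeping the empty rows and columns means tables built against the same database and sample list always have the same shape, which simplifies merging and comparison downstream.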
The previous release, 2.26.0, was removed because the version number in the enclosed source code had not been updated from 2.25.0. To avoid confusion, it has been replaced by this version, 2.26.1.
VSEARCH 2.25.0
Allow a given percentage of mismatches, specified with the chimeras_diff_pct option, between chimeras and parents for the experimental chimeras_denovo command.
VSEARCH 2.24.0
Update documentation. Improve code. Allow up to 20 parents for the undocumented and experimental chimeras_denovo command. Fix compilation warnings for sha1.c. Compile for release (not debug) by default.
VSEARCH 2.23.0
Update documentation. Add citation file. Modernize and improve code. Fix several minor bugs. Fix compilation with GCC 13. Print stats after fastq_mergepairs to log file instead of stderr. Handle sizein option correctly with dbmatched option for usearch_global. Allow maxseqlength option for makeudb_usearch. Fix memory allocation problem with chimera detection. Add lengthout and xlength options. Increase precision for eeout option. Add warning about sintax algorithm, random seed and multiple threads. Refactor chimera detection code. Add undocumented experimental long_chimeras_denovo command. Fix segfault with clustering. Add more references.
VSEARCH 2.22.1
Add the derep_smallmem command for dereplication with less memory usage. Remove compiler warnings.