
Releases: vocalpy/vak

0.7.0

23 Nov 17:57

vak 0.7.0 release notes

vak 0.7.0 is a maintenance release, but it does include some new features and bug fixes.
Highlights:

  • For annotation formats that have one annotation file per annotated file, vak can now recognize when
    the annotation files are named by removing the annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension, e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including the original source audio file in both filenames.)
  • The transform that normalizes spectrograms is now fit only to the training set. Previously no split
    was specified, and in some cases the entire dataset was used. That could reduce the error on the
    test set through data leakage: the model "knows" about the distribution of the test set because
    the parameters used to normalize the spectrograms take it into account. For training sets large
    enough to achieve good performance with current models, the training and test distributions
    probably do not differ enough for this to seriously affect evaluation, but we have not tested
    this extensively.
  • Several other cleanups, additional unit tests, and minor bug fixes that should not have affected performance but do make the library more efficient and robust.
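
The leakage concern in the second highlight can be illustrated with a minimal sketch. This is not vak's StandardizeSpect implementation: the function names, the nested-list representation of a spectrogram (one list of values per frequency bin), and the list of split names are all assumptions made for illustration. The point is only that the normalization parameters are computed from the chosen split and then applied unchanged to the others.

```python
from statistics import fmean, pstdev


def fit_standardizer(spects, splits, split="train"):
    """Compute per-frequency-bin mean and std from a single split.

    spects: list of spectrograms, each a list of rows (one row per frequency bin)
    splits: list of split names, one per spectrogram (hypothetical layout)
    """
    chosen = [s for s, name in zip(spects, splits) if name == split]
    if not chosen:
        raise ValueError(f"no spectrograms found for split {split!r}")
    means, stds = [], []
    for freq_bin in range(len(chosen[0])):
        # Pool every time bin of this frequency bin across the chosen split.
        values = [v for spect in chosen for v in spect[freq_bin]]
        means.append(fmean(values))
        stds.append(pstdev(values))
    return means, stds


def standardize(spect, means, stds, eps=1e-8):
    """Apply parameters that were fit elsewhere; nothing is re-fit here."""
    return [
        [(v - m) / (sd + eps) for v in row]
        for row, m, sd in zip(spect, means, stds)
    ]


spects = [
    [[1.0, 3.0], [2.0, 4.0]],          # train
    [[5.0, 7.0], [6.0, 8.0]],          # train
    [[100.0, 100.0], [100.0, 100.0]],  # test, never seen by fit_standardizer
]
splits = ["train", "train", "test"]
means, stds = fit_standardizer(spects, splits)
test_standardized = standardize(spects[2], means, stds)
```

Because the test spectrogram is excluded from the fit, its unusually large values do not shift the means or standard deviations used for the training data.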

Added

  • Add unit tests for csv.has_unlabeled
    #541.
    Fixes #102.
  • Add unit tests for __main__
    #542.
    Fixes #337.
  • Add validation of the labels argument to vak.split.algorithms.brute_force,
    to prevent conditions where the algorithm can fail to converge
    because of bad input
    #562.
    Fixes #288.
  • Add a "Frequently Asked Questions" page to the documentation,
    and a page to the "Reference" section on file naming conventions
    #564.
    Fixes #524
    and #424.
  • Add a new way for vak to map annotation files to annotated files
    when preparing datasets, e.g. for training models.
    For annotation formats that have one annotation file per
    annotated file, vak can now recognize when
    the annotation files are named by removing the
    annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension,
    e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including
    the original source audio file in both filenames.)
    #572.
    Fixes #563.
  • Log the vak version to the logfile for runs started from the command-line interface
    #587.
    Fixes #216.
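
The extension-swapping convention described above (item #572) can be sketched with pathlib. This is an illustration only, not the code in vak.annotation: the function names annotation_path_for and map_annotated_to_annotation are hypothetical, and the sketch assumes exactly one annotation file per annotated file.

```python
from pathlib import Path


def annotation_path_for(annotated_path, annot_ext):
    """Replace the annotated file's extension with the annotation extension.

    annot_ext must include the leading dot, e.g. ".csv".
    """
    return Path(annotated_path).with_suffix(annot_ext)


def map_annotated_to_annotation(annotated_paths, annotation_paths, annot_ext):
    """Pair each annotated file with its annotation file, or fail loudly."""
    available = {Path(p) for p in annotation_paths}
    mapping = {}
    for annotated in annotated_paths:
        candidate = annotation_path_for(annotated, annot_ext)
        if candidate not in available:
            raise ValueError(
                f"no annotation file named {candidate} found for {annotated}"
            )
        mapping[Path(annotated)] = candidate
    return mapping


pairs = map_annotated_to_annotation(
    annotated_paths=["data/song1.wav", "data/song2.wav"],
    annotation_paths=["data/song1.csv", "data/song2.csv"],
    annot_ext=".csv",
)
```

Under this convention, data/song1.wav is paired with data/song1.csv. The other convention mentioned above, where the full source audio filename appears in both names, would need a different lookup.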

Changed

  • Rewrite unit tests in tests/test_cli/ to use mocks for vak.core functions
    #544.
    Fixes #543.
  • It is now possible to load configuration files
    and work with them programmatically even if the paths
    they point to do not exist.
    The core functions handle validation instead.
    E.g., the PrepConfig class does not check whether
    output_dir exists and is a directory, but vak.core.prep does.
    #550.
    Fixes #459.
  • Refactor and speed up logic for determining whether a
    dataset with sequence annotations has unlabeled segments
    that should be assigned a "background" label
    #559.
    Fixes #243.
    • Adds a new sub-sub-package, datasets.seq
      with a validators module, which is where the
      re-written has_unlabeled function now lives.
      Replaces the vak.csv module which was not well named.
    • Also adds a has_unlabeled function to vak.annotation
      that is used by vak.datasets.seq.validators.has_unlabeled;
      this function handles edge cases outlined in
      #243.
  • Rename and refactor functions in vak.annotation
    that map annotations to the files that they annotate,
    so that the purpose of the functions is clearer,
    and add clearer error messages with links to documentation
    about file naming conventions
    #566.
    Fixes #525.
  • Revise "autoannotate" tutorial to use .wav audio and .csv
    annotation files from new release of Bengalese Finch Song
    Repository, and to suggest that Windows users unpack
    archives with tar, not other programs such as WinZip
    #578.
    Fixes #560
    and #576.
  • Change vak.files.find_fname and vak.files.spect.find_audio_fname
    so they work when spaces are in filename and/or path
    #594.
    Fixes #589.
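
To make the "unlabeled segments" check from item #559 concrete, here is a minimal sketch of one way to decide whether a sequence annotation leaves part of a file unlabeled. It is not the implementation in vak.datasets.seq.validators or vak.annotation. The name has_unlabeled_gaps is hypothetical, the inputs are assumed to be sorted onset and offset times in seconds plus the file duration, and treating an annotation with no segments as entirely unlabeled is an assumption of this sketch. The edge cases listed in #243 are not reproduced here.

```python
def has_unlabeled_gaps(onsets, offsets, duration):
    """Return True if any part of [0, duration] is not covered by a segment.

    onsets, offsets: sorted sequences of segment start and stop times (seconds)
    duration: total duration of the annotated file (seconds)
    """
    if len(onsets) != len(offsets):
        raise ValueError("onsets and offsets must have the same length")
    if len(onsets) == 0:
        # Assumption: no segments means the whole file is unlabeled.
        return duration > 0
    if onsets[0] > 0:
        return True  # unlabeled period before the first segment
    if offsets[-1] < duration:
        return True  # unlabeled period after the last segment
    # A gap exists wherever a segment starts after the previous one ended.
    return any(
        next_onset > prev_offset
        for prev_offset, next_onset in zip(offsets[:-1], onsets[1:])
    )
```

A dataset for which this returns True for any file would need a "background" label for the uncovered time bins.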

Fixed

  • Fix how vak.core.prep handles labelset parameter.
    Add pre-condition that raises a ValueError
    when labelset is None but the .toml config is one of
    {'train', 'learncurve', 'eval'}
    #545.
    Avoids running computationally expensive step of generating
    and validating spectrograms before crashing when trying to
    split the dataset using labelset. Also avoids silent
    failures for datasets that do not require splitting,
    e.g., an 'eval' set that could contain labels not in the
    training set.
    Fixes #468.
  • Fix how cli and core functions that have the csv_path parameter
    handle it. The parameter points to a dataset .csv generated by vak prep
    that other core/cli functions use: train, learncurve, eval, predict.
    They now validate that it exists, and if it doesn't, the cli functions
    politely suggest running vak prep first; the core functions
    raise a FileNotFoundError.
    #546.
    Fixes #469.
  • Fix bug where labelmap_path parameter was ignored by core.train.
    Change function so that either labelmap_path or labelset must
    be passed in, but passing in both will raise an error.
    Also change cli.train to only pass in one of those and set the other
    to None.
    #552.
    Fixes #547.
  • Fix vak.annotation.has_unlabeled to handle the edge case where an
    annotation file has no annotated segments
    #583.
    Fixes #378.
  • Fix StandardizeSpect method fit_df so that it computes
    parameters for standardization from a specific
    split of the dataset--the training split, by default--instead
    of using the entire dataset, which could technically give rise
    to data leakage
    #584.
    Fixes #575.
  • Fix error message in vak.core.eval
    #589.
    Fixes #588.
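
The two argument checks described above (items #545 and #552) both follow a fail-fast pattern: validate inputs before doing any expensive work. The sketch below illustrates that pattern only. The function names, the purpose parameter, and the use of ValueError in the second check are assumptions; the notes above state only that #545 raises a ValueError and that passing both arguments in #552 raises an error.

```python
# Config types that need a labelset, as listed in the note for #545.
PURPOSES_REQUIRING_LABELSET = {"train", "learncurve", "eval"}


def check_labelset(purpose, labelset):
    """Raise before any spectrograms are generated if labelset is missing."""
    if labelset is None and purpose in PURPOSES_REQUIRING_LABELSET:
        raise ValueError(
            f"labelset is required when purpose is {purpose!r}, but got None"
        )


def pick_label_source(labelset=None, labelmap_path=None):
    """Require exactly one of labelset or labelmap_path (hypothetical helper)."""
    if labelset is None and labelmap_path is None:
        raise ValueError("pass either labelset or labelmap_path")
    if labelset is not None and labelmap_path is not None:
        raise ValueError("pass only one of labelset or labelmap_path, not both")
    if labelset is not None:
        return "labelset", labelset
    return "labelmap_path", labelmap_path
```

Running checks like these first means a misconfigured run stops immediately instead of after the spectrogram-generation step.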

0.6.0

08 Jul 00:19

0.6.0 -- 2022-07-07

Added

  • better document conda install
    #528.
    Fixes #527.
  • Add tests for console script, i.e., the command-line interface
    #533.
    Fixes #369.

Changed

  • switch from using make to nox for running tasks
    #532.
    Fixes #440.
  • Refactor logging so that it can be configured by cli functions
    when running vak through command-line interface, and by users
    that are working with the API directly
    #535.

Fixed

  • Fix bug that prevented creating spectrogram files with non-default keys
    (e.g. 'spect' instead of the default 's'). The fix passes keys from spect_params
    into spect.to_dataframe inside vak.io.dataframe.from_files.
    #531.
    Fixes #412.
  • Fix logging so a single message is not logged multiple times.
    #535.
    Fixes #258.
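
The notes above do not say what caused the repeated log messages. One common cause in Python programs is attaching a new handler every time a setup function runs, and the sketch below shows a guard against that using the standard logging module. The function name configure_logger and the logger name are hypothetical, and this is not vak's logging code.

```python
import io
import logging


def configure_logger(name, stream, level=logging.INFO):
    """Attach one stream handler to the named logger, at most once per stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Stop records from also reaching handlers on ancestor loggers.
    logger.propagate = False
    already_attached = any(
        getattr(handler, "stream", None) is stream for handler in logger.handlers
    )
    if not already_attached:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


buffer = io.StringIO()
configure_logger("demo_logger", buffer)
logger = configure_logger("demo_logger", buffer)  # second call adds no handler
logger.info("preparing dataset")
logged_lines = buffer.getvalue().splitlines()
```

With the guard in place, calling the setup function from both a command-line entry point and library code leaves a single handler on the logger, so each message is written once.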

0.5.0.post1

30 Jun 02:37
DEV: Bump version to 0.5.0.post1

0.5.0

30 Jun 02:36
DEV: Bump version to 0.5.0

0.4.2

29 Mar 13:30
DEV: bump version to 0.4.2 [skip ci]

0.4.1

07 Jan 14:26
DEV: bump version to 0.4.1 [skip ci]

0.4.0

23 Nov 17:35

0.4.0 -- 2021-12-29

Added

  • add a CITATION.cff file
    #407.
  • add an all-contributors table to README,
    using their bot to adopt the spec.
    E.g., #395.
    Fixes #387.
  • add description of command-line interface to reference section of documentation.
    #417.
    Fixes #270.
  • add how-to on using an annotation format that's not built in
    #421.
    Fixes #397.
  • add how-to on using custom spectrograms
    #421.
    Fixes #413.

Changed

  • updated the .toml configuration files in the tutorial
    to match what was used for TweetyNet paper.
    #416.
    Fixes #414.
  • move tutorial into "getting started" section of docs,
    and revise landing page of docs
    #419.
  • revise the documentation for the configuration file format.
    Show valid options for each section by including docstrings from the classes
    that represent the different sections
    #428.
    Fixes #271.

Fixed

  • make further fixes + add unit tests for handling predictions where all timebins
    are the background "unlabeled" class #409.
    Fixes bug in remove_short_segments #403.
    Related to #393
    and #386.
  • fix docs so entries appear in navbar
    #427.
    Fixes #426.

0.4.0b6

28 Nov 02:48
Pre-release
DEV: bump version to 0.4.0b6

0.4.0b5

09 Oct 01:00
Pre-release
DEV: bump version to 0.4.0b5

0.4.0b4

25 Apr 22:35
Pre-release
DEV: bump version to 0.4.0b4