Releases: vocalpy/vak
0.7.0
vak 0.7.0 release notes
vak 0.7.0 is a maintenance release, but it does include some new features and bug fixes.
Highlights:
- For annotation formats that have one annotation file per annotated file, vak can now recognize when the annotation files are named by removing the annotated file extension (e.g., .wav or .npz) and replacing it with the annotation format extension, e.g. .txt or .csv. (Other ways of relating annotations and annotated files are still valid, e.g. by including the original source audio file in both filenames.)
- The transform that normalizes spectrograms is now fit only to the training set; previously no split was specified and in some cases the entire dataset was used, which could potentially reduce the error on the test set because of dataset leakage (the model "knows" about the distribution of the test set because the parameters used to normalize the spectrograms take it into account). For training sets large enough to achieve good performance with current models, there is probably not a big enough difference between their distribution and that of the test set for this to seriously impact evaluation, but we have not tested this extensively.
- Several other clean ups, additional unit tests, and minor bug fixes that should not have impacted performance but do make the library more efficient and robust.
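The leakage concern in the second highlight can be illustrated with a minimal NumPy sketch. It is not vak's implementation: the function names `fit_standardizer` and `standardize`, and the synthetic data, are assumptions made for illustration. The point is only that the mean and standard deviation are computed from the training spectrograms and then reused, unchanged, on the test spectrograms.

```python
import numpy as np


def fit_standardizer(spects):
    """Compute per-frequency-bin mean and std from a list of spectrograms.

    Each spectrogram is a 2-D array with shape (frequency bins, time bins).
    Hypothetical helper, not part of vak.
    """
    stacked = np.concatenate(spects, axis=1)  # join along the time axis
    mean = stacked.mean(axis=1, keepdims=True)
    std = stacked.std(axis=1, keepdims=True)
    return mean, std


def standardize(spect, mean, std, eps=1e-8):
    """Apply previously fit parameters; eps guards against division by zero."""
    return (spect - mean) / (std + eps)


# Synthetic "spectrograms": the test set is drawn from a shifted distribution
rng = np.random.default_rng(0)
train_spects = [rng.normal(0.0, 1.0, size=(4, 50)) for _ in range(3)]
test_spects = [rng.normal(5.0, 2.0, size=(4, 50))]

# Fit on the training split only, so no test-set statistics leak in
mean, std = fit_standardizer(train_spects)
train_standardized = [standardize(s, mean, std) for s in train_spects]
test_standardized = [standardize(s, mean, std) for s in test_spects]
```

Had the parameters been fit on `train_spects + test_spects`, the test data would have influenced the normalization applied during training, which is the situation the release notes describe.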
Added
- Add unit tests for `csv.has_unlabeled`
  #541.
  Fixes #102.
- Add unit tests for `__main__`
  #542.
  Fixes #337.
- Add validation of `labels` argument to `vak.split.algorithms.brute_force`,
  to prevent conditions where the algorithm can fail to converge
  because of bad input
  #562.
  Fixes #288.
- Add a "Frequently Asked Questions" page to the documentation,
  and a page to the "Reference" section on file naming conventions
  #564.
  Fixes #524 and #424.
- Add a new way for vak to map annotation files to annotated files
  when preparing datasets, e.g. for training models.
  For annotation formats that have one annotation file per
  annotated file, vak can now recognize when
  the annotation files are named by removing the
  annotated file extension (e.g., .wav or .npz)
  and replacing it with the annotation format extension,
  e.g. .txt or .csv. (Other ways of relating annotations
  and annotated files are still valid, e.g. by including
  the original source audio file in both filenames.)
  #572.
  Fixes #563.
- Have runs from the command-line interface log the version to the logfile
  #587.
  Fixes #216.
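The file naming conventions mentioned in the list above can be sketched with `pathlib`. This is a hedged illustration, not vak's code: `find_annotation` is a hypothetical helper, and the second convention shown (appending the annotation extension to the full audio filename) is only one assumed way of "including the original source audio file" in the annotation filename.

```python
from pathlib import Path


def find_annotation(annotated_path, annot_paths, annot_ext):
    """Return the annotation path that matches an annotated file, or None.

    Hypothetical helper for illustration; tries two naming conventions.
    """
    annotated_path = Path(annotated_path)
    by_name = {Path(p).name: Path(p) for p in annot_paths}
    candidates = [
        # Convention 1: replace the extension, bird1.wav -> bird1.csv
        annotated_path.with_suffix(annot_ext).name,
        # Convention 2 (assumed): keep the audio filename, bird1.wav -> bird1.wav.csv
        annotated_path.name + annot_ext,
    ]
    for name in candidates:
        if name in by_name:
            return by_name[name]
    return None


annots = ["annot/bird1.csv", "annot/bird2.wav.csv"]
match_replaced = find_annotation("audio/bird1.wav", annots, ".csv")
match_appended = find_annotation("audio/bird2.wav", annots, ".csv")
no_match = find_annotation("audio/bird3.wav", annots, ".csv")
```

Matching on filename alone, as done here, assumes filenames are unique across the annotation directory.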
Changed
- Rewrite unit tests in `tests/test_cli/`
  to use mocks for `vak.core` functions
  #544.
  Fixes #543.
- It is now possible to load configuration files
  and work with them programmatically even if the paths
  they point to do not exist.
  The `core` functions handle validation instead.
  E.g., the `PrepConfig` class does not check whether
  `output_dir` exists and is a directory, but `vak.core.prep` does.
  #550.
  Fixes #459.
- Refactor and speed up logic for determining whether a
  dataset with sequence annotations has unlabeled segments
  that should be assigned a "background" label
  #559.
  Fixes #243.
  - Adds a new sub-sub-package, `datasets.seq`,
    with a `validators` module, which is where the
    re-written `has_unlabeled` function now lives.
    Replaces the `vak.csv` module, which was not well named.
  - Also adds a `has_unlabeled` function to `vak.annotation`
    that is used by `vak.datasets.seq.validators.has_unlabeled`;
    this function handles edge cases outlined in #243.
- Rename and refactor functions in `vak.annotation`
  that map annotations to the files that they annotate,
  so that the purpose of the functions is clearer,
  and add clearer error messages with links to documentation
  about file naming conventions
  #566.
  Fixes #525.
- Revise "autoannotate" tutorial to use .wav audio and .csv
  annotation files from the new release of the Bengalese Finch Song
  Repository, and to suggest that Windows users unpack
  archives with tar, not other programs such as WinZip
  #578.
  Fixes #560 and #576.
- Change `vak.files.find_fname` and `vak.files.spect.find_audio_fname`
  so they work when spaces are in the filename and/or path
  #594.
  Fixes #589.
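The check for unlabeled segments described in the list above can be sketched as follows. This is an assumption-laden illustration rather than the library's `has_unlabeled` function: the name `segments_leave_gaps`, its signature, and the choice to treat a file with no annotated segments as unlabeled are all hypothetical, and the sketch assumes segments are sorted by onset and do not overlap.

```python
def segments_leave_gaps(onsets_s, offsets_s, duration_s):
    """Return True if any part of [0, duration_s] is not covered by a segment.

    Hypothetical sketch; assumes segments are sorted and non-overlapping.
    """
    if len(onsets_s) == 0:
        # Edge case: no annotated segments, so the whole file is unlabeled
        return duration_s > 0
    if onsets_s[0] > 0:
        return True  # unlabeled period before the first segment
    for offset, next_onset in zip(offsets_s[:-1], onsets_s[1:]):
        if next_onset > offset:
            return True  # gap between two consecutive segments
    return offsets_s[-1] < duration_s  # unlabeled period after the last segment


empty_file = segments_leave_gaps([], [], 1.0)
fully_covered = segments_leave_gaps([0.0, 0.5], [0.5, 1.0], 1.0)
gap_between = segments_leave_gaps([0.0, 0.6], [0.5, 1.0], 1.0)
```

When the check returns True, the unlabeled periods are the ones that would need a "background" label.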
Fixed
- Fix how `vak.core.prep` handles the `labelset` parameter.
  Add pre-condition that raises a ValueError
  when `labelset` is `None`
  but the .toml config is one of
  {'train', 'learncurve', 'eval'}
  #545.
  Avoids running the computationally expensive step of generating
  and validating spectrograms before crashing when trying to
  split the dataset using `labelset`. Also avoids silent
  failures for datasets that do not require splitting,
  e.g., an 'eval' set that could contain labels not in the
  training set.
  Fixes #468.
- Fix how `cli` and `core` functions that have the `csv_path` parameter
  handle it. The parameter points to a dataset .csv generated by `vak prep`
  that other `core`/`cli` functions use: `train`, `learncurve`, `eval`, `predict`.
  They now validate that it exists, and if it doesn't, the `cli` functions
  politely suggest running `vak prep` first; the `core` functions
  raise a FileNotFoundError.
  #546.
  Fixes #469.
- Fix bug where the `labelmap_path` parameter was ignored by `core.train`.
  Change function so that either `labelmap_path` or `labelset` must
  be passed in, but passing in both will raise an error.
  Also change `cli.train` to only pass in one of those and set the other
  to `None`.
  #552.
  Fixes #547.
- Fix `vak.annotation.has_unlabeled` to handle the edge case where an
  annotation file has no annotated segments
  #583.
  Fixes #378.
- Fix `StandardizeSpect` method `fit_df` so that it computes
  parameters for standardization from a specific
  split of the dataset (the training split, by default) instead
  of using the entire dataset, which could technically give rise
  to data leakage
  #584.
  Fixes #575.
- Fix error message in `vak.core.eval`
  #589.
  Fixes #588.
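Two of the fixes above are about validating arguments early: requiring exactly one of `labelset` or `labelmap_path`, and checking that the dataset .csv exists. The pattern can be sketched as below. `check_train_args` is a hypothetical helper, not a vak function, and the use of ValueError for the first check is an assumption (the notes only say that passing both "will raise an error").

```python
from pathlib import Path


def check_train_args(csv_path, labelset=None, labelmap_path=None):
    """Validate arguments before any expensive work starts.

    Hypothetical helper for illustration only.
    """
    # Exactly one of the two must be given: neither or both is an error
    if (labelset is None) == (labelmap_path is None):
        raise ValueError(
            "pass exactly one of 'labelset' or 'labelmap_path', "
            f"got labelset={labelset!r}, labelmap_path={labelmap_path!r}"
        )
    csv_path = Path(csv_path)
    if not csv_path.exists():
        # Mirrors the behavior the notes describe for the core functions
        raise FileNotFoundError(f"dataset csv not found: {csv_path}")
    return csv_path
```

Failing at this point, before spectrograms are generated or a model is built, is what keeps a misconfigured run from crashing late or failing silently.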
0.6.0
0.6.0 -- 2022-07-07
Added
- better document `conda` install
  #528.
  Fixes #527.
- Add tests for console script, i.e., the command-line interface
  #533.
  Fixes #369.
Changed
- switch from using `make` to `nox` for running tasks
  #532.
  Fixes #440.
- Refactor logging so that it can be configured by `cli` functions
  when running `vak` through the command-line interface, and by users
  that are working with the API directly
  #535.
Fixed
- Fix bug that prevented creating spectrogram files with non-default keys
  (e.g. 'spect' instead of the default 's'). Needed to pass keys from `spect_params`
  into `spect.to_dataframe` inside `vak.io.dataframe.from_files`.
  #531.
  Fixes #412.
- Fix logging so a single message is not logged multiple times.
  #535.
  Fixes #258.
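The notes do not say what caused messages to be logged more than once. One common cause in Python's `logging` module is attaching a new handler every time a configuration function is called. The sketch below shows a guard against that, under the assumption that this is the kind of problem being fixed; `config_logging` and the logger name are hypothetical.

```python
import logging


def config_logging(logger_name="demo", level=logging.INFO):
    """Configure a named logger, attaching a stream handler only once.

    Hypothetical helper; not vak's logging setup.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    if not logger.handlers:
        # Only add a handler the first time, so repeated calls do not
        # produce one extra copy of each message per call
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s - %(message)s"))
        logger.addHandler(handler)
    # Avoid a second copy of each message via handlers on the root logger
    logger.propagate = False
    return logger


logger = config_logging("demo")
logger = config_logging("demo")  # second call does not add another handler
logger.info("logged once")
```

A configuration function written this way can be called both by command-line entry points and by users working with the API directly.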
0.5.0.post1
DEV: Bump version to 0.5.0.post1
0.5.0
DEV: Bump version to 0.5.0
0.4.2
DEV: bump version to 0.4.2 [skip ci]
0.4.1
DEV: bump version to 0.4.1 [skip ci]
0.4.0
0.4.0 -- 2021-12-29
Added
- add a CITATION.cff file
  #407.
- add an all-contributors table to README,
  using their bot to adopt the spec.
  E.g., #395.
  Fixes #387.
- add description of command-line interface to reference section of documentation.
  #417.
  Fixes #270.
- add how-to on using an annotation format that's not built in
  #421.
  Fixes #397.
- add how-to on using custom spectrograms
  #421.
  Fixes #413.
Changed
- updated the .toml configuration files in the tutorial
  to match what was used for the TweetyNet paper.
  #416.
  Fixes #414.
- move tutorial into "getting started" section of docs,
  and revise landing page of docs
  #419.
- revise the documentation for the configuration file format.
  Show valid options for each section by including docstrings from the classes
  that represent the different sections
  #428.
  Fixes #271.
Fixed
0.4.0b6
DEV: bump version to 0.4.0b6
0.4.0b5
DEV: bump version to 0.4.0b5
0.4.0b4
DEV: bump version to 0.4.0b4