v0.15.0

@percevalw released this 13 Dec 19:11

Changelog

Added

  • edsnlp.data.read_parquet now accepts a work_unit="fragment" option to split tasks between workers by parquet fragment instead of by row. When this is enabled, each of the n workers no longer reads every fragment while keeping only 1 in n rows, but reads all rows of 1 in n fragments, which should be faster.
  • Accept no validation data in the edsnlp.train script
  • Log the training config at the beginning of each training run
  • Support a specific model output dir path for training runs (output_model_dir), and whether to save the model or not (save_model)
  • Specify whether to log the validation results or not (logger=False)
  • Added support for the CoNLL format with edsnlp.data.read_conll and with a specific eds.conll_dict2doc converter
  • Added a trainable biaffine dependency parser component (eds.biaffine_dep_parser) and the associated metrics
  • New eds.extractive_qa component to perform extractive question answering using questions as prompts to tag entities instead of a list of predefined labels as in eds.ner_crf.
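
To make the difference between the two work units of edsnlp.data.read_parquet concrete, here is a minimal sketch in plain Python. It does not call edsnlp or pyarrow: the nested lists stand in for parquet fragments, the function names are hypothetical, and the round-robin assignment of rows and fragments to workers is an assumption made for illustration, not necessarily how the library distributes them.

```python
# Illustrative only: plain Python, no edsnlp or pyarrow calls.
# Each inner list stands in for one parquet fragment (hypothetical data).

def rows_by_row_striding(fragments, worker_idx, n_workers):
    """Row work unit: the worker scans every fragment and keeps 1 row in n."""
    kept, fragments_read = [], 0
    row_idx = 0
    for fragment in fragments:
        fragments_read += 1  # every fragment is read by every worker
        for row in fragment:
            if row_idx % n_workers == worker_idx:
                kept.append(row)
            row_idx += 1
    return kept, fragments_read


def rows_by_fragment(fragments, worker_idx, n_workers):
    """Fragment work unit: the worker reads all rows of 1 fragment in n."""
    kept, fragments_read = [], 0
    for frag_idx, fragment in enumerate(fragments):
        if frag_idx % n_workers == worker_idx:
            fragments_read += 1  # only the assigned fragments are read
            kept.extend(fragment)
    return kept, fragments_read


fragments = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
n_workers = 2

for worker_idx in range(n_workers):
    _, read_row = rows_by_row_striding(fragments, worker_idx, n_workers)
    _, read_frag = rows_by_fragment(fragments, worker_idx, n_workers)
    print(f"worker {worker_idx}: row mode reads {read_row} fragments, "
          f"fragment mode reads {read_frag}")
```

In both modes the workers jointly cover every row exactly once; the fragment mode simply avoids having each worker read data that it then discards.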

Fixed

  • Fix join_thread missing attribute in SimpleQueue when cleaning a multiprocessing executor
  • Support huggingface transformers that do not set cls_token_id and sep_token_id (we now also look for these tokens in the special_tokens_map and vocab mappings)
  • Fix changing scorers dict size issue when evaluating during training
  • Seed random states (instead of using random.RandomState()) when shuffling in data readers. This is important for:
    1. reproducibility
    2. ensuring, in multiprocessing mode, that the same data is shuffled in the same way in all workers
  • Bubble BaseComponent instantiation errors correctly
  • Improved support for multi-GPU gradient accumulation (only sync the gradients at the end of the accumulation), now controlled by the optional sub_batch_size argument of TrainingData.
  • Support edsnlp without pytorch installed again
  • We now test that edsnlp works without pytorch installed
  • Fix units and scales, i.e. 1 l = 1 dm3 and 1 ml = 1 cm3
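
The seeded shuffling fix above matters because workers that each shuffle with an unseeded generator disagree on the order of the data. The following sketch uses only the Python standard library to show the idea; the function names, the seed value and the strided split between workers are hypothetical choices for illustration, not edsnlp's actual implementation.

```python
import random


def shuffled_indices(n_items, seed):
    """Return a permutation of range(n_items) that depends only on the seed."""
    rng = random.Random(seed)  # private, seeded generator (no global state)
    indices = list(range(n_items))
    rng.shuffle(indices)
    return indices


def worker_share(n_items, seed, worker_idx, n_workers):
    """Each worker recomputes the same permutation, then takes its own slice."""
    return shuffled_indices(n_items, seed)[worker_idx::n_workers]


n_items, seed, n_workers = 10, 42, 2
shares = [worker_share(n_items, seed, w, n_workers) for w in range(n_workers)]

# Same seed, same order: the run is reproducible...
print(shuffled_indices(n_items, seed) == shuffled_indices(n_items, seed))
# ...and the workers' shares are disjoint and cover every item exactly once.
print(sorted(i for share in shares for i in share) == list(range(n_items)))
```

With an unseeded generator per worker, each worker would slice a different permutation, so some items could be processed twice and others not at all.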

Pull Requests

Full Changelog: v0.14.0...v0.15.0