Changelog
v0.10.6
Added
- New `batch_by`, `split_into_batches_after`, `sort_chunks`, `chunk_size` and `disable_implicit_parallelism` parameters for the processing backends (`simple` and `multiprocessing`), to improve performance and memory usage. Sorting chunks can make processing up to twice as fast in some cases.
- New `max_tokens_per_device="auto"` parameter for `eds.transformer`, which estimates memory usage and automatically splits the input into chunks that fit into the GPU.
Changed
- Sped up the `eds.text_cnn` pipe by running the CNN on a non-padded version of its input: expect a speedup of up to 1.3x in real-world use cases.
- Deprecated the `bool_attributes` parameter in favor of the more general `default_attributes`. This new mapping describes how to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation or for frequent attribute values (e.g. "negated" is often False, "temporal" is often "present") that annotators may not want to annotate every time.
- The `eds.ner_crf` window is now set to 40 and the stride to 20, as this does not affect throughput (compared to the previous window of 20) and improves accuracy.
- New `overlap_policy='merge'` option and renamed parameters in `eds.span_context_getter`, which replaces `eds.span_sentence_getter`.
Fixed
- `multiprocessing` backend fixes (e.g., no more deadlocks)
- … of the `eds.ner_crf` component are now disabled when windows are used, i.e. when the window is neither `window=1` (equivalent to softmax) nor `window=0` (equivalent to the default full-sequence Viterbi decoding)
- The `eds` tokenizer now inherits from `spacy.Tokenizer` to avoid typing errors
- … `u[ne] cure de 10 jours`
- … `Pipeline.preprocess` method
- … `eds.hyphothesis`
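The `sort_chunks` and `batch_by` entries under Added rely on a general effect: when documents of similar length end up in the same batch, each batch is padded to a length close to its actual content, so less computation is wasted on padding. The following is a minimal pure-Python sketch of that effect, assuming fixed-size batches that are padded to their longest document; `make_batches` and `padded_cost` are hypothetical helpers, not edsnlp functions, and the real backends may sort and split chunks differently.

```python
from typing import List


def make_batches(lengths: List[int], batch_size: int, sort: bool) -> List[List[int]]:
    """Split a chunk of document lengths into fixed-size batches.

    Hypothetical helper: when `sort` is True, the chunk is sorted by length
    first, so that documents of similar size share a batch.
    """
    items = sorted(lengths) if sort else list(lengths)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def padded_cost(batches: List[List[int]]) -> int:
    """Token slots processed when each batch is padded to its longest document."""
    return sum(max(batch) * len(batch) for batch in batches)


if __name__ == "__main__":
    # A toy chunk of 8 documents with very different token counts
    chunk = [500, 20, 480, 30, 510, 25, 490, 15]
    print(sum(chunk))  # 2070 real tokens
    print(padded_cost(make_batches(chunk, batch_size=2, sort=False)))  # 3960
    print(padded_cost(make_batches(chunk, batch_size=2, sort=True)))  # 2100
```

On this toy chunk, sorting nearly halves the number of token slots processed; it only illustrates where such gains can come from, and actual speedups depend on the model and the data.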
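The `eds.ner_crf` entries above describe decoding over sliding windows of tokens. With the new values (window 40, stride 20), consecutive windows overlap by half, so apart from the sequence edges each token falls inside two windows. The sketch below only computes such window boundaries in plain Python; `window_bounds` and `coverage` are hypothetical helpers, and the sketch does not show how the component runs the CRF inside each window or reconciles overlapping predictions.

```python
from typing import List, Tuple


def window_bounds(n_tokens: int, window: int = 40, stride: int = 20) -> List[Tuple[int, int]]:
    """Return the [start, end) bounds of sliding windows over `n_tokens` tokens.

    Hypothetical helper: window=0 is treated as a single window over the whole
    sequence, mirroring the changelog's description of full-sequence decoding.
    """
    if window <= 0:
        return [(0, n_tokens)]
    if not 0 < stride <= window:
        # A stride larger than the window would leave some tokens uncovered
        raise ValueError("stride must satisfy 0 < stride <= window")
    if n_tokens <= window:
        return [(0, n_tokens)]
    bounds = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        bounds.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return bounds


def coverage(n_tokens: int, bounds: List[Tuple[int, int]]) -> List[int]:
    """Count how many windows contain each token position."""
    counts = [0] * n_tokens
    for start, end in bounds:
        for i in range(start, end):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    bounds = window_bounds(100)
    print(bounds)  # [(0, 40), (20, 60), (40, 80), (60, 100)]
    counts = coverage(100, bounds)
    print(counts[10], counts[30], counts[90])  # 1 2 1
```

Choosing a stride of half the window is what gives most tokens context from two windows while keeping the number of windows, and hence the extra work, roughly twice that of non-overlapping windows.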