Commit
fix: remove pollution exclusion during numbers matching
percevalw committed Aug 24, 2024
1 parent fa135e6 commit 436fe39
Showing 3 changed files with 13 additions and 1 deletion.
6 changes: 6 additions & 0 deletions changelog.md
@@ -1,5 +1,11 @@
 # Changelog

+## Unreleased
+
+### Fixed
+
+- Numbers are now only detected without trying to remove the pollution in between digits, ie `55 @ 77777` could be detected as a full number before, but not anymore.
+
 ## v0.13.0

 ### Added
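
The changelog entry above can be illustrated with a minimal sketch. This is not EDS-NLP's implementation: `NUMBER`, `match_numbers`, and the `pollution` argument are hypothetical names, and the number pattern is deliberately simplified (digit groups optionally separated by a single space). It only shows why stripping pollution before matching can merge `55` and `77777` into one number, while matching on the raw text keeps them apart.

```python
import re

# Hypothetical, simplified number pattern: digit groups that may be
# separated by a single space (e.g. "12 000"). The real regex differs.
NUMBER = re.compile(r"\d+(?: \d+)*")


def match_numbers(text, pollution=(), ignore_excluded=False):
    """Return the number-like substrings found in `text`.

    When `ignore_excluded` is True, the pollution strings are removed and
    runs of whitespace are collapsed before matching, which mimics matching
    "through" excluded tokens. When False, the raw text is matched as-is.
    """
    if ignore_excluded:
        for noise in pollution:
            text = text.replace(noise, "")
        text = re.sub(r"\s+", " ", text)
    return NUMBER.findall(text)


text = "55 @ 77777 cm"

# Old behaviour (sketch): pollution is skipped, the digits are glued together.
before = match_numbers(text, pollution=("@",), ignore_excluded=True)

# New behaviour (sketch): pollution is kept, so it splits the two numbers.
after = match_numbers(text, pollution=("@",), ignore_excluded=False)

print(before)
print(after)
```

In the sketch, the second call treats `77777` as a number on its own, which is consistent with the `("55 @ 77777 cm", "77777 cm")` test case added in this commit.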
7 changes: 6 additions & 1 deletion edsnlp/pipes/misc/measurements/measurements.py
@@ -714,7 +714,12 @@ def __init__(
             self.unitless_patterns[pattern_name] = {"name": name, **pattern}

         # NUMBER PATTERNS
-        self.regex_matcher.add("number", [number_regex])
+        self.regex_matcher.add(
+            "number",
+            [number_regex],
+            ignore_excluded=False,
+            ignore_space_tokens=False,
+        )
         self.number_label_hashes = {nlp.vocab.strings["number"]}
         for number, terms in number_terms.items():
             self.term_matcher.build_patterns(nlp, {number: terms})
1 change: 1 addition & 0 deletions tests/pipelines/misc/test_measurements.py
@@ -226,6 +226,7 @@ def test_numbers(blank_nlp: PipelineProtocol, matcher: MeasurementsMatcher):
         ("2 m", "2 m"),
         ("⅛ m", "0.125 m"),
         ("0 m", "0 m"),
+        ("55 @ 77777 cm", "77777 cm"),
     ]:
         doc = blank_nlp(text)
         doc = matcher(doc)