MWE numbering within sentence is inconsistent #42
Comments
For a normal form, it probably makes the most sense to number MWEs in ascending order by start token, using strength only as a tiebreaker (strong before weak; note that the weak MWE's tokens will be a superset of the strong one's). That way, if the strength of an MWE is modified in isolation, no renumbering is required. And if the strength distinction is removed, some strong+weak combinations will collapse, but the MWEs will not be reordered.
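A minimal sketch of that ordering rule, for illustration only. The `MWE` tuple and `renumber_mwes` function are hypothetical and are not taken from the STREUSLE scripts; the sketch assumes each MWE is known by its token offsets and its strength.

```python
from typing import Dict, List, NamedTuple, Tuple


class MWE(NamedTuple):
    """Hypothetical in-memory representation of one MWE in a sentence."""
    tokens: Tuple[int, ...]  # 1-based token offsets belonging to the MWE
    strength: str            # "strong" or "weak"


def renumber_mwes(mwes: List[MWE]) -> Dict[int, MWE]:
    """Assign normal-form numbers: ascending start token, strong before weak on ties."""
    def sort_key(mwe: MWE) -> Tuple[int, int]:
        # Primary key: first token of the MWE. Tiebreaker: strong (0) before weak (1).
        return (min(mwe.tokens), 0 if mwe.strength == "strong" else 1)

    ordered = sorted(mwes, key=sort_key)
    # Numbers start at 1 and follow the sorted order.
    return {number: mwe for number, mwe in enumerate(ordered, start=1)}


if __name__ == "__main__":
    sentence_mwes = [
        MWE(tokens=(4, 5, 6), strength="weak"),
        MWE(tokens=(1, 3), strength="strong"),
        MWE(tokens=(4, 5), strength="strong"),
    ]
    for number, mwe in renumber_mwes(sentence_mwes).items():
        print(number, mwe.strength, mwe.tokens)
```

Because strength enters only as a tiebreaker, flipping the strength of an MWE that shares its start token with no other MWE leaves every number unchanged.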
Numbering is renormalized in streusle.conllulex (not yet propagated to splits).
Fully fixed in #47.
In some sentences all strong MWEs are numbered before weak ones; in others the numbering is by token offset.
This does not matter for the semantics, but it means that equivalent files will be superficially different. So perhaps we should enforce a normal form for numbering MWEs.
In the script for #41:
streusle/UDlextag2json.py, lines 53 to 62 in 09014b4
streusle/UDlextag2json.py, lines 124 to 129 in 09014b4