Evaluation script that unpacks lextag into remaining STREUSLE columns #41
Comments
Input: .conllulex format, except that columns 11-18 are blank (completely empty, not underscores). I think the easiest way to implement this is to adapt streuseval.py so that, instead of verifying that the lextags are consistent with columns 11-18, it parses the lextags and then populates columns 11-18 in JSON. Specifically, it needs to:
If we want the output as .conllulex, converting JSON to .conllulex could be a separate script.
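A minimal sketch of reading one token line of such input, assuming tab-separated fields as in CoNLL-U. The function name `split_conllulex_line` is hypothetical and is not taken from streuseval.py; it only illustrates the column layout described in this thread (10 UD columns, 8 blank columns, 1 lextag column).

```python
def split_conllulex_line(line):
    """Split one token line into (UD columns, columns 11-18, lextag).

    Assumption: fields are tab-separated and there are exactly 19 of them.
    This helper is illustrative only, not part of the STREUSLE scripts.
    """
    # Strip only the newline: blank columns are empty strings between tabs,
    # so a general strip() could remove meaningful whitespace.
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 19:
        raise ValueError(f"expected 19 columns, found {len(cols)}")
    ud = cols[:10]          # columns 1-10: UD
    derived = cols[10:18]   # columns 11-18: blank in system output
    lextag = cols[18]       # column 19: full lextag
    return ud, derived, lextag


# Example token line with columns 11-18 completely blank.
example = "\t".join(
    ["1", "I", "I", "PRON", "PRP", "_", "2", "nsubj", "_", "_"]
    + [""] * 8
    + ["O-PRON"]
)
ud, derived, lextag = split_conllulex_line(example)
```

Keeping the blank columns as a separate list makes it easy to check that the input really has nothing in columns 11-18 before they are populated.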
…rse_mwe_links(), which can be imported separately for lextag unpacking (#41)
…ession annotations from sequence of lextags (#41)
@danielhers I believe I have this working on the lextag-unpack branch. When reconstructing from the gold lextags I can't 100% match the original data file due to an arbitrary numbering issue (#42), but the streuseval score of the original vs. reconstructed is 100%, so there should not be any errors in the reconstruction. Hopefully this means the script is bug-free.
Re: #40, we need a script that takes lextags (full tags, one per token) output by a system and parses them to extract MWE groupings.
Lextags are the 19th and final column in the .conllulex format. Columns 1-10 are UD. Columns 11-18 can be filled in based on UD+lextags.
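A rough sketch of how strong MWE groupings might be recovered from a sequence of lextags. It assumes that each lextag begins with a positional tag from the set `O`, `B`, `I_`, `I~` (top level) or `o`, `b`, `i_`, `i~` (inside a gap), followed by an optional `-`-separated remainder. That inventory, the function name `strong_mwe_groups`, and the example tags are assumptions made for illustration; this is not the code on the lextag-unpack branch, and weak (`~`) links are deliberately ignored.

```python
def strong_mwe_groups(lextags):
    """Return strong MWE groups as lists of 1-based token offsets.

    Assumptions (illustrative, not the actual STREUSLE implementation):
    - the positional tag is everything before the first '-';
    - 'I_' / 'i_' attach to the most recent group at the same level;
    - every other tag starts a new group at its level.
    """
    groups = []
    last = {"upper": None, "lower": None}  # most recent group per level
    for offset, tag in enumerate(lextags, start=1):
        position = tag.split("-", 1)[0]
        if not position:
            raise ValueError(f"empty lextag at token {offset}")
        level = "upper" if position[0].isupper() else "lower"
        if position in ("I_", "i_"):
            if last[level] is None:
                raise ValueError(f"{position} at token {offset} has nothing to continue")
            last[level].append(offset)
        else:
            group = [offset]
            groups.append(group)
            last[level] = group
        if level == "upper":
            # A top-level token closes any gap that was open.
            last["lower"] = None
    # Single-token groups are not multiword expressions.
    return [g for g in groups if len(g) > 1]


# Hypothetical tag sequence: tokens 2 and 5 form one gappy strong MWE.
tags = ["O-PRON", "B-V.VPC.full-v.motion", "o-DET", "o-N-n.ARTIFACT", "I_", "O-PUNCT"]
mwes = strong_mwe_groups(tags)
```

Tracking the most recent group separately for top-level and in-gap tokens is what lets a continuation tag skip over the tokens inside a gap.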