UD alignment: interaction of :correct and :subt for multiword token #79

nschneid · 2023-04-01T03:02:06Z

We have an instance of cant, which is a misspelling of can't. In EWT, it is split into two words, the second of which is nt with CorrectForm=n't.

In CGEL, should this be

(V_aux :t "cant" :correct "can't" :subt "ca" :subt "n't")

or

(V_aux :t "cant" :correct "can't" :subt "ca" :subt "nt")

In other words, are subtoken corrections incorporated into :subt, or is :subt strictly for unnormalized surface strings? Note that the multiword token (MWT) string in UD is cant.
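The two options imply different consistency checks on the :subt values. Below is a minimal sketch, assuming a hypothetical in-memory `Token` model of a CGEL terminal (the class and function names are illustrative, not part of any existing tool). Under the first option the subtokens concatenate to the :correct string (ca + n't = can't). Under the second they concatenate to the :t string (ca + nt = cant), which also matches the UD MWT string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """Hypothetical model of a CGEL terminal such as
    (V_aux :t "cant" :correct "can't" :subt "ca" :subt "nt")."""
    t: str                          # surface string, as written
    correct: Optional[str] = None   # corrected form, if any
    subt: List[str] = field(default_factory=list)


def subt_matches_surface(tok: Token) -> bool:
    # Second option: :subt holds unnormalized surface pieces,
    # so they should concatenate to :t (and thus to the UD MWT string).
    return "".join(tok.subt) == tok.t


def subt_matches_correction(tok: Token) -> bool:
    # First option: :subt holds corrected pieces, so they should
    # concatenate to :correct when a correction is present, else to :t.
    target = tok.correct if tok.correct is not None else tok.t
    return "".join(tok.subt) == target


option1 = Token("cant", "can't", ["ca", "n't"])
option2 = Token("cant", "can't", ["ca", "nt"])
```

With these definitions, `option1` passes only the correction check and `option2` passes only the surface check, so whichever convention is chosen determines which invariant a validator should enforce.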

The text was updated successfully, but these errors were encountered:

nschneid · 2023-04-01T03:13:03Z

align_tokens.py currently interprets :subt as part of the surface string, so it doesn't reflect any corrections.
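To illustrate the consequence, here is a hypothetical sketch of subtoken-to-word alignment (this is not the actual align_tokens.py code; `align_subtokens` and its `use_correct` flag are invented for illustration). It assumes the EWT analysis of cant is the words ca and nt, with CorrectForm=n't on the second word and no CorrectForm on the first. Comparing :subt against FORM only, as a surface-string interpretation does, aligns "ca"/"nt" but fails on "ca"/"n't"; consulting CorrectForm would be needed to accept the first option.

```python
from typing import List, Optional, Tuple

# A UD word as (FORM, CorrectForm or None).
UDWord = Tuple[str, Optional[str]]


def align_subtokens(subt: List[str], ud_words: List[UDWord],
                    use_correct: bool = False) -> List[Tuple[str, str]]:
    """Pair each CGEL :subt string with a UD word, returning (subt, FORM) pairs.

    use_correct=False compares :subt to FORM (surface strings only);
    use_correct=True compares it to CorrectForm where one is present.
    Raises ValueError if the strings cannot be aligned.
    """
    if len(subt) != len(ud_words):
        raise ValueError("subtoken count does not match UD word count")
    pairs = []
    for s, (form, correct_form) in zip(subt, ud_words):
        target = correct_form if (use_correct and correct_form is not None) else form
        if s != target:
            raise ValueError(f"cannot align {s!r} with UD word {target!r}")
        pairs.append((s, form))
    return pairs


ewt_cant = [("ca", None), ("nt", "n't")]
```

For example, `align_subtokens(["ca", "nt"], ewt_cant)` succeeds, `align_subtokens(["ca", "n't"], ewt_cant)` raises `ValueError`, and `align_subtokens(["ca", "n't"], ewt_cant, use_correct=True)` succeeds.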

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UD alignment: interaction of :correct and :subt for multiword token #79

UD alignment: interaction of :correct and :subt for multiword token #79

nschneid commented Apr 1, 2023

nschneid commented Apr 1, 2023

UD alignment: interaction of :correct and :subt for multiword token #79

UD alignment: interaction of :correct and :subt for multiword token #79

Comments

nschneid commented Apr 1, 2023

nschneid commented Apr 1, 2023