Loading training data:
Cannot load training data from file 'UD_Czech-CAC-master/cs_cac-ud-train.conllu':
The CoNLL-U line
'39 vytvrditelné vytvrditelný ADJ AAFP1----1A---- Case=Nom|Degree=Pos|Gender=Fem|Number=Plur|Polarity=Pos 27 acl:relcl 27:acl:relcl SpaceAfter=No|LDeriv=vytvrdit { přidat k tvrdit }'
contains spaces in column MISC!
Does this mean the treebank is broken? Or is there an option in UDPipe that I could use to get over this?
Thank you,
Michal
This line is surprising and I think the part { přidat k tvrdit } should not be there; nothing similar occurs anywhere else in the treebank.
However, spaces in MISC are not an error in general, so UDPipe should not die on them @foxik. (I think leading or trailing whitespace would trigger a validation error, but there can be a space in the middle of a value, for example, if there is a Latin transliteration of a FORM or LEMMA that contains a space.)
If I recall correctly, spaces in MISC were not originally allowed in CoNLL-U v2 (maybe in the proposed version), so the implementation in UDPipe 1 did not originally allow them; it allowed spaces only in FORM and LEMMA. Spaces in MISC have been allowed since ufal/udpipe@9df115a, but we have not made a release since then (yes, it has long been planned...). Once the release is made, it will work again; in the meantime, it is possible to compile UDPipe manually.
Note that this also affects UDPipe 2 (which uses UDPipe 1 for tokenization).
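For anyone who wants to see which lines of a treebank would trip the released UDPipe 1.2.0 before trying to train, here is a minimal sketch in Python. It is not part of UDPipe or the UD tools; the function name `misc_space_report` and the "internal"/"edge" labels are made up for illustration. It only assumes the standard CoNLL-U layout of ten tab-separated columns with MISC last.

```python
def misc_space_report(lines):
    """Return (line_number, misc, kind) for token lines whose MISC column contains a space.

    kind is "edge" when the value starts or ends with a space,
    and "internal" when spaces occur only inside the value.
    (Hypothetical helper, not an API of UDPipe or the UD validator.)
    """
    report = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip comment lines and the blank lines that separate sentences.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            # Not a well-formed token line; out of scope for this sketch.
            continue
        misc = columns[9]
        if " " not in misc:
            continue
        kind = "edge" if misc != misc.strip(" ") else "internal"
        report.append((number, misc, kind))
    return report
```

It could be run over the training file named in the error message, for example with `misc_space_report(open("UD_Czech-CAC-master/cs_cac-ud-train.conllu", encoding="utf-8"))`. This only reports the affected lines; whether to edit them or to build UDPipe from source is a separate decision.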
When I attempt to train a UDPipe model from this treebank, using UDPipe 1.2.0:
I get the following error message: