Handling of conjunction/parallel constructions #12

Open
nikitakit opened this issue Feb 4, 2024 · 2 comments
@nikitakit

HDT contains a large number of trees that my scripts have identified as having a greater degree of non-projectivity than is typical in UD. After reviewing the data, I have some questions about how this treebank annotates coordination.

Here is a typical example, sentence hdt-s186616:

Die Taktrate soll von 33 auf 133 MHz steigen , die Busbreite von 32 auf 64 Bit .

[Image: dependency tree for hdt-s186616]

The clock rate will increase from 33 to 133 MHz and the bus width from 32 to 64 bits.
[English translation via Google Translate]

The annotated arcs are tracking the parallel nature of the construction, Taktrate:Busbreite :: 33:32 :: MHz:Bit.

This naturally creates highly non-projective trees, and the more parallel elements there are, the more non-projective the trees become. Elsewhere in UD I believe this is avoided using orphan relations and/or null elements. Is it then the case that these trees in HDT diverge from the UD guidelines, or am I misunderstanding something?
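
To make the growth concrete, here is a minimal Python sketch (not the scripts mentioned in this issue) that counts pairs of crossing arcs in a toy tree. The helper names `crossing_pairs` and `parallel_heads` are hypothetical, and the toy pattern, in which each of the last k tokens attaches to its counterpart among the first k tokens, is an assumption that only mimics the parallel linking described above. It is not the actual HDT annotation.

```python
from itertools import combinations


def crossing_pairs(heads):
    """Count pairs of crossing arcs.

    heads[i] is the 1-based head of token i+1; 0 marks the artificial root.
    """
    # Store each arc as (left endpoint, right endpoint).
    arcs = [tuple(sorted((head, dep))) for dep, head in enumerate(heads, start=1)]
    count = 0
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the span of the other.
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


def parallel_heads(k):
    """Toy tree (assumption): tokens 1..k form a chain, and token k+i
    attaches to token i, imitating element-by-element parallel linking."""
    first_half = [0] + list(range(1, k))   # token 1 is the root, token i -> i-1
    second_half = list(range(1, k + 1))    # token k+i -> token i
    return first_half + second_half


for k in range(1, 6):
    print(k, crossing_pairs(parallel_heads(k)))
```

Under this assumption every pair of parallel arcs crosses, so k parallel elements give k(k-1)/2 crossing pairs.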

Disclaimer: I don't speak German


The following dev/test sentences have been flagged by my scripts:

  • hdt-s108400 (in the dev set)
  • hdt-s117179 (in the test set)

The following training sentences have been flagged by my scripts:

  • hdt-s48421
  • hdt-s49251
  • hdt-s58870
  • hdt-s78636
  • hdt-s125751
  • hdt-s131804
  • hdt-s146911
  • hdt-s150634
  • hdt-s156335
  • hdt-s165292
  • hdt-s167495
  • hdt-s176414
  • hdt-s178065
  • hdt-s178067
  • hdt-s178068
  • hdt-s180686
  • hdt-s185183
  • hdt-s186616
  • hdt-s189013
  • hdt-s197213
  • hdt-s200957

My scripts only detect extreme non-projectivity, so this is likely not an exhaustive list.
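
For readers who want to run a comparable check, the following is a rough sketch of how sentences in a CoNLL-U file could be flagged by their number of crossing arcs. It is not the script used for the lists above: the function names and the `min_crossings` threshold are hypothetical, and the cutoff of 3 is an arbitrary assumption standing in for "extreme".

```python
from itertools import combinations


def read_conllu(text):
    """Yield (sent_id, heads) for each sentence in a CoNLL-U string."""
    sent_id, heads = None, []
    for line in text.splitlines() + [""]:   # trailing "" flushes the last sentence
        line = line.strip()
        if not line:
            if heads:
                yield sent_id, heads
            sent_id, heads = None, []
        elif line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain integer IDs only; this skips multiword token
            # ranges such as "3-4" and empty nodes such as "5.1".
            if cols[0].isdigit():
                heads.append(int(cols[6]))   # HEAD is the 7th column
    return


def count_crossings(heads):
    """Count pairs of arcs that cross (0 denotes the artificial root)."""
    arcs = [tuple(sorted((head, dep))) for dep, head in enumerate(heads, start=1)]
    return sum(
        1
        for (a, b), (c, d) in combinations(arcs, 2)
        if a < c < b < d or c < a < d < b
    )


def flag_sentences(text, min_crossings=3):
    """Return sent_ids whose trees have at least min_crossings crossing pairs."""
    return [
        sent_id
        for sent_id, heads in read_conllu(text)
        if count_crossings(heads) >= min_crossings
    ]
```

A call such as `flag_sentences(open(path, encoding="utf-8").read())` would return the matching `sent_id` values, where `path` is whichever CoNLL-U file you want to scan.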


Some more examples:

hdt-s185183

Intel bietet dazu beispielsweise den i815E B-Step an , ALi den Aladdin Pro5T , SiS den SiS635T und VIA den Apollo Pro133T und Pro 266T .

[Image: dependency tree for hdt-s185183]

hdt-s189013

Der Palm m500 soll ein Graustufen-Display haben , der Palm m505 dagegen ein reflektives Farbdisplay , über das bislang nur der Compaq iPaq H3630 verfügt .

[Image: dependency tree for hdt-s189013]

@dan-zeman
Member

> Elsewhere in UD I believe this is avoided using orphan relations and/or null elements. Is it then the case that these trees in HDT diverge from the UD guidelines, or am I misunderstanding something?

You got it right. This is an annotation error (I looked at the first example only) and it should be analyzed as gapping, with the help of the orphan relation.
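
As an illustration of why the orphan analysis keeps such trees projective, here is a small Python sketch on a simple English gapping sentence, "Mary won gold and Peter bronze". The `orphan_heads` tree reflects my understanding of the UD basic-dependency treatment of gapping (the promoted remnant is a `conj` of the first verb and the other remnant attaches to it as `orphan`). The `parallel_heads` tree is a hypothetical contrast that links each remnant to its counterpart, loosely modelled on the pattern described in this issue, and is not taken from HDT. The function name `count_crossings` is also just an illustrative choice.

```python
from itertools import combinations


def count_crossings(heads):
    """Count pairs of crossing arcs; heads[i] is the 1-based head of
    token i+1, and 0 denotes the artificial root."""
    arcs = [tuple(sorted((head, dep))) for dep, head in enumerate(heads, start=1)]
    return sum(
        1
        for (a, b), (c, d) in combinations(arcs, 2)
        if a < c < b < d or c < a < d < b
    )


# Tokens: 1 Mary, 2 won, 3 gold, 4 and, 5 Peter, 6 bronze

# Gapping with orphan: Mary <-nsubj- won, gold <-obj- won, and <-cc- Peter,
# Peter <-conj- won, bronze <-orphan- Peter.
orphan_heads = [2, 0, 2, 5, 2, 5]

# Hypothetical parallel linking (assumption): Peter attaches to Mary,
# bronze attaches to gold.
parallel_heads = [2, 0, 2, 5, 1, 3]

print("orphan analysis:", count_crossings(orphan_heads))
print("parallel linking:", count_crossings(parallel_heads))
```

In the orphan tree no arcs cross, whereas the hypothetical parallel-linking tree already has two crossing pairs in a six-token sentence.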

@nikitakit
Author

Thank you @dan-zeman for taking a look at this and the other issues I've been filing! It's really helpful to know that I'm not far off-track with languages I don't understand.

For the time being I won't rely on HDT for purposes that need the UD gapping analysis.
