Blanks changed because of unmatched rules #83

Open
unhammer opened this issue Apr 11, 2022 · 1 comment

@unhammer
Member

unhammer commented Apr 11, 2022

Possibly related to #80: blanks in the output change depending on rules that don't match the input.

b.rtx:

gender = m f nt ut un fn mf xpst xpsts xpsto xcomp xsup acr GD ;
number = sg pl sp ND ;
defnes = def ind ;
case = nom acc gen ;
a_det = dem rel qnt pos emph itg ;
a_clb = clb ;


sent:   _.a_clb;
det:    _.a_det.gender.number.case;

DP:    _.gender.number.defnes;
S:     _;


DP ->     "DP ~> det" %det { %1 }       ;

S ->
     "3."        det.qnt sent.clb.remspc   { %1 2 }
     ! | "2007:"        det.qnt sent   { %1 2 }
    ;

Note the added space, which wasn't in the input:

$ rtx-comp b.rtx b.rtx.bin

$ echo '^2007<det><qnt><un><pl><date>/2007<det><qnt><un><pl><date>$^:<sent><clb>/:<sent><clb>$' | rtx-proc b.rtx.bin
^2007<det><qnt><un><pl><date>$ ^:<sent><clb>$

Now uncomment a rule that matches the sequence and force-removes the space:

$ tr -d '!' < b.rtx >c.rtx

$ rtx-comp c.rtx c.rtx.bin

$ echo '^2007<det><qnt><un><pl><date>/2007<det><qnt><un><pl><date>$^:<sent><clb>/:<sent><clb>$' | rtx-proc c.rtx.bin
^2007<det><qnt><un><pl>$^:<sent><clb>$

But the problem is also "fixed" if you drop the whole S rule, i.e. the one whose det sent sequence doesn't match (or only partially matches) the input.

@mr-martian
Collaborator

So the issue is that when we're inside a rule and the user writes { 1 _ 2 }, they almost certainly want an actual space there, so the code currently doesn't put empty blanks on the output queue to prevent that. Unfortunately, it currently can't tell the difference between blanks between partial trees and blanks within a full tree that is currently being disassembled.

The fix for this is slightly non-trivial, but I think what we want to do is record on each blank in the processor what index in the queue it corresponds to. Then in outputAll() we first record the index of any blank that isn't part of a tree and have writeBlank() treat the ranges between those points as mini-queues, skipping empty blanks and inserting spaces as needed.
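
A rough sketch of that idea, as a toy Python model rather than the actual rtx-proc code (the real outputAll() and writeBlank() will differ). Everything here is an assumption made for illustration: the `output_all` function, the `BLANK` marker standing in for `_` in a rule's output, and the simplification that each tree outputs exactly as many words as it consumed, so blank indices can be derived by counting words.

```python
from collections import deque

BLANK = object()  # hypothetical marker for a `_` written in a rule's output


def output_all(trees, blanks):
    """Toy model of the proposed fix (not the real rtx-proc code).

    trees:  top-level nodes left on the stack, each a list of word strings
            and BLANK markers, in output order.
    blanks: the input blank queue, one entry per gap between consecutive
            input words ("" means the words were adjacent in the input).

    Assumption: every tree outputs as many words as it consumed.
    """
    out = []
    start = 0  # queue index of the first blank that belongs to this tree
    for n, tree in enumerate(trees):
        n_words = sum(1 for tok in tree if tok is not BLANK)
        end = start + max(n_words - 1, 0)
        # Blanks strictly inside this tree form its mini-queue;
        # empty blanks are skipped here.
        mini = deque(b for b in blanks[start:end] if b != "")
        for tok in tree:
            if tok is BLANK:
                # The rule asked for a blank: reuse a real one if any is
                # left in the mini-queue, otherwise insert a plain space.
                out.append(mini.popleft() if mini else " ")
            else:
                out.append(tok)
        # Flush non-empty blanks the rule did not use, so they are not lost.
        out.extend(mini)
        if n < len(trees) - 1:
            # Blank between two top-level trees: emit verbatim, even if "".
            out.append(blanks[end])
            start = end + 1
    return "".join(out)


# Two partial trees with no space between them in the input (as in the report):
print(repr(output_all([["^2007<det>$"], ["^:<sent>$"]], [""])))
# → '^2007<det>$^:<sent>$'

# One full tree whose rule wrote { 1 _ 2 }: the empty blank becomes a space.
print(repr(output_all([["^a$", BLANK, "^b$"]], [""])))
# → '^a$ ^b$'
```

The point of the sketch is only the distinction between the two cases: blanks at tree boundaries pass through untouched, while blanks inside a tree keep the existing behaviour of skipping empties and inserting a space when the rule asks for one.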
