Rule only works when commenting out unrelated rules? #80

unhammer · 2021-11-10T14:52:50Z

gender = m f nt ut un fn mf GD ;
gender_adj_sg_ind = nt ut ;
number = sg pl sp ND ;
defnes = def ind ;
a_adj = sint ord pp pprs ;
a_cmp = cmp ;
a_det = dem qnt pos emph ;
a_comp = pst comp sup ;

adj:   _.a_adj.a_comp.gender.number.defnes.a_cmp;
n:     _.gender.number.defnes.a_cmp;
det:   _.a_det.gender.number;

N:     _.gender.number.defnes.a_cmp;
A:     _.a_adj.a_comp.gender.number.defnes.a_cmp;
NP:    _.gender.number.defnes;
DP:    _.gender.number.defnes;


N -> %n         { %1 } ;

NP ->      %N { %1 }
    |  adj %N { 1 _ %2 } !!!
    ;

DP ->
      "vennene mine ~> mina vänner"
      %NP det.pos
      { 2[gender=(if (1.number = pl) un else 1.gender), number=1.number]
        _
        1[defnes=ind]
      }
    | "en venn ~> en vänn" det %NP { 1[gender=(if (2.number = pl) un else 2.gender), number=2.number] _ 2 } !!!

      ;

got:

$ echo ' ^venn<n><m><pl><def>/vän<n><ut><pl><def>$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$ ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$' |rtx-proc nor-swe.rtx.bin
 ^vän<n><ut><pl><def>$ ^min<det><pos><un><pl>$ ^virtuell<adj><sint><pst><nt><sg><ind>$

expected:

 ^min<det><pos><un><pl>$ ^vän<n><ut><pl><ind>$ ^virtuell<adj><sint><pst><nt><sg><ind>$

HOWEVER: If I comment out either line 23 or line 33 (the ones marked !!!) then it strangely works.

But trace shows that those lines are not used (this is without commenting them out, where I get the bad result):

$ echo ' ^venn<n><m><pl><def>/vän<n><ut><pl><def>$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$ ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$' |rtx-proc -r nor-swe.rtx.bin

Applying rule 1 (line 20): ^venn<n><m><pl><def>/vän<n><ut><pl><def>$

Applying rule 2 (line 22): ^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$

Applying rule 4 (vennene mine ~> mina vänner - line 27): ^vän<NP><ut><pl><def>{^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$}$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$

Applying output rule 1 (line 22): vän<NP><ut><pl><def> -> ^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$

Applying output rule 0 (line 20): vän<N><ut><pl><def> -> ^venn<n><m><pl><def>/vän<n><ut><pl><def>$

No rule specified: ^vän<n><ut><pl><def>$
^vän<n><ut><pl><def>$
No rule specified: ^min<det><pos><un><pl>/min<det><pos><un><pl>$
^min<det><pos><un><pl>$
No rule specified: ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$
^virtuell<adj><sint><pst><nt><sg><ind>$

I'm probably missing something obvious but I can't see it?


unhammer · 2021-11-10T14:56:06Z

The trace for when line 33 is commented out shows it not just applying rule 3 (line 27), but also applying output rule 3 (line 27).

unhammer · 2021-11-16T11:58:33Z

Note also that if I just leave out the last word, the rule matches fine.

mr-martian · 2021-11-16T14:46:08Z

So the lookahead is trying to figure out whether to keep branches alive in case more rules might apply. You have n det adj, which it thinks could be n DP{ det NP{ adj [n] } }, not realizing that this is actually det.pos, which it looks like you want treated differently.

So the solution is probably for the lookahead to get smarter and for the last rule to change from det to det.[notpos], for a suitable definition of notpos.

The tricky part of this is whether I can fully do that without implementing FST subtraction in lttoolbox (or maybe I should just go ahead and do that...).

unhammer · 2021-11-16T14:55:25Z

So if I understand correctly, it's starting an analysis of n DP{ det NP{ adj [n] } } because there might be an n to the right. But the trace shows that it did at one point find the right match; wouldn't it be more robust to backtrack to that?

Also, I can't change the last rule to det.[notpos], because I do want it to match det.pos (in nob, both mine venner and vennene mine are possible, while in swe we want only the former).

My current workaround is to have a higher-level rewrite rule DP2 → DP Anyword, but it doesn't really make linguistic sense.

mr-martian · 2021-11-16T15:19:27Z

IRC:

[10:13:28] <popcorndude> the answer is that this actually is an annoyingly deep issue
[10:14:13] <popcorndude> at least in the reduced case, it reads in the adj
[10:14:50] <popcorndude> and then says DP{NP{N{n}} det} can't do anything with this, but NP{N{n}} det maybe can
[10:14:54] <popcorndude> so discard the first one
[10:14:58] <popcorndude> oh, oops, EOF
[10:17:44] <popcorndude> so I can write hacky rules to fix this in particular cases, but I have no idea how to solve this in general
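
The pruning step described in the IRC log can be mimicked with a toy model. This is only a sketch of the behaviour as described above, not the actual rtx-proc algorithm: the `CAN_EXTEND` table, the `prune` function and the tuple representation of branches are all hypothetical names made up for illustration.

```python
# Toy model only: a hypothetical table saying which partial parses could
# still make use of a given next tag. Not taken from rtx-proc.
CAN_EXTEND = {
    ("DP",): set(),           # DP{NP{N{n}} det} is complete; nothing continues it with adj
    ("NP", "det"): {"adj"},   # det adj ... might still become DP{det NP{adj n}}
}

def prune(branches, next_tag):
    """Keep only branches that might use next_tag; keep all if none can."""
    alive = [b for b in branches if next_tag in CAN_EXTEND.get(b, set())]
    return alive or branches

branches = [("DP",), ("NP", "det")]

# Reading the adj: the finished DP branch is discarded in favour of the
# branch that might still grow.
after_adj = prune(branches, "adj")

# At EOF the hoped-for n never arrives, so the surviving branch is left
# with NP, det and adj as separate nodes and the DP rule is lost.
at_eof = [b + ("adj",) for b in after_adj]
dp_rule_survived = any("DP" in b for b in at_eof)
```

In this model the only way to get the DP back at EOF would be to remember the discarded branch, which is the backtracking idea raised earlier in the thread.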

unhammer · 2021-12-16T11:50:11Z

Is there a way to give some info in the trace when this applies? It's quite hard to debug when it happens. E.g. I have rules that do

DP{NP{N{n.cmp n}} det}  →*   DP{det NP{N{n.cmp n}}}   ! vennene mine → mina vänner

and they work fine and then I add vcmp into the N rule so I can do

DP{NP{N{vblex.inf.cmp n}} det}  →*   DP{det NP{N{vblex.inf.cmp n}}} ! bakemesteren vår → vår bakmästare

and it works fine, but then I notice the first rule stops working in certain contexts :(

Turns out, if there's any verb in the rest of the sentence (doesn't have to be tagged cmp), the rule doesn't apply any more. Again, the fix is just to ensure the wider context has a parse (a rule like S→DP VP), but I only learnt that by accident, and I had almost forgotten the fix when the problem showed up again.

mr-martian · 2021-12-16T11:54:32Z

Information about what parses are getting discarded and why can be gotten from the -e debug option, though it prints out rather a lot of stuff and I don't guarantee it makes all that much sense.

unhammer · 2023-08-24T11:15:32Z

We're seeing this issue again in sme-smj, e.g. we have rules for
N→n
NP→NP N | N
PP→N p | p
and on seeing a sequence n n p, it gives a parse for the final two words, but doesn't then apply anything for the first word (I think. I'm not 100% sure about the details here). But the first noun does get a parse if I send it in alone.

Would it be possible to do a final pass after everything is done and just treat all the unmatched lexical units in isolation, so they're at least matched by some single-word rule?
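
A final pass of this kind could look roughly like the sketch below. It assumes the output queue can be viewed as a list of (lexical unit, was-parsed) pairs and that some function can run the single-word rules on one unit in isolation; `reparse_unmatched`, `parse_single` and `toy_single_word_rule` are hypothetical names, not rtx-proc functions.

```python
def reparse_unmatched(output_queue, parse_single):
    """Return a new queue in which every unparsed LU is re-run in isolation.

    output_queue: list of (lu, was_parsed) pairs (an assumed representation).
    parse_single: hypothetical callable taking one LU and returning a list of LUs.
    """
    reparsed = []
    for lu, was_parsed in output_queue:
        if was_parsed:
            reparsed.append((lu, True))
        else:
            # Splice rather than replace one-for-one: a single-word rule
            # may emit zero, one, or several LUs.
            reparsed.extend((out, True) for out in parse_single(lu))
    return reparsed

def toy_single_word_rule(lu):
    # Hypothetical rule that just attaches a made-up <Name> chunk tag.
    return [lu + "<Name>"]

queue = [("Jämtlánnda", False), ("gáktuj<PP>", True)]
result = reparse_unmatched(queue, toy_single_word_rule)
```

Building a new list and extending it keeps the positions of already-parsed units stable even when a reparsed word produces more or fewer units than it started with.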

unhammer · 2023-08-25T12:55:59Z

With sme-smj.rtx.zip:

$ echo '^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$ ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$ ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$^.<sent>/.<sent>$' | rtx-proc -e sme-smj.rtx.bin
[…]
Branch 3: 3 nodes, weight = 0
[Chunk]:
^Jämtlánnda<Name><sg><gen><@→N>{
        ^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$
}$
[Blank]:

[Chunk]:
^gáktuj<PP>{
        ^regiåvnnå<N><sg><gen><@→P>{
                ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$
        }$
        ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$
}$
Branch 4: 3 nodes, weight = 0
[Chunk]:
^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$
[Blank]:

[Chunk]:
^gáktuj<PP>{
        ^regiåvnnå<N><sg><gen><@→P>{
                ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$
        }$
        ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$
}$

Filtering Branches:
No branch can accept further input.
Branch 3  has no active branch to compare to.
Branch 4  has fewer partial parses or a higher weight than branch 3.
[…]

Isn't this plain wrong? Or am I misunderstanding what "partial parses" means? In branch 3, all words have at least one parent node, while in branch 4 (which is chosen), the first word has no parent node.

EDIT: It seems the test is (cur->length < minNode->length || (cur->length == minNode->length && cur->weight >= minNode->weight))
and the values are

cur->length:3
minNode->length:3
cur->weight:0
minNode->weight:0

so they're just equal.

mr-martian · 2023-08-25T13:26:11Z

Yeah, I think it's >= since the branches later in the list have usually had more rules applied to them.
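
To see why the tie goes to branch 4, here is a minimal sketch of a selection loop built around the condition quoted above. Only the comparison itself comes from the thread; the surrounding loop, the `Branch` class and the `strict` switch are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    length: int    # number of nodes in the branch
    weight: float

def pick(branches, strict=False):
    """Walk the branches in order, replacing the current best when the test passes.

    strict=False uses the quoted test (weight >=); strict=True is a
    hypothetical variant with weight > for comparison.
    """
    best = branches[0]
    for cur in branches[1:]:
        if strict:
            heavier = cur.weight > best.weight
        else:
            heavier = cur.weight >= best.weight
        if cur.length < best.length or (cur.length == best.length and heavier):
            best = cur
    return best

b3 = Branch("branch 3", 3, 0.0)
b4 = Branch("branch 4", 3, 0.0)

chosen = pick([b3, b4])                       # equal length and weight: the later branch wins
chosen_strict = pick([b3, b4], strict=True)   # with >, the earlier branch would be kept
```

With equal lengths and weights, the `>=` lets whichever branch comes later in the list win, regardless of how many words it actually covers.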


unhammer · 2023-08-25T20:59:25Z

So I noticed that simply changing the file to have weights on each rule made it choose the branch where more words get a parse, and when doing that across a real rule file for sme-smj, it removes some untranslated words from corpus runs.

Is there a good reason not to have some "initial" weight for every rule, so it can favour parses that cover more words? (Will it then favour deeper trees as well?)

mr-martian · 2023-08-25T21:40:48Z

Yes, it will slightly favor deeper trees, but given how reduce-reduce conflicts are handled, those are favored already.

Perhaps we could add another file-level directive to change the default weight to something positive, since that will indeed improve the situation in many cases.
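
A rough sketch of what a positive default weight would do to the two branches from the debug output follows. It assumes a branch's weight is simply the sum of the weights of the rules applied in it, which the thread does not confirm; `branch_weight` and `choose` are hypothetical names.

```python
def branch_weight(rules_applied, default_weight):
    # Assumption: every rule contributes the same default weight and weights add up.
    return default_weight * len(rules_applied)

def choose(branches, default_weight):
    """branches: list of (name, node_count, rules_applied). Returns the chosen name."""
    best = None
    for name, length, rules in branches:
        weight = branch_weight(rules, default_weight)
        # Same comparison as quoted earlier: fewer nodes first, then weight >=.
        if best is None or length < best[1] or (length == best[1] and weight >= best[2]):
            best = (name, length, weight)
    return best[0]

# Rules per branch are read off the chunk tags in the -e output above.
branches = [
    ("branch 3", 3, ["Name", "N", "PP"]),   # first word wrapped in a Name chunk
    ("branch 4", 3, ["N", "PP"]),           # first word left without a parent
]

with_zero_default = choose(branches, 0.0)      # tie on weight: later branch wins
with_positive_default = choose(branches, 1.0)  # more rules applied: branch 3 wins
```

Under that assumption, any positive default breaks the tie in favour of the branch with more rules applied, which is also why deeper trees would be slightly favoured.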

mitigates #80. We splice in the outputQueueReparsed instead of just replacing, in case the output rule changes the number of LUs output.

* had to std:: here to make it compile
* Reparse individual non-parsed words after full sentence: mitigates #80. We splice in the outputQueueReparsed instead of just replacing, in case the output rule changes the number of LUs output.
* Tests for reparse #80
* Note to self (don't edit run_tests.py)

unhammer added a commit to apertium/apertium-swe-nor that referenced this issue Nov 16, 2021

more workaround for apertium/apertium-recursive#80

42ed862

unhammer mentioned this issue Apr 11, 2022

Blanks changed because of unmatched rules #83

Open

unhammer added a commit to apertium/apertium-sme-smj that referenced this issue Aug 25, 2023

Ensure each rule has a weight, seems to help a bit with fragments

04f3b97

workaround for apertium/apertium-recursive#80

unhammer added a commit that referenced this issue Aug 28, 2023

reparse individual non-parsed words after full sentence

ee76cb5

mitigates #80

unhammer mentioned this issue Aug 28, 2023

reparse individual non-parsed words after full sentence #94

Merged

unhammer added a commit that referenced this issue Sep 5, 2023

Tests for reparse #80

07a9fe9

unhammer added a commit that referenced this issue Sep 5, 2023

Reparse individual non-parsed words after full sentence

a944d20


unhammer added a commit that referenced this issue Sep 5, 2023

Tests for reparse #80

601ec11

unhammer added a commit that referenced this issue Sep 5, 2023

Reparse individual non-parsed words after full sentence

464d5f7


unhammer added a commit that referenced this issue Sep 5, 2023

Tests for reparse #80

c94fea3

unhammer added a commit that referenced this issue Sep 5, 2023

Reparse individual non-parsed words after full sentence

7a112a7


unhammer added a commit that referenced this issue Sep 5, 2023

Tests for reparse #80

d045e26
