yield increase extraction issue #157

IkeKobby · 2022-07-22T06:16:33Z

yield increase being extracted as a total yield currently in habitus; eg ..... yield increased for about 1.5 t ha-1 in the ... season after the application of XYZ fertilizer. How do we best handle this yield increase from being extracted as total yield from sentences? @maxaalexeeva

The text was updated successfully, but these errors were encountered:

maxaalexeeva · 2022-07-22T06:26:14Z

@IkeKobby try to make a dependency (syntax-based) rule with increase trigger with a new label eg YieldIncrease, variable:Yield and value:Quantity. If doesn't work, we can try it together in our Monday meeting.

IkeKobby · 2022-07-22T07:05:41Z

Alright sure.

IkeKobby · 2022-07-26T14:01:01Z

I think this issue can be closed now @maxaalexeeva right?

maxaalexeeva · 2022-07-27T03:59:56Z

@IkeKobby let's keep it open for a bit while I try to debug the issue with losing a yield amount and until we have this as a pr.

maxaalexeeva · 2022-07-27T04:04:10Z

@MihaiSurdeanu @kwalcock a quick question: other than habitus actions, is there anything in the extraction system that could filter out mentions (e.g., based on overlap of two mentions)? The issue in a nutshell: I modify a rule a bit (adding a negative lookbehind), and one mention in a sentence with two mentions stops showing up. When it's a chunk of text without the other mention, the extraction is extracted correctly.

IkeKobby · 2022-07-27T12:44:49Z

@IkeKobby let's keep it open for a bit while I try to debug the issue with losing a yield amount and until we have this as a pr.

Alright sure.

MihaiSurdeanu · 2022-07-27T13:59:08Z

Not that I recall. But I think we added such an action a while ago (that removes overlapping mentions) to habitus. Do you see it?

kwalcock · 2022-07-27T15:20:29Z

This code sounds suspicious:

  /** Keeps a belief mention only if it is not contained in another */
  def keepLongestMentions(mentions: Seq[Mention]): Seq[Mention] = {

removeRedundantVariableMentions could filter out mentions.

maxaalexeeva · 2022-07-28T06:58:13Z

@MihaiSurdeanu @kwalcock thanks! I tried to disable those but the test in question still fails (described below)---although the extraction seems fine in the shell.

Here's some more details on the issue:

In this rule, when I add the negative lookbehind at the beginning, this test fails (the yields > 8 t ha-1 part stops being extracted as a mention---note the plural in yields, the value should not be attached to the first yield---that yield variable and the 8 t ha-1 value are too far apart and will get filtered out).

The negative lookbehind there is to make sure this test passes - there should only be a yield increase mention there, not yield amount.

kwalcock · 2022-07-28T15:45:18Z

I can trace through the code and try to find out what is happening, but probably not until this afternoon. Thanks for the detailed links to specific lines of code in specific commits. It's greatly appreciated.

maxaalexeeva · 2022-07-28T16:06:17Z

@kwalcock thank you! It might be clear from the commits, but this is the name of the branch context-based-eval

kwalcock · 2022-07-29T07:58:05Z

I haven't been able to find the problem yet. The sentence is

"The highest yield ( 9.3 t ha-1 ) is obtained by Brodt et al. ( 2014 ) in California with only 170 kg N ha-1 ; followed by Xu et al. ( 2020 ) and Zhang et al. ( 2021 ) , both with yields > 8 t ha-1 in Hubei Province ( China ) .",

With the simplified rule

    @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

two mentions are found.
When the rule starts with "highest"

    [word = /(highest)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

just the first is found.
When the rule starts with "with"

    [word = /(with)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

the second is found. When it can start with either,

    [word = /(highest|with)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

only the first is found again instead of both.

maxaalexeeva · 2022-07-29T08:10:11Z

@kwalcock hm... I experimented with adding things before yield (neg look behind, token with lemma not equal increase/decrease, and just any token) and all of those eliminated the second extraction. I thought it had to be something to do with some overlap I create by adding another token to the mention, but a) I can't find where it matters and b) negative look behind doesn't really add a token ..

kwalcock · 2022-07-30T04:10:19Z

I haven't been able to find the problem, either. I don't think it's anything in the actions. I've done things like replace most words with "fill" and change the sentence length. The lookbehind at the beginning seems to work if that ([]*? (","|"and")? @value:Quantity)* is removed, but having both is a problem. It seems like this would require some backtracking, maybe multiple levels of it, and maybe something goes wrong. I'd like to hear if Marco has an opinion before looking further. It seems like a good knowledge of the inner workings of Odin is necessary.

maxaalexeeva · 2022-07-30T21:33:22Z

@kwalcock, ah, those (replace most words with "fill" and change the sentence length) are good tests. Shall we tag Marco? Or shall I email?

kwalcock · 2022-07-31T00:51:23Z

@marcovzla, with all your knowledge of the inner workings of Odin, do you have a guess as to why this might not be working as intended? I edited the test file (TestVariableReaderLabelBased.scala) in the branch kwalcock/debugRule so that only the failing test runs, in case that helps. I'm about desperate enough to try to hot wire org.clulab.odin.impl.Result (or something similar) to include a record of how matches are found so that it can be inspected afterwards. I have a feeling you can point out something to which we'll have to say, "Well, duh, that should have been obvious."

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

yield increase extraction issue #157

yield increase extraction issue #157

IkeKobby commented Jul 22, 2022

maxaalexeeva commented Jul 22, 2022

IkeKobby commented Jul 22, 2022

IkeKobby commented Jul 26, 2022

maxaalexeeva commented Jul 27, 2022

maxaalexeeva commented Jul 27, 2022

IkeKobby commented Jul 27, 2022

MihaiSurdeanu commented Jul 27, 2022

kwalcock commented Jul 27, 2022

maxaalexeeva commented Jul 28, 2022

kwalcock commented Jul 28, 2022

maxaalexeeva commented Jul 28, 2022

kwalcock commented Jul 29, 2022

maxaalexeeva commented Jul 29, 2022

kwalcock commented Jul 30, 2022

maxaalexeeva commented Jul 30, 2022

kwalcock commented Jul 31, 2022 •

edited

Loading

yield increase extraction issue #157

yield increase extraction issue #157

Comments

IkeKobby commented Jul 22, 2022

maxaalexeeva commented Jul 22, 2022

IkeKobby commented Jul 22, 2022

IkeKobby commented Jul 26, 2022

maxaalexeeva commented Jul 27, 2022

maxaalexeeva commented Jul 27, 2022

IkeKobby commented Jul 27, 2022

MihaiSurdeanu commented Jul 27, 2022

kwalcock commented Jul 27, 2022

maxaalexeeva commented Jul 28, 2022

kwalcock commented Jul 28, 2022

maxaalexeeva commented Jul 28, 2022

kwalcock commented Jul 29, 2022

maxaalexeeva commented Jul 29, 2022

kwalcock commented Jul 30, 2022

maxaalexeeva commented Jul 30, 2022

kwalcock commented Jul 31, 2022 • edited Loading

kwalcock commented Jul 31, 2022 •

edited

Loading