Errors with geological time data #50
The first one is a bug in our handling of the following input:

scala> parser.parseToXML("We searched for correlations between timing in diversification and timing of (1) a period of marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago (Ma)")
res5: scala.xml.Elem =
<data>
<annotations>
<entity>
<id>0@id</id>
<span>83,89</span>
<type>Period</type>
<properties>
<Type>periods</Type>
</properties>
</entity>...

The second is because we have very little training data on fractional numbers, so the model does not handle them well. In this case, it actually produces a parse, but the parse is wrong because it doesn't connect the 37.5 to anything:

scala> parser.parse("marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago", anchor)
res8: Array[org.clulab.timenorm.scate.TimeExpression] = Array(FractionalNumber(37,1,2,Some((74,78))), BeforeP(SimpleInterval(2019-11-20T00:00,2019-11-21T00:00),SimplePeriod(Years,IntNumber(1000000,Some((79,86))),None,Some((87,92))),Some((93,96))))

The third is another wrong parse, where it has identified "million" as both a Frequency and a Number and doesn't know how to link it up properly:

scala> print(parser.parseToXML("major volcanic activity at DSDP Site 216 on Ninetyeast Ridge (Kerguelen hotspot) where volcanic sediments above basement basalt are dated ~69.5 Ma (nannofossil zone UC20a, planktic foraminiferal zones CF5CF4) and continued for about 2 million years."))
<data>
<annotations>
<entity>
<id>0@id</id>
<span>233,234</span>
<type>Number</type>
<properties>
<Value>2</Value>
</properties>
</entity><entity>
<id>1@id</id>
<span>235,242</span>
<type>Number</type>
<properties>
<Value>1000000</Value>
</properties>
</entity><entity>
<id>2@id</id>
<span>235,242</span>
<type>Frequency</type>
<properties>
<Number>1@id</Number><Type>million</Type>
</properties>
</entity><entity>
<id>3@id</id>
<span>243,248</span>
<type>Period</type>
<properties>
<Number>0@id</Number><Type>Years</Type>
</properties>
</entity>
</annotations>
</data>

Your last example is again the problem that the parser doesn't handle fractional numbers well.

@EgoLaparra: Any thoughts on how to fix any of these?
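The span values in the outputs above appear to be character offsets into the input string (start inclusive, end exclusive). A minimal sketch for checking which text each reported entity covers follows; `SpanCheck` and `covered` are hypothetical helpers written for this illustration and are not part of timenorm.

```scala
// Hypothetical helper, not part of timenorm: maps a reported span back to
// the text it covers. Assumes spans are (start, end) character offsets
// with an exclusive end, which is what the quoted outputs suggest.
object SpanCheck {
  def covered(text: String, start: Int, end: Int): String =
    text.substring(start, end)

  // The three input sentences, copied from the parser calls above.
  val first: String = "We searched for correlations between timing in diversification and timing of (1) a period of marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago (Ma)"
  val second: String = "marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago"
  val third: String = "major volcanic activity at DSDP Site 216 on Ninetyeast Ridge (Kerguelen hotspot) where volcanic sediments above basement basalt are dated ~69.5 Ma (nannofossil zone UC20a, planktic foraminiferal zones CF5CF4) and continued for about 2 million years."

  def main(args: Array[String]): Unit = {
    // First example: the lone Period entity.
    println(covered(first, 83, 89))    // period

    // Second example: the unlinked FractionalNumber and the pieces of BeforeP.
    println(covered(second, 74, 78))   // 37.5
    println(covered(second, 79, 86))   // million
    println(covered(second, 87, 92))   // years
    println(covered(second, 93, 96))   // ago

    // Third example: entities 1@id (Number) and 2@id (Frequency) share a span.
    println(covered(third, 233, 234))  // 2
    println(covered(third, 235, 242))  // million
    println(covered(third, 243, 248))  // years
  }
}
```

Read this way, the Period entity in the first output covers only the word "period", and in the third output the Number and the Frequency entity both point at the single word "million".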
@MihaiSurdeanu
Hi,
I am currently working on geological data that contains temporal expressions such as "cold period", "2.65 million years ago", or "since Paleogene" (Paleogene: 66 million years ago to 23.03 million years ago). When I try to normalize these expressions, I get the following errors:
or
For a different sentence, I get the following error messages:
When I fed in a sentence I created myself, it returned a result along with an error message. Even though it returned a result, I think the result differs from what it should be.
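As a point of reference, the arithmetic that a correct normalization of these expressions implies can be sketched independently of the parser. This is only an illustration using the figures quoted above; `GeoTime`, `maToYears`, and `Interval` are hypothetical names and are not timenorm's API.

```scala
// Hypothetical illustration, not timenorm's API: the values that
// "N million years ago" expressions should amount to once the fractional
// number is linked to "million" and "years".
object GeoTime {
  private val Million: BigDecimal = BigDecimal(1000000)

  // Convert a count of millions of years into years before present.
  // BigDecimal keeps fractional inputs such as 2.65 or 23.03 exact.
  def maToYears(ma: BigDecimal): BigDecimal = ma * Million

  // A geological interval; both bounds are in years before present,
  // so the start (older bound) is the larger number.
  final case class Interval(startYearsAgo: BigDecimal, endYearsAgo: BigDecimal) {
    require(startYearsAgo >= endYearsAgo, "start must be the older bound")
    def durationYears: BigDecimal = startYearsAgo - endYearsAgo
  }

  // Paleogene bounds as quoted in this post: 66 Ma to 23.03 Ma.
  val paleogene: Interval =
    Interval(maToYears(BigDecimal("66")), maToYears(BigDecimal("23.03")))

  def main(args: Array[String]): Unit = {
    println(maToYears(BigDecimal("2.65")).toBigInt)  // 2650000
    println(paleogene.startYearsAgo.toBigInt)        // 66000000
    println(paleogene.endYearsAgo.toBigInt)          // 23030000
    println(paleogene.durationYears.toBigInt)        // 42970000
  }
}
```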
Do you have any thoughts on this type of issue?
Thanks,