Errors with geological time data #50

Open
seongjinpark-88 opened this issue Nov 21, 2019 · 1 comment

@seongjinpark-88

@MihaiSurdeanu

Hi,

I am currently working with geological data that contains temporal expressions such as "cold period", "2.65 million years ago", or "since Paleogene" (Paleogene: 66 million years ago to 23.03 million years ago). When I try to normalize these expressions, I get the following errors:

scala> import org.clulab.timenorm.scate._
import org.clulab.timenorm.scate._
scala> val parser = new TemporalNeuralParser
parser: org.clulab.timenorm.scate.TemporalNeuralParser = org.clulab.timenorm.scate.TemporalNeuralParser@35d1457d

scala> val anchor = SimpleInterval.of(2019, 11, 20)
anchor: org.clulab.timenorm.scate.SimpleInterval = SimpleInterval(2019-11-20T00:00,2019-11-21T00:00)

scala> val text = "We searched for correlations between timing in diversification and timing of (1) a period of marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago (Ma)"
text: String = We searched for correlations between timing in diversification and timing of (1) a period of marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago (Ma)

scala> for (timex <- parser.parse(text, anchor)) timex match{
     | case interval: Interval =>
     | val Some((charStart, charEnd)) = interval.charSpan
     | println(s"${interval.start} ${interval.end} ${text.substring(charStart, charEnd)}")
     | }
scala.MatchError: periods (of class java.lang.String)
  at org.clulab.timenorm.scate.AnaforaReader.period(Readers.scala:94)
  at org.clulab.timenorm.scate.AnaforaReader.temporal(Readers.scala:306)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1$$anonfun$apply$8.apply(TemporalNeuralParser.scala:154)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1$$anonfun$apply$8.apply(TemporalNeuralParser.scala:154)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.Iterator$class.foreach(Iterator.scala:750)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1.apply(TemporalNeuralParser.scala:154)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1.apply(TemporalNeuralParser.scala:151)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.clulab.timenorm.scate.TemporalNeuralParser.parseBatch(TemporalNeuralParser.scala:151)
  at org.clulab.timenorm.scate.TemporalNeuralParser.parse(TemporalNeuralParser.scala:137)
  ... 49 elided

or

scala> import org.clulab.timenorm.scate._
import org.clulab.timenorm.scate._

scala> val parser = new TemporalNeuralParser
parser: org.clulab.timenorm.scate.TemporalNeuralParser = org.clulab.timenorm.scate.TemporalNeuralParser@6ded3740

scala> val anchor = SimpleInterval.of(2019, 11, 20)
anchor: org.clulab.timenorm.scate.SimpleInterval = SimpleInterval(2019-11-20T00:00,2019-11-21T00:00)

scala> val text = "marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago"
text: String = marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago

scala> for (timex <- parser.parse(text, anchor)) timex match {
     | case interval: Interval =>
     | val Some((charStart, charEnd)) = interval.charSpan
     | println(s"${interval.start} ${interval.end} ${text.substring(charStart, charEnd)}")
     | }
2019-11-21 13:20:46.711084: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
scala.MatchError: FractionalNumber(37,1,2,Some((74,78))) (of class org.clulab.timenorm.scate.FractionalNumber)
  at $anonfun$1.apply(<console>:14)
  at $anonfun$1.apply(<console>:14)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  ... 49 elided

For a different sentence, I get the following error message:

scala> val text = "major volcanic activity at DSDP Site 216 on Ninetyeast Ridge (Kerguelen hotspot) where volcanic sediments above basement basalt are dated ~69.5 Ma (nannofossil zone UC20a, planktic foraminiferal zones CF5CF4) and continued for about 2 million years."
text: String = major volcanic activity at DSDP Site 216 on Ninetyeast Ridge (Kerguelen hotspot) where volcanic sediments above basement basalt are dated ~69.5 Ma (nannofossil zone UC20a, planktic foraminiferal zones CF5CF4) and continued for about 2 million years.

scala> for (timex <- parser.parse(text, anchor)) timex match {
     | case interval: Interval =>
     | val Some((charStart, charEnd)) = interval.charSpan
     | println(s"${interval.start} ${interval.end} ${text.substring(charStart, charEnd)}")
     | }
org.clulab.timenorm.scate.AnaforaReader$Exception: cannot parse RepeatingInterval from "Some(million)" and Vector(<entity>
          <id>2@id</id>
          <span>235,242</span>
          <type>Frequency</type>
          <properties>
            <Number>1@id</Number><Type>million</Type>
          </properties>
        </entity>, <entity>
          <id>1@id</id>
          <span>235,242</span>
          <type>Number</type>
          <properties>
            <Value>1000000</Value>
          </properties>
        </entity>)
  at org.clulab.timenorm.scate.AnaforaReader.repeatingInterval(Readers.scala:280)
  at org.clulab.timenorm.scate.AnaforaReader.temporal(Readers.scala:316)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1$$anonfun$apply$8.apply(TemporalNeuralParser.scala:154)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1$$anonfun$apply$8.apply(TemporalNeuralParser.scala:154)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.Iterator$class.foreach(Iterator.scala:750)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1.apply(TemporalNeuralParser.scala:154)
  at org.clulab.timenorm.scate.TemporalNeuralParser$$anonfun$parseBatch$1.apply(TemporalNeuralParser.scala:151)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.clulab.timenorm.scate.TemporalNeuralParser.parseBatch(TemporalNeuralParser.scala:151)
  at org.clulab.timenorm.scate.TemporalNeuralParser.parse(TemporalNeuralParser.scala:137)
  ... 49 elided

When I fed it a sentence I wrote myself, it printed a result and then an error message. Even though it produced a result, I think the result is different from what it should be.

scala> val text = "I haven't done it since 4.5 million years ago"
text: String = I haven't done it since 4.5 million years ago

scala> for (timex <- parser.parse(text, anchor)) timex match{
     | case interval: Interval =>
     | val Some((charStart, charEnd)) = interval.charSpan
     | println(s"${interval.start} ${interval.end} ${text.substring(charStart, charEnd)}")
     | }
2019-11-21 13:06:46.315482: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
-997981-05-22T00:00 2019-11-21T00:00 since 4.5 million years ago
scala.MatchError: FractionalNumber(4,1,2,Some((24,27))) (of class org.clulab.timenorm.scate.FractionalNumber)
  at $anonfun$1.apply(<console>:14)
  at $anonfun$1.apply(<console>:14)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  ... 49 elided

Do you have any thoughts on this type of issue?

Thanks,

@bethard
Collaborator

bethard commented Nov 25, 2019

The first one is a bug in our handling of the Type of Period operations (it should be UNKNOWN, not periods):

scala> parser.parseToXML("We searched for correlations between timing in diversification and timing of (1) a period of marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago (Ma)")
res5: scala.xml.Elem =
<data>
        <annotations>
          <entity>
          <id>0@id</id>
          <span>83,89</span>
          <type>Period</type>
          <properties>
            <Type>periods</Type>
          </properties>
        </entity>...

The second is because we have very little training data on fractional numbers, so the model does not handle them well. In this case, it actually produces a parse, but the parse is wrong because it doesn't connect the 37.5 to anything:

scala> parser.parse("marked volcanism across the Trans-Mexican Volcanic Belt in central Mexico 37.5 million years ago", anchor)
res8: Array[org.clulab.timenorm.scate.TimeExpression] = Array(FractionalNumber(37,1,2,Some((74,78))), BeforeP(SimpleInterval(2019-11-20T00:00,2019-11-21T00:00),SimplePeriod(Years,IntNumber(1000000,Some((79,86))),None,Some((87,92))),Some((93,96))))
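
On the caller side, the MatchError in the REPL loop comes from the `match` having only a `case interval: Interval` branch, so any other element of the array (here the stray FractionalNumber) falls through. Below is a minimal sketch of a loop body that reports such elements instead of throwing. `DemoExpression`, `DemoInterval`, `DemoFractionalNumber`, and `SafeReport` are hypothetical stand-ins written for this example, not the timenorm classes, and the sketch does not repair the wrong parse itself:

```scala
// Stand-in types for illustration only; the real classes live in
// org.clulab.timenorm.scate and carry more structure than this.
sealed trait DemoExpression { def charSpan: Option[(Int, Int)] }

final case class DemoInterval(start: String, end: String, charSpan: Option[(Int, Int)])
  extends DemoExpression

final case class DemoFractionalNumber(whole: Int, numerator: Int, denominator: Int,
                                      charSpan: Option[(Int, Int)])
  extends DemoExpression

object SafeReport {
  // Describe one parsed expression. The catch-all branch means no input
  // can raise scala.MatchError.
  def describe(text: String, timex: DemoExpression): String = timex match {
    case DemoInterval(start, end, Some((charStart, charEnd))) =>
      s"$start $end ${text.substring(charStart, charEnd)}"
    case other =>
      // Non-intervals, and intervals without a character span, land here.
      s"skipped $other"
  }
}
```

With the real library, the equivalent change would presumably be adding a `case other =>` branch to the loop from the report, so that the remaining expressions are still printed.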

The third is another wrong parse, where it has identified "million" as both a Frequency and a Number and doesn't know how to link it up properly:

scala> print(parser.parseToXML("major volcanic activity at DSDP Site 216 on Ninetyeast Ridge (Kerguelen hotspot) where volcanic sediments above basement basalt are dated ~69.5 Ma (nannofossil zone UC20a, planktic foraminiferal zones CF5CF4) and continued for about 2 million years."))
<data>
        <annotations>
          <entity>
          <id>0@id</id>
          <span>233,234</span>
          <type>Number</type>
          <properties>
            <Value>2</Value>
          </properties>
        </entity><entity>
          <id>1@id</id>
          <span>235,242</span>
          <type>Number</type>
          <properties>
            <Value>1000000</Value>
          </properties>
        </entity><entity>
          <id>2@id</id>
          <span>235,242</span>
          <type>Frequency</type>
          <properties>
            <Number>1@id</Number><Type>million</Type>
          </properties>
        </entity><entity>
          <id>3@id</id>
          <span>243,248</span>
          <type>Period</type>
          <properties>
            <Number>0@id</Number><Type>Years</Type>
          </properties>
        </entity>
        </annotations>
      </data>
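
One way to spot this kind of parse before it reaches the reader is to look for spans that carry more than one entity type. The sketch below assumes the entities have already been pulled out of the XML into a simple record; `Entity` and `SpanConflicts` are hypothetical names for this example, not part of timenorm:

```scala
// Hypothetical flat view of one <entity> element: id, character span, type.
final case class Entity(id: String, span: (Int, Int), entityType: String)

object SpanConflicts {
  // Return every span that is annotated with more than one distinct
  // entity type, ordered by span, with the type names sorted.
  def find(entities: Seq[Entity]): Seq[((Int, Int), Seq[String])] =
    entities
      .groupBy(_.span)
      .toSeq
      .map { case (span, es) => (span, es.map(_.entityType).distinct.sorted) }
      .filter { case (_, types) => types.size > 1 }
      .sortBy(_._1)
}

// The four entities from the XML output above.
val parsed = Seq(
  Entity("0@id", (233, 234), "Number"),
  Entity("1@id", (235, 242), "Number"),
  Entity("2@id", (235, 242), "Frequency"),
  Entity("3@id", (243, 248), "Period")
)
```

For the entities above, `SpanConflicts.find(parsed)` flags only the span 235,242 ("million"), which is tagged as both Frequency and Number.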

Your last example is again the problem that the parser doesn't handle fractional numbers well.
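
A quick arithmetic check is consistent with that: the printed start year, -997981, is exactly 1,000,000 years before the 2019 anchor, so only "million years" contributed and the 4.5 was dropped. The sketch below uses plain `java.time` rather than timenorm, and `DeepTimeCheck` is a hypothetical helper name; it compares years only and does not try to reproduce the month and day that timenorm printed:

```scala
import java.time.LocalDateTime

object DeepTimeCheck {
  // Start of the anchor interval used in the report: 2019-11-20T00:00.
  val anchorStart: LocalDateTime = LocalDateTime.of(2019, 11, 20, 0, 0)

  // Proleptic ISO year reached by stepping back a whole number of years.
  def yearBefore(years: Long): Int = anchorStart.minusYears(years).getYear
}

// What the parse effectively used: 1,000,000 years.
val usedYear = DeepTimeCheck.yearBefore(1000000L)      // -997981

// What "4.5 million years ago" would need: 4,500,000 years.
val intendedYear = DeepTimeCheck.yearBefore(4500000L)  // -4497981
```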

@EgoLaparra: Any thoughts on how to fix any of these?
