
TooManyEvaluationsException #165

Open

rolandjohann opened this issue Sep 13, 2016 · 2 comments

Comments


rolandjohann commented Sep 13, 2016

I'm building a POC for sales forecasting and am getting an exception that I can't get rid of.

org.apache.commons.math3.exception.TooManyEvaluationsException: illegal state: maximal count (100.000) exceeded: evaluations
    at org.apache.commons.math3.optim.BaseOptimizer$MaxEvalCallback.trigger(BaseOptimizer.java:242)
    at org.apache.commons.math3.util.Incrementor.incrementCount(Incrementor.java:155)
    at org.apache.commons.math3.optim.BaseOptimizer.incrementEvaluationCount(BaseOptimizer.java:191)
    at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.computeObjectiveValue(MultivariateOptimizer.java:114)
    at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch$1.value(LineSearch.java:120)
    at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.computeObjectiveValue(UnivariateOptimizer.java:149)
    at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:225)
    at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:43)
    at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
    at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.optimize(UnivariateOptimizer.java:70)
    at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch.search(LineSearch.java:130)
    at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:282)
    at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:46)
    at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
    at org.apache.commons.math3.optim.BaseMultivariateOptimizer.optimize(BaseMultivariateOptimizer.java:65)
    at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.optimize(MultivariateOptimizer.java:63)
    at org.apache.commons.math3.optim.nonlinear.scalar.GradientMultivariateOptimizer.optimize(GradientMultivariateOptimizer.java:73)
    at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.optimize(NonLinearConjugateGradientOptimizer.java:244)
    at com.cloudera.sparkts.models.ARIMA$.fitWithCSSCGD(ARIMA.scala:198)
    at com.cloudera.sparkts.models.ARIMA$.fitModel(ARIMA.scala:107)
    at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:94)
    at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:90)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:90)
    at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:89)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Judging by the stack trace, the Commons Math optimizer exceeds its 100,000-evaluation limit during the CSS-CGD fit (ARIMA.fitWithCSSCGD), which suggests the fit is not converging.

The TimeSeriesRDD is built via

val salesTimeSeriesRDD = TimeSeriesRDD.timeSeriesRDDFromObservations(dateTimeIndex, salesDF, "soldAt", "productId", "quantity")

val keyWithModelRDD = salesTimeSeriesRDD map {
  case (key, tsVector) => (key, ARIMA.fitModel(1, 0, 1, tsVector)) // exception
}

If the DataFrame contains data for only one key, everything works fine.

The series data looks like this and currently contains 251 values per key:

[NaN,337.0,27.0,242.0,226.0,142.0,252.0,215.0,280.0,1.0,437.0,338.0,403.0,840.0,723.0,1129.0,768.0,208.0,177.0,238.0,275.0,307.0,13.0,201.0,383.0,220.0,230.0,303.0,476.0,9.0,655.0,424.0,414.0,414.0,319.0,330.0,1.0,202.0,127.0,118.0,135.0,167.0,342.0,5.0,256.0,204.0,188.0,189.0,249.0,358.0,NaN,165.0,105.0,83.0,106.0,141.0,229.0,1.0,171.0,109.0,85.0,131.0,176.0,319.0,27.0,172.0,168.0,152.0,136.0,161.0,274.0,25.0,166.0,146.0,155.0,321.0,366.0,436.0,16.0,368.0,244.0,242.0,200.0,0.0,296.0,0.0,157.0,188.0,146.0,202.0,174.0,15.0,131.0,158.0,164.0,181.0,199.0,262.0,20.0,196.0,152.0,137.0,122.0,177.0,305.0,6.0,498.0,159.0,119.0,127.0,144.0,240.0,6.0,153.0,108.0,100.0,105.0,134.0,172.0,146.0,209.0,157.0,0.0,271.0,277.0,12.0,275.0,178.0,187.0,222.0,291.0,356.0,182.0,102.0,117.0,152.0,185.0,0.0,474.0,549.0,578.0,226.0,695.0,547.0,9.0,386.0,315.0,278.0,253.0,315.0,328.0,1.0,61.0,34.0,41.0,93.0,118.0,195.0,NaN,191.0,144.0,106.0,113.0,151.0,272.0,1.0,142.0,103.0,103.0,263.0,161.0,263.0,NaN,258.0,283.0,301.0,390.0,388.0,588.0,5.0,440.0,399.0,348.0,267.0,310.0,443.0,1.0,310.0,190.0,218.0,274.0,343.0,409.0,0.0,245.0,88.0,139.0,146.0,178.0,244.0,NaN,191.0,154.0,124.0,135.0,148.0,189.0,1.0,216.0,234.0,210.0,216.0,262.0,315.0,0.0,258.0,252.0,183.0,232.0,264.0,425.0,NaN,366.0,370.0,358.0,374.0,355.0,547.0,0.0,561.0,437.0,339.0,323.0,360.0,483.0,3.0,452.0,330.0,354.0,148.0,140.0,192.0,9.0,220.0,166.0,214.0,184.0,213.0,329.0,2.0,236.0]

When the NaNs are eliminated, models can be generated for some keys, but forecasting still crashes sometimes.
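
A possible alternative to dropping the NaNs is to interpolate them before fitting. A minimal sketch, assuming the fill method that TimeSeriesRDD exposes in the sparkts versions I've looked at (it supports methods such as "linear" and "previous"):

// Interpolate missing values instead of dropping them; "linear" fills each
// NaN from its neighbours along the DateTimeIndex.
val filledTimeSeriesRDD = salesTimeSeriesRDD.fill("linear")

val keyWithModelRDD = filledTimeSeriesRDD map {
  case (key, tsVector) => (key, ARIMA.fitModel(1, 0, 1, tsVector))
}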


rolandjohann commented Sep 23, 2016

It seems that this is a concurrency problem. If I collect the RDD's data to the driver and run the model training sequentially via

salesTimeSeriesRDD.collect().foreach {
  case (key, tsVector) =>
    val model = ARIMA.fitModel(5, 0, 1, tsVector)
    val forecasts = model.forecast(Vectors.dense(Array.emptyDoubleArray), 40)
    log.info(s"forecasts for $key[${forecasts.size}]: $forecasts")
}

everything works fine.
The problem is that the code dives very quickly into the mathematical internals of Apache's Commons Math library.
To be honest, I'm no expert in ML/linear algebra, but I am familiar with Spark.
I just wanted to use this library to implement a proof of concept for time series forecasting.
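
For anyone who wants to keep the distributed path: a defensive sketch (my own workaround idea, not something the library provides) that skips keys whose optimizer fails to converge instead of failing the whole job:

import scala.util.{Failure, Success, Try}

import org.apache.commons.math3.exception.TooManyEvaluationsException

// Fit each series inside a Try so that a non-converging optimizer only
// drops that key instead of killing the Spark job.
val keyWithModelRDD = salesTimeSeriesRDD flatMap {
  case (key, tsVector) =>
    Try(ARIMA.fitModel(1, 0, 1, tsVector)) match {
      case Success(model)                          => Some((key, model))
      case Failure(_: TooManyEvaluationsException) => None // skip this key
      case Failure(other)                          => throw other
    }
}

Whether the root cause is actual thread-unsafety or just individual series that don't converge, this at least isolates the failing keys.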


winterny commented Dec 5, 2018

How did you solve this problem?
