I'm building a POC for sales forecasting and am getting an exception that I can't get rid of:
```
org.apache.commons.math3.exception.TooManyEvaluationsException: illegal state: maximal count (100.000) exceeded: evaluations
at org.apache.commons.math3.optim.BaseOptimizer$MaxEvalCallback.trigger(BaseOptimizer.java:242)
at org.apache.commons.math3.util.Incrementor.incrementCount(Incrementor.java:155)
at org.apache.commons.math3.optim.BaseOptimizer.incrementEvaluationCount(BaseOptimizer.java:191)
at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.computeObjectiveValue(MultivariateOptimizer.java:114)
at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch$1.value(LineSearch.java:120)
at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.computeObjectiveValue(UnivariateOptimizer.java:149)
at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:225)
at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:43)
at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.optimize(UnivariateOptimizer.java:70)
at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch.search(LineSearch.java:130)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:282)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:46)
at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
at org.apache.commons.math3.optim.BaseMultivariateOptimizer.optimize(BaseMultivariateOptimizer.java:65)
at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.optimize(MultivariateOptimizer.java:63)
at org.apache.commons.math3.optim.nonlinear.scalar.GradientMultivariateOptimizer.optimize(GradientMultivariateOptimizer.java:73)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.optimize(NonLinearConjugateGradientOptimizer.java:244)
at com.cloudera.sparkts.models.ARIMA$.fitWithCSSCGD(ARIMA.scala:198)
at com.cloudera.sparkts.models.ARIMA$.fitModel(ARIMA.scala:107)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:94)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:90)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:90)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:89)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
It seems that this is a concurrency problem. If I collect the RDD's data to the driver and run the model training sequentially via
```scala
salesTimeSeriesRDD.collect().foreach {
  case (key, tsVector) =>
    val model = ARIMA.fitModel(5, 0, 1, tsVector)
    val forecasts = model.forecast(Vectors.dense(Array.emptyDoubleArray), 40)
    log.info(s"forecasts for $key [${forecasts.size}]: $forecasts")
}
```
everything works fine.
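As a workaround I'm considering isolating failures per key instead of collecting to the driver. This is only a sketch (it assumes the same `salesTimeSeriesRDD` of `(key, tsVector)` pairs and a serializable logger on the executors): wrap each fit in `scala.util.Try` so one non-converging series doesn't fail the whole Spark job.

```scala
import scala.util.{Failure, Success, Try}

import org.apache.spark.mllib.linalg.Vectors
import com.cloudera.sparkts.models.ARIMA

salesTimeSeriesRDD.foreachPartition { partition =>
  partition.foreach { case (key, tsVector) =>
    // The CSS-CGD fit can blow past commons-math's evaluation budget on
    // hard-to-fit series; treat that as a per-key failure, not a job failure.
    Try(ARIMA.fitModel(5, 0, 1, tsVector)) match {
      case Success(model) =>
        val forecasts = model.forecast(Vectors.dense(Array.emptyDoubleArray), 40)
        log.info(s"forecasts for $key [${forecasts.size}]: $forecasts")
      case Failure(e) =>
        log.warn(s"ARIMA fit failed for $key: ${e.getMessage}")
    }
  }
}
```

This doesn't explain why the distributed run trips the evaluation limit while the sequential run doesn't, but it at least surfaces which keys fail.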
The problem is that the code dives really quickly into mathematical territory inside Apache's commons-math library.
To be honest: I'm no expert in ML/linear algebra, but I am familiar with Spark.
I just wanted to use this library to implement a proof of concept for time-series forecasting.
The TimeSeriesRDD is built via
If the DataFrame contains data for only one key, everything works fine.
The series data looks like this and as of now contains 251 values per key.
After eliminating the NaNs, models for some keys can be generated, but forecasting still crashes sometimes.
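For reference, the NaN elimination I'm doing is essentially the following sketch (assuming the series arrives as an `org.apache.spark.mllib.linalg.Vector`, as in the snippet above). Note that dropping entries shifts the remaining observations together, which changes the time structure of the series; spark-ts's `TimeSeriesRDD.fill` interpolation methods might be the better option here.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Remove NaN observations before fitting. Simple, but it collapses
// gaps in the series rather than interpolating across them.
def dropNaNs(ts: Vector): Vector =
  Vectors.dense(ts.toArray.filterNot(_.isNaN))
```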