
Support for spark version 3.5.0? #48

Open
khanjandharaiya opened this issue Feb 20, 2024 · 0 comments

Comments


khanjandharaiya commented Feb 20, 2024

Hey there! I am using the latest version (1.3.0) of jpmml-evaluator-spark, but after upgrading to the latest Spark version (3.5.0) I am getting this error:

untyped Scala UDF

ERROR org.apache.spark.ml.util.Instrumentation - org.apache.spark.sql.AnalysisException: [UNTYPED_SCALA_UDF] You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.
3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution.
	at org.apache.spark.sql.errors.QueryCompilationErrors$.usingUntypedScalaUDFError(QueryCompilationErrors.scala:3157)
	at org.apache.spark.sql.functions$.udf(functions.scala:8299)
	at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.scala:99)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$4(Pipeline.scala:311)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$3(Pipeline.scala:311)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:308)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:307)
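For context, the first two options in the error message boil down to letting Spark infer the UDF's input types instead of supplying only a return type. A minimal sketch of the difference (not the library's actual code; assumes a standard Spark dependency and a hypothetical string-length UDF):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped form: passes only the return type, so Spark cannot see the
// input types. This is what fails on Spark 3.5 with UNTYPED_SCALA_UDF.
// val lenUdf = udf((s: String) => s.length, IntegerType)

// Typed form (option 1 in the error message): Spark derives both input
// and return types from the closure's signature.
val lenUdf = udf((s: String) => s.length)
```

The untyped overload is rejected because, for a closure with a primitive-typed argument, Spark may pass null and the closure would silently see the Java default value (e.g. 0 for Int) instead of failing.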

After setting "spark.sql.legacy.allowUntypedScalaUDF" to "true", it works fine.
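For reference, this is the legacy flag from option 3 of the error message; it can be set programmatically on an active session (a sketch, assuming `spark` is an existing `SparkSession`):

```scala
// Assumes `spark` is an active SparkSession.
// Re-enables untyped Scala UDFs (legacy behavior); use with caution,
// since null inputs to primitive-typed closures become default values.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")
```

The same flag can also be passed at launch time via `--conf spark.sql.legacy.allowUntypedScalaUDF=true`.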

Will there be an update from your side to fix this?

I found this related closed issue: #43, for Spark version 3.1.1.
