
Support for spark version 3.5.0? #48

Open
khanjandharaiya opened this issue Feb 20, 2024 · 0 comments

Comments


khanjandharaiya commented Feb 20, 2024

Hey there! I am using the latest version (1.3.0) of jpmml-evaluator-spark, but after upgrading to the latest Spark version (3.5.0) I am getting this error:

untyped Scala UDF

ERROR org.apache.spark.ml.util.Instrumentation - org.apache.spark.sql.AnalysisException: [UNTYPED_SCALA_UDF] You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.
3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution.
	at org.apache.spark.sql.errors.QueryCompilationErrors$.usingUntypedScalaUDFError(QueryCompilationErrors.scala:3157)
	at org.apache.spark.sql.functions$.udf(functions.scala:8299)
	at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.scala:99)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$4(Pipeline.scala:311)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$3(Pipeline.scala:311)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:308)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:307)
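For context, the first two options in the error message boil down to letting Spark infer the UDF's input types instead of supplying only a return type. A minimal sketch of the difference (not the library's actual code; assumes a standard Spark dependency and a hypothetical string-length UDF):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped form: passes only the return type, so Spark cannot see the
// input types. This is what fails on Spark 3.5 with UNTYPED_SCALA_UDF.
// val lenUdf = udf((s: String) => s.length, IntegerType)

// Typed form (option 1 in the error message): Spark derives both input
// and return types from the closure's signature.
val lenUdf = udf((s: String) => s.length)
```

The untyped overload is rejected because, for a closure with a primitive-typed argument, Spark may pass null and the closure would silently see the Java default value (e.g. 0 for Int) instead of failing.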

After setting "spark.sql.legacy.allowUntypedScalaUDF" to "true", it works fine.
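For reference, this is the legacy flag from option 3 of the error message; it can be set programmatically on an active session (a sketch, assuming `spark` is an existing `SparkSession`):

```scala
// Assumes `spark` is an active SparkSession.
// Re-enables untyped Scala UDFs (legacy behavior); use with caution,
// since null inputs to primitive-typed closures become default values.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")
```

The same flag can also be passed at launch time via `--conf spark.sql.legacy.allowUntypedScalaUDF=true`.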

Will there be an update from your side to fix this?

I found this related closed issue: #43, for Spark version 3.1.1.
