
[Bug] Error in Executor setup After Permanent UDF Deletion #6812

Open
3 of 4 tasks
hzxiongyinke opened this issue Nov 18, 2024 · 3 comments
Labels
kind:bug This is a clearly a bug priority:major

Comments

@hzxiongyinke
Contributor

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Hello everyone,

I've encountered an issue with Kyuubi that I'm hoping the community can help with.

I created a permanent UDF in a Kyuubi instance. Later, due to changed requirements, I deleted this UDF through another driver. However, any SQL executed afterwards by the previously started driver fails with an error indicating that the UDF's jar file cannot be found. Currently, the only workaround I have is to restart the Kyuubi engine.
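A minimal reproduction sketch of the scenario described above. All names here (the function, class, bucket, and paths) are hypothetical placeholders, not taken from the report, and whether dropping the function also removes the jar from object storage depends on the deployment:

```sql
-- Session 1 (long-running Kyuubi engine): register a permanent UDF
-- backed by a jar stored on OSS.
CREATE FUNCTION my_udf AS 'com.example.MyUDF'
  USING JAR 'oss://my-bucket/udfs/my-udf.jar';

SELECT my_udf(col) FROM t;  -- works; executors fetch the jar on demand

-- Session 2 (a different driver): drop the UDF; the backing jar is
-- also removed from the bucket.
DROP FUNCTION my_udf;

-- Back in session 1: any subsequent query now fails, because the
-- engine's driver still lists the jar as a dependency and newly
-- launched executors try to fetch it during setup.
SELECT 1;  -- java.io.FileNotFoundException: File not found ... in bucket ...
```

This matches the stack trace below: the failure occurs in `Executor.updateDependencies` / `Utils.fetchFile`, i.e. during executor-side dependency fetching rather than during UDF resolution.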

Affects Version(s)

1.8.0

Kyuubi Server Log Output

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 279.0 failed 4 times, most recent failure: Lost task 0.3 in stage 279.0 (TID 251) (core-xxxx.cn-shanghai.emr.aliyuncs.com executor 29): java.io.FileNotFoundException:  [ErrorMessage]: File not found: .GalaxyResource/bigdata_emr_sh/xxx in bucket xxx
        at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54)
        at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23)
        at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112)
        at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60)
        at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
        at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
        at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2673)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2609)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2608)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2861)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2803)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2792)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:269)

Kyuubi Engine Log Output

Spark executor log error:
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Running task 0.0 in stage 283.0 (TID 264)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Fetching oss://xxx/xxxx/xxx with timestamp 1731900662592
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: TOKEN: YARN_AM_RM_TOKEN
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: User: xxxx, authMethod: SIMPLE, ugi: xxxx (auth:SIMPLE)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] JindoHadoopSystem: Initialized native file system: 
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=getFileStatus, src=oss://xxxx/.xxxx/xxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=77, version=6.2.0
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=list, src=oss://xxxx/.xxxx/xxxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=26, version=6.2.0
24/11/18 16:06:18 ERROR [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Exception in task 0.0 in stage 283.0 (TID 264)
java.io.FileNotFoundException:  [ErrorMessage]: File not found: .xxxx/xxxx/xxxx in bucket xxxx
	at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65) ~[jindo-sdk-6.2.0.jar:?]
	at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665) ~[jindo-sdk-6.2.0.jar:?]
	at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60) ~[jindo-sdk-6.2.0.jar:?]
	at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]
24/11/18 16:06:18 INFO [dispatcher-Executor] YarnCoarseGrainedExecutorBackend: Got assigned task 265

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
hzxiongyinke added the kind:bug and priority:major labels on Nov 18, 2024

Hello @hzxiongyinke,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

@hzxiongyinke
Contributor Author

cc @yaooqinn @pan3793

@yaooqinn
Member

The failed Spark application didn't even access the missing jar file, did it?
