What problem caused the stream program to shut down? #24

Open
zhengqiangtan opened this issue Jan 15, 2020 · 3 comments

@zhengqiangtan

Hi, sorry to bother you. I've run into an issue and can't find the reason the stream program shut down. Here is the detailed error log:

2020-01-14 16:55:00 ERROR MicroBatchExecution:91 - Query [id = 1e58880f-be83-47c3-b837-9d7c64a90982, runId = 9eecfa85-29c5-4246-963e-90fd09f6731c] terminated with error
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.sql.mlsql.sources.mysql.binlog.BinLogSocketServerSerDer$class.readResponse(servers.scala:163)
at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.readResponse(MLSQLBinLogDataSource.scala:243)
at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.getLatestOffset(MLSQLBinLogDataSource.scala:324)
at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.getOffset(MLSQLBinLogDataSource.scala:334)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$9.apply(MicroBatchExecution.scala:345)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$9.apply(MicroBatchExecution.scala:345)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:344)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:341)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:341)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:554)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:183)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
2020-01-14 16:55:00 INFO DAGScheduler:54 - Asked to cancel job group 8913a720-df1b-427b-abfc-18cae73935dc
2020-01-14 16:55:00 INFO YarnScheduler:54 - Cancelling stage 3
2020-01-14 16:55:00 INFO YarnScheduler:54 - Killing all running tasks in stage 3: Stage cancelled
2020-01-14 16:55:00 INFO YarnScheduler:54 - Stage 3 was cancelled
2020-01-14 16:55:00 INFO DAGScheduler:54 - ResultStage 3 (start at NativeMethodAccessorImpl.java:0) failed in 86450.992 s due to Job 1 cancelled part of cancelled job group 8913a720-df1b-427b-abfc-18cae73935dc
2020-01-14 16:55:00 INFO DAGScheduler:54 - Job 1 failed: start at NativeMethodAccessorImpl.java:0, took 86450.997796 s
Exception in thread "launch-binlog-socket-server-in-spark-job" org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 8913a720-df1b-427b-abfc-18cae73935dc
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1822)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:906)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:906)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:906)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:906)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2077)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource.org$apache$spark$sql$mlsql$sources$MLSQLBinLogDataSource$$launchBinlogServer$1(MLSQLBinLogDataSource.scala:207)
at org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource$$anon$4.run(MLSQLBinLogDataSource.scala:214)
Traceback (most recent call last):
File "/data/sync/ods_users_df/stream2hdfs.py", line 36, in
query.awaitTermination()
File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/streaming.py", line 103, in awaitTermination
File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 75, in deco

@allwefantasy (Owner)

I guess the executor that runs the binlog server may have been killed by YARN. Try increasing the Spark executor memory overhead.
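As a rough sketch of that suggestion (values are illustrative, not tuned recommendations), the executor memory overhead maps to the standard `spark.executor.memoryOverhead` config and could be raised either at submit time or when the session is built in a script like stream2hdfs.py:

```python
# Sketch: raising executor memory overhead for a PySpark Structured Streaming job.
# Illustrative value only; tune it to your cluster. On YARN it can also be passed
# at submit time, e.g.:
#   spark-submit --conf spark.executor.memoryOverhead=2048 ... stream2hdfs.py
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("stream2hdfs")
    # Off-heap overhead per executor in MiB (default: max(384, 10% of executor memory)).
    # The binlog server runs inside one executor, so give it extra headroom.
    .config("spark.executor.memoryOverhead", "2048")
    .getOrCreate()
)
```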

@zhengqiangtan (Author)

(screenshot attached)

@zhengqiangtan (Author)

2020-01-15 12:09:26 ERROR MLSQLBinLogDataSource:91 - next on empty iterator
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 2 on zmbd-pm07: Executor heartbeat timed out after 155288 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 5 on zmbd-pm13: Executor heartbeat timed out after 152209 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 8 on zmbd-pm11: Executor heartbeat timed out after 152101 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 7 on zmbd-pm07: Executor heartbeat timed out after 150332 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 1 on zmbd-pm07: Executor heartbeat timed out after 156505 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 4 on zmbd-pm13: Executor heartbeat timed out after 147583 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 6 on zmbd-pm16: Executor heartbeat timed out after 150414 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 3 on zmbd-pm07: Executor heartbeat timed out after 150221 ms
2020-01-16 01:33:49 ERROR TransportChannelHandler:144 - Connection to /172.20.1.10:36940 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
2020-01-16 01:33:50 ERROR TransportResponseHandler:144 - Still have 1 requests outstanding when connection from /172.20.1.10:36940 is closed
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(2)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 8516244618252785373 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(5)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5604667397559748996 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(8)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 4795019310054221062 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(7)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5234086875584424262 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(1)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5740243668631548947 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(4)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 7509468406493835515 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(6)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 4790295982241523280 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(3)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:38 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:38 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR MicroBatchExecution:91 - Query [id = c4492e8a-a982-4774-a19b-6aa5a2ac9c9b, runId = 5aa1e3bd-2f26-4e96-bd26-c88d0efa9176] terminated with error
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR TransportClient:331 - Failed to send RPC RPC 6764822998199901153 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:45:34 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
2020-01-16 01:45:34 ERROR Utils:91 - Uncaught exception in thread Thread-1
2020-01-16 01:45:34 ERROR Utils:91 - Uncaught exception in thread Thread-1
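The "Executor heartbeat timed out" and "has been quiet for 120000 ms" lines above are governed by Spark's network and heartbeat timeouts. If the executors were only stalled (for example by long GC pauses) rather than killed by YARN, loosening those timeouts can help; a minimal sketch with illustrative values, where `spark.network.timeout` must stay larger than `spark.executor.heartbeatInterval`:

```python
# Sketch: loosening the heartbeat/network timeouts seen in the log above.
# Illustrative values only; these address the symptom, while the root cause may
# still be the executor memory overhead mentioned earlier.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("stream2hdfs")
    # Default is 120s; the log shows connections declared dead after 120000 ms.
    .config("spark.network.timeout", "300s")
    # How often executors heartbeat to the driver (default 10s).
    .config("spark.executor.heartbeatInterval", "30s")
    .getOrCreate()
)
```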
