Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No messages received for 16 minutes following ServiceBus TimeoutException #395

Open
pluckhurst opened this issue Jan 9, 2020 · 3 comments
Assignees

Comments

@pluckhurst
Copy link

Actual Behavior

Summary of problem
Some UpdateState requests fail with TimeoutException and we then don't receive any messages from the queue for 16 minutes when the connection times out and reconnects. If this is expected behaviour is there a way we can configure a much shorter timeout?

Details with lots of log output
We process messages in a loop until we we stop our process or we get an exception. We have set prefetch=10 and if the receive() call returns no messages we sleep for 2 seconds. Strip away all the error handling, etc and the loop is effectively calling two IMessageReceiver methods.

message = messageReceiver.receive(Duration.ofSeconds(10));
    // process the message here
messageReceiver.completeAsync(lockToken);

Most of the time this works without a restart for weeks at a time but occasionally we get a 16 minute period (it's always 16 minutes) where we receive no messages or failures from the above methods. After the 16 minute delay we start getting messages again and immediately process the backlog of messages so we know messages are being published during this 16 minute periods. Recently we have started getting more of these delays where it can be as often as once a day.

During the 16 minute period the calls to messageReceiver.receive() continue to work and the log messages are the same as when there are no messages on the queue.

The first sign of problem are 8 exceptions indicating timeouts trying to update the message state. When azure recovers 16 minutes later these are the first message we receive. Each of the exceptions are the same, e.g.

2020-01-08 12:13:44,985 ERROR [pool-24-thread-3] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: UpdateState request timed out. Delivery:!�L۬��J�BB�v���
com.microsoft.azure.servicebus.primitives.TimeoutException: Request timed out.
	at com.microsoft.azure.servicebus.primitives.CoreMessageReceiver$1.run(CoreMessageReceiver.java:164)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

For the next 11 minutes all we see logged are the 'No messages received' messages every 12 seconds.

2020-01-08 12:14:25,101 INFO  [pool-24-thread-2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: No messages received from 'tcc_events'. Pending receive request timed out. Returning null to the client.

Then after 11 minutes we see the following request.

2020-01-08 12:24:27,803 DEBUG [pool-24-thread-4] com.microsoft.azure.servicebus.primitives.MessagingFactory: Sending token for amqp://encyclev2.servicebus.windows.net/tcc_events
2020-01-08 12:24:27,803 DEBUG [pool-23-thread-35928] com.microsoft.azure.servicebus.primitives.SASUtil: Generated SAS token for resource: amqp://encyclev2.servicebus.windows.net/tcc_events with sas key name : PartnerPolicy
2020-01-08 12:24:27,803 DEBUG [pool-23-thread-35928] com.microsoft.azure.servicebus.primitives.CommonRequestResponseOperations: Sending CBS Token of type 'servicebus.windows.net:sastoken' to 'amqp://encyclev2.servicebus.windows.net/tcc_events'
2020-01-08 12:24:27,803 DEBUG [pool-23-thread-35928] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Sending request with id:request:2768
2020-01-08 12:24:27,803 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Starting requestResponseLink $cbs internal sender send loop
2020-01-08 12:24:27,803 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResonseLink $cbs internal sender sent a request. available credit :186
2020-01-08 12:24:27,803 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop ending as there are no more requests enqueued.
2020-01-08 12:24:27,803 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop stopped.

And 10 seconds later we get this timed out exception.

2020-01-08 12:24:37,803 WARN  [pool-24-thread-3] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Request with id:request:2768 timed out
2020-01-08 12:24:37,806 WARN  [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.MessagingFactory: Sending CBS Token for amqp://encyclev2.servicebus.windows.net/tcc_events failed.
com.microsoft.azure.servicebus.primitives.TimeoutException: Request timed out.
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$5.run(RequestResponseLink.java:469)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2020-01-08 12:24:37,806 INFO  [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.MessagingFactory: Will retry sending CBS Token for amqp://encyclev2.servicebus.windows.net/tcc_events after 5 seconds.

These requests and exceptions then continue for another 5 minutes until the connection finally times out and we see the following logged.

2020-01-08 12:29:34,452 WARN  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ConnectionHandler: Connection.onTransportError: hostname:encyclev2.servicebus.windows.net:5671, error:Connection timed out
2020-01-08 12:29:34,452 ERROR [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Connection error. 'Error{condition=proton:io, description='Connection timed out', info=null}'
2020-01-08 12:29:34,452 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Closing all links on the connection. Number of links '3'
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Closed all links on the connection. Number of links '3'
2020-01-08 12:29:34,453 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Closing connection to host
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.BaseLinkHandler: linkName:RequestResponseLink-Sender_154a33, ErrorCondition:proton:io, Connection timed out
2020-01-08 12:29:34,453 WARN  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Internal send link 'RequestResponseLink-Sender_154a33' of requestresponselink to '$cbs' encountered error.
com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=proton:io, description='Connection timed out', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:121)
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalSender.onClose(RequestResponseLink.java:881)
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:79)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:385)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:310)
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:147)
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191)
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:493)
	at java.lang.Thread.run(Thread.java:745)
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.BaseLinkHandler: linkName:RequestResponseLink-Receiver_485c74, ErrorCondition:proton:io, Connection timed out
2020-01-08 12:29:34,453 WARN  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Internal receive link 'RequestResponseLink-Receiver_485c74' of requestresponselink to '$cbs' encountered error.
com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=proton:io, description='Connection timed out', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:121)
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalReceiver.onClose(RequestResponseLink.java:698)
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:79)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:385)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:310)
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:147)
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191)
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:493)
	at java.lang.Thread.run(Thread.java:745)
2020-01-08 12:29:34,453 WARN  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Receive link 'Receiver_4ede0c_2d51b6f137034160a16ed140385ec399_G25' to 'tcc_events', sessionId 'null' closed with error.
com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=proton:io, description='Connection timed out', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:121)
	at com.microsoft.azure.servicebus.primitives.CoreMessageReceiver.onClose(CoreMessageReceiver.java:973)
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:79)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:385)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:310)
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:147)
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191)
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:493)
	at java.lang.Thread.run(Thread.java:745)
2020-01-08 12:29:34,453 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Receive link 'Receiver_4ede0c_2d51b6f137034160a16ed140385ec399_G25' to 'tcc_events', sessionId 'null' will be reopened after 'PT1S'
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.BaseLinkHandler: local link close. linkName:RequestResponseLink-Sender_154a33
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.BaseLinkHandler: local link close. linkName:RequestResponseLink-Receiver_485c74
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.BaseLinkHandler: local link close. linkName:Receiver_4ede0c_2d51b6f137034160a16ed140385ec399_G25
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ConnectionHandler: onConnectionLocalClose: hostname:encyclev2.servicebus.windows.net:5671
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionLocalClose - entityName: $cbs, condition: Error{condition=null, description='null', info=null}
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionLocalClose - entityName: $cbs, condition: Error{condition=null, description='null', info=null}
2020-01-08 12:29:34,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionLocalClose - entityName: tcc_events, condition: Error{condition=null, description='null', info=null}

And sure enough 1 second later it reconnects and we start receiving messages again.

2020-01-08 12:29:35,453 INFO  [pool-24-thread-3] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink - recreating internal send and receive links to $cbs
2020-01-08 12:29:35,453 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Creating connection to host 'encyclev2.servicebus.windows.net:5671'
2020-01-08 12:29:35,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestReponseLink - opening send link to $cbs
2020-01-08 12:29:35,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestReponseLink - opening receive link to $cbs
2020-01-08 12:29:35,453 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ConnectionHandler: onConnectionInit: hostname:encyclev2.servicebus.windows.net:5671
2020-01-08 12:29:35,454 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onLinkLocalOpen: linkName:RequestResponseLink-Receiver_10563a, localSource:Source{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}
2020-01-08 12:29:35,454 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ConnectionHandler: onConnectionBound: hostname:encyclev2.servicebus.windows.net:5671
2020-01-08 12:29:35,464 INFO  [pool-24-thread-4] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Recreating receive link to 'tcc_events'
2020-01-08 12:29:35,464 DEBUG [pool-24-thread-4] com.microsoft.azure.servicebus.primitives.MessagingFactory: Sending token for amqp://encyclev2.servicebus.windows.net/tcc_events
2020-01-08 12:29:35,464 DEBUG [pool-23-thread-35947] com.microsoft.azure.servicebus.primitives.SASUtil: Generated SAS token for resource: amqp://encyclev2.servicebus.windows.net/tcc_events with sas key name : PartnerPolicy
2020-01-08 12:29:35,465 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.CommonRequestResponseOperations: Sending CBS Token of type 'servicebus.windows.net:sastoken' to 'amqp://encyclev2.servicebus.windows.net/tcc_events'
2020-01-08 12:29:35,465 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Sending request with id:request:2817
2020-01-08 12:29:35,504 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Starting requestResponseLink $cbs internal sender send loop
2020-01-08 12:29:35,504 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop stopped.
2020-01-08 12:29:35,688 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ConnectionHandler: Connection.onConnectionRemoteOpen: hostname:encyclev2.servicebus.windows.net:5671, remotecontainer:2a7e5de4c42b47f68d13cc89fa15825e_G29
2020-01-08 12:29:35,688 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.MessagingFactory: Connection opened to host.
2020-01-08 12:29:35,688 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionRemoteOpen - entityName: $cbs, sessionIncCapacity: 0, sessionOutgoingWindow: 2147483647
2020-01-08 12:29:35,688 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionRemoteOpen - entityName: $cbs, sessionIncCapacity: 0, sessionOutgoingWindow: 2147483647
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SendLinkHandler: onLinkRemoteOpen: linkName:RequestResponseLink-Sender_ea3dd9, remoteTarge:Target{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, capabilities=null}
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Opened internal send link of requestresponselink to $cbs
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Starting requestResponseLink $cbs internal sender send loop
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop stopped.
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResonseLink $cbs internal sender received credit :100
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResonseLink $cbs internal sender available credit :100
2020-01-08 12:29:35,689 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Starting requestResponseLink $cbs internal sender send loop
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResonseLink $cbs internal sender sent a request. available credit :99
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop ending as there are no more requests enqueued.
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestResponseLink $cbs internal sender send loop stopped.
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SendLinkHandler: onLinkFlow: linkName:RequestResponseLink-Sender_ea3dd9, unsettled:0, credit:99
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onLinkRemoteOpen: linkName:RequestResponseLink-Receiver_10563a, remoteSource:Source{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}
2020-01-08 12:29:35,690 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Opened internal receive link of requestresponselink to $cbs
2020-01-08 12:29:35,690 INFO  [pool-23-thread-35947] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Recreated internal links to $cbs
2020-01-08 12:29:35,717 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onDelivery: linkName:RequestResponseLink-Receiver_10563a, updatedLinkCredit:2147483647, remoteCredit:2147483646, remoteCondition:Error{condition=null, description='null', info=null}, delivery.isPartial:false
2020-01-08 12:29:35,718 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.RequestResponseLink: RequestRespnseLink received response for request with id :request:2817
2020-01-08 12:29:35,718 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Response for request with id:request:2817 has status code:202
2020-01-08 12:29:35,718 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.RequestResponseLink: Completing request with id:request:2817
2020-01-08 12:29:35,718 DEBUG [pool-23-thread-35947] com.microsoft.azure.servicebus.primitives.CommonRequestResponseOperations: CBS Token of type 'servicebus.windows.net:sastoken' sent to 'amqp://encyclev2.servicebus.windows.net/tcc_events'
2020-01-08 12:29:35,718 DEBUG [pool-23-thread-35943] com.microsoft.azure.servicebus.primitives.MessagingFactory: Sent CBS token for amqp://encyclev2.servicebus.windows.net/tcc_events and setting renew timer
2020-01-08 12:29:35,718 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Creating receive link to 'tcc_events'
2020-01-08 12:29:35,718 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Receive link settle mode 'sender settle mode: UNSETTLED, receiver settle mode: SECOND'
2020-01-08 12:29:35,718 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onLinkLocalOpen: linkName:Receiver_5e925d_2a7e5de4c42b47f68d13cc89fa15825e_G29, localSource:Source{address='tcc_events', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}
2020-01-08 12:29:35,745 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.SessionHandler: onSessionRemoteOpen - entityName: tcc_events, sessionIncCapacity: 2147483647, sessionOutgoingWindow: 2147483647
2020-01-08 12:29:35,748 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onLinkRemoteOpen: linkName:Receiver_5e925d_2a7e5de4c42b47f68d13cc89fa15825e_G29, remoteSource:Source{address='tcc_events', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}
2020-01-08 12:29:35,748 INFO  [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Receive link to 'tcc_events' opened.
2020-01-08 12:29:35,748 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Sent flow to the service. receiverPath:tcc_events, linkname:Receiver_5e925d_2a7e5de4c42b47f68d13cc89fa15825e_G29, updated-link-credit:10, sentCredits:10
2020-01-08 12:29:35,748 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: receiverPath:tcc_events, linkname:Receiver_5e925d_2a7e5de4c42b47f68d13cc89fa15825e_G29, updated-link-credit:10, sentCredits:10
2020-01-08 12:29:35,782 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.amqp.ReceiveLinkHandler: onDelivery: linkName:Receiver_5e925d_2a7e5de4c42b47f68d13cc89fa15825e_G29, updatedLinkCredit:10, remoteCredit:9, remoteCondition:Error{condition=null, description='null', info=null}, delivery.isPartial:false
2020-01-08 12:29:35,782 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Received a delivery '����O�_��m��' from 'tcc_events'
2020-01-08 12:29:35,782 DEBUG [ReactorThread6d188848-df49-4eb1-a589-b4f96aa1f2b2] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Received a message from 'tcc_events'. Adding to prefecthed messages.
2020-01-08 12:29:35,783 DEBUG [pool-24-thread-4] com.microsoft.azure.servicebus.primitives.CoreMessageReceiver: Returning the message received from 'tcc_events' to a pending receive request

Expected Behavior

  1. Because all the exception are handled inside azure client code our processing code has no idea there is a problem. We only know from the log output and from our message metrics we see a 16-minute period with no messages followed by a spike in number when it reconnects.
  2. Is there any setting or timeout we can set to force the azure client to reconnect quicker when it starts getting time out exceptions?
  3. Is it possible to tell if this a server-side or client-side problem. We have seen it from two client processes that connect to different service bus endpoints. One of the client applications runs from our development site and the other is running in AWS.

Versions

  • OS platform and version: Ubuntu 14.04.4 LTS
  • Maven package version or commit ID: 1.2.15
  • Java: 1.8.0_51
@yvgopal yvgopal self-assigned this Jan 29, 2020
@kl298p
Copy link

kl298p commented Jun 4, 2020

Has anybody figured out the issue yet? I am working on a project that seems to have this exact same issue. To me if feels like this is an issue between the Azure IaaS VM Virtual NIC => outbound; we have performed multiple tcpdumps that seem to indicate this. The very random 16 minute requests seem to go out, but nothing comes back until 16 minutes later. This happens maybe 1% of the time, so is unpredictable other than we know it will happen at least a time or two during the day - but usually about 10 to 20 times per day (out of thousands of requests).

We actually set up some automated requests to test whether or not there were patterns, and to help us identify times of the long-running requests. It appears there is no pattern. Here is a sample of data I collected:

Wed Jun  3 21:25:11 UTC 2020	HTTP Code: 200, Duration: 944.262, Effective URL: [removed sensitive data]
Wed Jun  3 22:47:50 UTC 2020	HTTP Code: 200, Duration: 943.671, Effective URL: [removed sensitive data]
Thu Jun  4 02:33:00 UTC 2020	HTTP Code: 200, Duration: 944.861, Effective URL: [removed sensitive data]
Thu Jun  4 03:43:23 UTC 2020	HTTP Code: 200, Duration: 943.244, Effective URL: [removed sensitive data]
Thu Jun  4 13:53:22 UTC 2020	HTTP Code: 200, Duration: 945.197, Effective URL: [removed sensitive data]

@pluckhurst
Copy link
Author

We switched to version 1.2.18 and the problem appears to be fixed, When this happens now we see the connection get aborted after 30 seconds, it then reconnects and resumes processing. This all happens inside the azure client code and we made no changes to our code that calls it.

@kl298p
Copy link

kl298p commented Jun 5, 2020

Thanks for the info! I will pass this on. We discovered the same issue that exists with private link does not seem to exist if instead we use Vnet peering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants