Skip to content

Commit

Permalink
fix(app-shell): reduce latency when switching between MQTT clients (#…
Browse files Browse the repository at this point in the history
…15228)

Closes RQA-2748

Each network interface is tied to a specific IP, and the app-shell uses one IP to communicate with MQTT. If a client manually disconnects from a network interface that is connected to MQTT, some network interfaces on some robots (such as the USB on the OT-2) do not fire the MQTT 'error' or 'disconnect' events immediately. This means that the app-shell thinks it's still connected to MQTT until the keepAlive packet receives no response.

The MQTT standard keepalive (and what we currently use) is 60 seconds. After the packet is sent, there is a connectTimeout that the server has to respond before the client fires the 'disconnect' event (we use 30 seconds, the standard). After a client disconnects, the app-shell will attempt to connect to the same robot on a different IP if available.

This problem scenario shouldn't affect most users in most circumstances, but there is one rather simple improvement we can make that sometimes reduces this latency - MQTT has an 'offline' event. Some network interfaces cause the 'offline' event to fire, and we can immediately end the client (and reestablish a new MQTT connection on a different network interface if available).
  • Loading branch information
mjhuff authored and ryanthecoder committed May 28, 2024
1 parent 278478c commit df2a07e
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion app-shell/src/notifications/connect.ts
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ export function establishConnections(
): Promise<RobotData[]> {
return new Promise((resolve, reject) => {
const newConnections = healthyRobots.filter(({ ip, robotName }) => {
// Only attempt a new connection if the current broker connection is bad.
if (connectionStore.isConnectedToBroker(robotName)) {
return false
} else {
Expand Down Expand Up @@ -171,7 +172,7 @@ function establishListeners(
client.on('reconnect', () => {
notifyLog.debug(`Attempting to reconnect to ${robotName} on ${ip}`)
})
// handles transport layer errors only
// Handles transport layer errors only
client.on('error', error => {
notifyLog.warn(`Error - ${error.name}: ${error.message}`)
sendDeserializedGenericError(ip, 'ALL_TOPICS')
Expand All @@ -193,6 +194,15 @@ function establishListeners(
)
sendDeserializedGenericError(ip, 'ALL_TOPICS')
})

// Some network interfaces (such as link-local) will *sometimes* fire the 'offline' event when physically disconnected.
// While the keepalive packet will eventually result in an 'error' event and close the client, we can proactively
// close the client when possible, allowing for the connection store to connect to the robot on an alternative IP
// if one is available.
client.on('offline', () => {
sendDeserializedGenericError(ip, 'ALL_TOPICS')
client.end()
})
}

export function closeConnectionsForcefullyFor(
Expand Down

0 comments on commit df2a07e

Please sign in to comment.