feat: emit metrics from CRT HTTP engine #1017

ianbotsf · 2023-12-21T15:41:09Z

Issue #

#893

Description of changes

This change emits metrics from the CRT HTTP engine similar to the ones emitted by the OkHttp engine.

Companion PRs: awslabs/aws-crt-java#738, awslabs/aws-crt-kotlin#86, awslabs/aws-crt-kotlin#88

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lauzadis · 2024-01-02T19:44:50Z

...ttp-client-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/ConnectionManager.kt

        }
    }
+
+    private fun emitConnections() {


nit / naming: emitConnectionsMetrics?

Since the class name already contains the word "connection", I'm renaming it to simply emitMetrics.

lauzadis · 2024-01-02T19:50:19Z

...ent-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/SdkStreamResponseHandler.kt

+
+        if (sendEnd != null && receiveStart != null) {
+            val ttfb = receiveStart - sendEnd
+            if (ttfb.isPositive()) {


question: have you seen any instances where this is negative?

Not directly, but event streams may begin receiving traffic before sending has concluded because the communication is bidirectional and duplexed. In that situation it seemed prudent not to report TTFB.
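To illustrate the guard discussed above, here is a minimal sketch of computing TTFB only when both timestamps exist and the difference is positive. The helper name `timeToFirstByte` and the use of raw nanosecond readings are assumptions for illustration; the PR itself subtracts `Instant` values.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.nanoseconds

// Hypothetical helper (not from the PR): timestamps are nanosecond readings,
// null when the corresponding event has not been observed.
fun timeToFirstByte(sendEndNs: Long?, receiveStartNs: Long?): Duration? {
    if (sendEndNs == null || receiveStartNs == null) return null
    val ttfb = (receiveStartNs - sendEndNs).nanoseconds
    // A zero or negative value means the response started before the request
    // finished sending (possible with duplexed event streams), so report nothing.
    return if (ttfb.isPositive()) ttfb else null
}
```

With this shape, a caller records the metric only when the result is non-null, which matches the intent of skipping TTFB for overlapping send and receive.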

lauzadis · 2024-01-02T19:51:21Z

...ngine-crt/jvm/test/aws/smithy/kotlin/runtime/http/engine/crt/SdkStreamResponseHandlerTest.kt

@@ -3,17 +3,23 @@
 * SPDX-License-Identifier: Apache-2.0
 */

+@file:OptIn(ExperimentalApi::class)


question: Can this just be an opt-in on the test class?

It can actually be removed. I forgot why I added this but it's unused now!

lauzadis · 2024-01-02T19:53:41Z

...-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/io/ReportingByteReadChannel.kt

comment: These new body types are missing tests

aajtodd · 2024-01-05T19:33:51Z

runtime/runtime-core/common/src/aws/smithy/kotlin/runtime/time/Instant.kt

+     * the resulting duration will be negative.
+     * @param other The [Instant] marking the end of the duration
+     */
+    public operator fun minus(other: Instant): Duration


aajtodd · 2024-01-05T19:34:02Z

runtime/runtime-core/common/src/aws/smithy/kotlin/runtime/time/Instant.kt

@@ -102,4 +109,7 @@ public fun Instant.Companion.fromEpochMilliseconds(milliseconds: Long): Instant
    return fromEpochSeconds(secs, ns.toInt())
 }

+public fun Instant.Companion.fromEpochNanoseconds(ns: Long): Instant =


aajtodd · 2024-01-05T19:42:06Z

...ttp-client-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/ConnectionManager.kt

@@ -61,16 +66,22 @@ internal class ConnectionManager(
        val manager = getManagerForUri(request.uri, proxyConfig)
        var leaseAcquired = false

+        metrics.queuedRequests = pending.incrementAndGet()


correctness: If we are trying to acquire a connection then it's not queued right? I think queued would be before we hit the requestLimiter.

metrics.queuedRequests = pending.incrementAndGet()
requestLimiter.withPermit {
    metrics.queuedRequests = pending.decrementAndGet()
}

OK, if "queued" means "before we attempt to acquire a connection" then I'm guessing that the requestsQueuedDuration measurement below is also wrong. I'll move it too.

I had to look at the definitions I gave them again, and I think this would be in line with what they say:

queued=waiting to be executed (e.g. waiting for thread to be available), in-flight=actively processing

If we are to the point of acquiring a connection we are "actively processing" the request. I can see where the definition could be interpreted differently though as in "I have all the resources needed at this point to execute the request" but I think establishing a connection (or waiting on one to be available) is part of overall request processing. Curious if others disagree.

This still doesn't look correct, I don't think queuedRequests is calculated in connection manager.

Ah I think I was confusing the semaphore inside ConnectionManager with the one inside CrtHttpEngine. Switching.
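The definition settled on in this thread (queued = waiting for the engine-level limiter, in-flight = everything after, including connection acquisition) can be sketched roughly as follows. `QueueTrackingLimiter` is a hypothetical stand-in built on `java.util.concurrent.Semaphore`; it is not the actual CrtHttpEngine code, which uses a coroutine-based limiter.

```kotlin
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for an engine-level request limiter.
class QueueTrackingLimiter(maxConcurrentRequests: Int) {
    private val permits = Semaphore(maxConcurrentRequests)
    private val pending = AtomicLong(0)
    private val active = AtomicLong(0)

    val queuedRequests: Long get() = pending.get()
    val inFlightRequests: Long get() = active.get()

    fun <T> execute(block: () -> T): T {
        // Queued: the request is waiting for a permit and has not started processing.
        pending.incrementAndGet()
        try {
            permits.acquire()
        } finally {
            // Leaves the queue whether the permit was obtained or the wait was interrupted.
            pending.decrementAndGet()
        }
        // In flight: connection acquisition and the exchange itself happen inside block.
        active.incrementAndGet()
        try {
            return block()
        } finally {
            active.decrementAndGet()
            permits.release()
        }
    }
}
```

The point of the sketch is placement: the queued counter brackets only the wait for the limiter, so acquiring a connection is counted as part of processing.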

aajtodd · 2024-01-05T19:43:52Z

...ttp-client-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/ConnectionManager.kt

        }
    }
+
+    private fun emitConnections() {
+        val idleConnections = leases.availablePermits.toLong()


correctness: this semaphore isn't tied in any way to actual connections in CRT. You could have 100 max connections configured, but that doesn't mean you have 100 idle connections already connected and waiting to be used. I'm thinking this would have to come from the actual CRT connection manager.

Oh jeez, good point. Yes, this is clearly flawed.

That brings up an interesting wrinkle, though. We don't have just one CRT connection manager, we have one for each host. Does the Smithy metric for acquired connections equal the sum of acquired connections across all underlying CRT connection managers?

Moreover, we have a CRT config param called maxConnections. Contrary to its name, it's used as the maximum number of connections for each underlying CRT connection manager. Meaning if maxConnections is 100 and we connect to 3 hosts, there are actually 3 underlying CRT connection managers which each have a max of 100 connections for a total of 300. How then do we measure the Smithy metric of idle connections? Is it the sum of idle connections across all underlying CRT connection managers (which may be more than maxConnections)? Or is it maxConnections minus the sum of acquired connections across all underlying CRT connection managers (which may be less than the actual number of open/idle connections)?

The fact that CRT uses a connection manager per host is an implementation detail; the config setting is at the engine level. The leases semaphore exists to control max connections across multiple managers (albeit it likely isn't perfect at ensuring we never cross the maxConnections threshold due to idle connections).

The reason I allowed each connection manager to be configured with maxConnections is because we don't know how many hosts an engine will be used to connect to, it may be 1 or it may be split across many hosts which will require many connection managers.

I see this as an area that may cause confusion later, especially if users begin correlating the idle connections metric with connections reported by the OS or JVM. I don't have a better suggestion though given how the CRT connection manager works. I'll switch the implementation to calculate the idle connections metric as maxConnections minus the sum of acquired connections across all underlying CRT connection managers.

Oh and interrogating the number of active connections from the CRT connection managers requires another revision of aws-crt-kotlin: awslabs/aws-crt-kotlin#88

aajtodd · 2024-01-09T19:10:48Z

...ttp-client-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/ConnectionManager.kt

+    private fun emitMetrics() {
+        val acquiredConnections = connManagers.values.sumOf { it.managerMetrics.leasedConcurrency }
+        metrics.acquiredConnections = acquiredConnections
+        metrics.idleConnections = config.maxConnections.toLong() - acquiredConnections


fix: This still isn't right. Idle connections are connections established with the server that are not currently in use. If I'm not mistaken, this is just availableConcurrency, which would require the same summing across managers.

OK, sum of availableConcurrency sounds doable, although it will exceed max connections under some circumstances.
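As a rough sketch of the summing approach agreed on here, the snippet below aggregates per-host values into engine-level counts. `ManagerMetrics`, `ConnectionCounts`, and `summarize` are hypothetical types for illustration; only the field names `leasedConcurrency` and `availableConcurrency` come from the thread, and treating `availableConcurrency` as "idle" follows the reviewer's tentative reading.

```kotlin
// Hypothetical snapshot of one per-host CRT connection manager.
data class ManagerMetrics(val leasedConcurrency: Long, val availableConcurrency: Long)

// Hypothetical engine-level view reported as Smithy metrics.
data class ConnectionCounts(val acquired: Long, val idle: Long)

// Sum both values across every per-host manager instead of deriving
// idle connections from maxConnections.
fun summarize(perHost: Collection<ManagerMetrics>): ConnectionCounts =
    ConnectionCounts(
        acquired = perHost.sumOf { it.leasedConcurrency },
        idle = perHost.sumOf { it.availableConcurrency },
    )
```

For example, with maxConnections of 100 and three hosts reporting (leased, available) of (10, 40), (5, 45), and (20, 30), this yields 35 acquired and 115 idle, whereas the earlier maxConnections minus acquired calculation would have reported 65 idle. That is the "exceeds max connections" case mentioned above.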

aajtodd · 2024-01-09T19:13:50Z

...ttp-client-engine-crt/jvm/src/aws/smithy/kotlin/runtime/http/engine/crt/ConnectionManager.kt

@@ -61,16 +66,22 @@ internal class ConnectionManager(
        val manager = getManagerForUri(request.uri, proxyConfig)
        var leaseAcquired = false

+        metrics.queuedRequests = pending.incrementAndGet()


This still doesn't look correct, I don't think queuedRequests is calculated in connection manager.

sonarqubecloud · 2024-01-10T23:02:23Z

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

feat: emit metrics from CRT HTTP engine

bdb0814

ianbotsf requested a review from a team as a code owner December 21, 2023 15:41

ianbotsf mentioned this pull request Dec 21, 2023

feat: enable access to metrics for HTTP streams awslabs/aws-crt-kotlin#86

Merged

lint

2174a83

lauzadis approved these changes Jan 2, 2024


aajtodd suggested changes Jan 5, 2024


ianbotsf added 2 commits January 5, 2024 22:02

bump to latest aws-crt-kotlin release

0357d85

Merge remote-tracking branch 'origin/main' into feat-crt-http-metrics

1b970a4

ianbotsf mentioned this pull request Jan 8, 2024

feat: surface HTTP connection manager metrics awslabs/aws-crt-kotlin#88

Merged

addressing PR feedback

24278a6

lauzadis approved these changes Jan 8, 2024


aajtodd suggested changes Jan 9, 2024


ianbotsf added 2 commits January 10, 2024 22:57

addressing PR feedback

c36f9fe

lint

d27f858

ianbotsf requested a review from aajtodd February 5, 2024 18:26
