Support http1 full duplex per workload #14568
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main   #14568      +/-   ##
==========================================
- Coverage   86.02%   85.92%   -0.10%
==========================================
  Files         197      197
  Lines       14950    14991      +41
==========================================
+ Hits        12860    12881      +21
- Misses       1777     1796      +19
- Partials      313      314       +1
```
/test performance-tests
/test istio-latest-no-mesh-tls
Security scan failure: it seems that we need to set up the Go version in a separate step for CodeQL.
cc @dprotaso gentle ping
`cmd/activator/main.go`

```diff
@@ -236,7 +236,7 @@ func main() {
 	// NOTE: MetricHandler is being used as the outermost handler of the meaty bits. We're not interested in measuring
 	// the healthchecks or probes.
-	ah = activatorhandler.NewMetricHandler(env.PodName, ah)
+	ah = wrapActivatorHandlerWithFullDuplex(activatorhandler.NewMetricHandler(env.PodName, ah))
```
Just so I understand better: why do we need the handler twice?
+1

Is it not possible to do the annotation lookup and set the duplex option when we proxy the request here?

`serving/pkg/activator/handler/handler.go`, line 126 (at 7561386):

```go
var proxy *httputil.ReverseProxy
```
Secondly, I'd say we don't include this change in the metrics handler; I don't think users have reported an issue with that.
The metrics handler is on the request path, and the activator logs the same error we are trying to address. If I don't wrap it I still get that error, not as frequently, but I do after a large number of requests.

Each handler interacts in isolation with the other handlers in the chain when it comes to full duplex. That is why I wrap each handler that creates a problem in the chain individually. The bottom line is that handlers that override write logic, e.g.:

```go
type Writer interface {
	Write(p []byte) (n int, err error)
}
```

create the issue seen. We probably need an e2e test that copies this test logic, or you can test it manually.
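For illustration, this is the kind of wrapper being described; a minimal sketch, not Knative's actual handler, using a hypothetical countingWriter that overrides Write:

```go
package main

import "net/http"

// countingWriter is a hypothetical middleware wrapper that overrides Write,
// the pattern described above.
type countingWriter struct {
	http.ResponseWriter
	written int64
}

func (w *countingWriter) Write(p []byte) (int, error) {
	n, err := w.ResponseWriter.Write(p)
	w.written += int64(n)
	return n, err
}

// withByteCounting hands the wrapper, not the original writer, to the next
// handler. Without an Unwrap method, http.ResponseController cannot reach
// the original net/http writer through it, so EnableFullDuplex fails
// downstream of this middleware.
func withByteCounting(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(&countingWriter{ResponseWriter: w}, r)
	})
}
```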
> Each handler interacts in isolation wrt other handlers in the chain, when it comes to the full duplex thingy.

The underlying ResponseWriter is the one from net/http though, so EnableFullDuplex only needs to be called once:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/net/http/server.go;l=504-507

When we wrap this original writer in various handlers we should include an Unwrap method:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/net/http/responsecontroller.go;l=42-44
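Continuing the hypothetical countingWriter sketch from above, the suggested fix would look like this:

```go
// Unwrap lets http.ResponseController walk past the wrapper to the original
// net/http ResponseWriter, the only writer that implements EnableFullDuplex
// (added in Go 1.21).
func (w *countingWriter) Unwrap() http.ResponseWriter {
	return w.ResponseWriter
}
```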
I will check. We certainly don't pass the original writer everywhere:
// The ResponseWriter should be the original value passed to the Handler.ServeHTTP method,
// or have an Unwrap method returning the original ResponseWriter.
So it seems you only need to set EnableFullDuplex at the outermost handler (the first one called). Adding the Unwrap method didn't change the outcome locally, but it is good practice so that the handlers can be used with a ResponseController. The initial ResponseWriter passed by net/http (when it invokes the user handler) is the only one that implements EnableFullDuplex anyway; anything else should return "not supported" (our handlers passed as ResponseWriters).
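A minimal sketch of that conclusion, enabling full duplex once at the outermost handler via http.ResponseController; the middleware name is illustrative, not the PR's actual helper:

```go
package main

import (
	"log"
	"net/http"
)

// withFullDuplex enables HTTP/1 full duplex once, at the outermost handler.
// The call reaches the original net/http writer as long as every wrapper in
// between provides an Unwrap method.
func withFullDuplex(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := http.NewResponseController(w).EnableFullDuplex(); err != nil {
			// A wrapper without Unwrap, or a pre-1.21 Go runtime,
			// surfaces here as a "not supported" error.
			log.Printf("full duplex not enabled: %v", err)
		}
		next.ServeHTTP(w, r)
	})
}
```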
Thanks for this @skonto!

In general I feel like we have to wrap in a lot of places, but we probably have to do it that way.

Also, can we run some of the existing e2e tests with that flag enabled?

Can you also update the go.mod file to ensure folks are building with Go 1.21? Otherwise this feature won't work.
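Presumably this refers to the minimum Go version: EnableFullDuplex only exists from Go 1.21, so the go directive would need to require at least that. A sketch, not the PR's actual diff:

```
module knative.dev/serving

go 1.21
```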
@dprotaso ready, gentle ping.

@dprotaso gentle ping.
Looks good. I tried to run the test locally and it seemed to hang.

I'm on Mac ARM, Go version 1.21.6.
@dprotaso hi,
I don't use a Mac, but maybe @ReToCode can help. This seems like a known Mac network setup error; check also here for a fix. On my side the tests pass (Go 1.21.6). I rebased and still see no such failure.
I tested it locally and I have the same. It seems like macOS has more restrictive default settings for opening a lot of response ports (see https://copyprogramming.com/howto/ab-program-freezes-after-lots-of-requests-why). I can see a lot of … This fixes it: …

But I don't think it is a good idea to force people to do this, so maybe we can just skip the test when running on macOS.
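A minimal sketch of such a skip, assuming the test name that appears in the later backport commits; the skip message is illustrative:

```go
package handler

import (
	"runtime"
	"testing"
)

func TestActivatorChainHandlerWithFullDuplex(t *testing.T) {
	if runtime.GOOS == "darwin" {
		t.Skip("skipping on macOS: restrictive default socket limits make this high-load test hang")
	}
	// ... drive tens of thousands of requests through the handler chain ...
}
```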
/test istio-latest-no-mesh
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dprotaso, skonto.
* support http1 full duplex per workload
* lint
* style
* single call
* updates & unit test
* fix unit test
* fix comment
* address review comments
* fix lint
* skip for Mac
Support http1 full duplex per workload (knative#14568) (#591) (#605)

* Support http1 full duplex per workload (knative#14568)
* Create less load for TestActivatorChainHandlerWithFullDuplex (knative#14820): less load, limit cons
Fixes #12387
Proposed Changes

* Workloads with the annotation `features.knative.dev/http-full-duplex=Enabled` get support for HTTP/1 full duplex end-to-end on the data path. The Activator and QP h2c handlers don't seem to create an issue in the HTTP/1 case.

Note: if any of the wrappers is removed, you get ~1 request failure over tens of thousands of requests.
I had to run the test multiple times to get one failure; each run creates ~30K requests over a period of ~1m.
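A hedged sketch of the per-workload gate described above; the helper name and the annotations parameter are illustrative, not the PR's actual code:

```go
package main

import "net/http"

// enableDuplexIfRequested turns on HTTP/1 full duplex for a single request,
// but only when the workload opted in via the feature annotation.
func enableDuplexIfRequested(w http.ResponseWriter, annotations map[string]string) {
	if annotations["features.knative.dev/http-full-duplex"] != "Enabled" {
		return
	}
	// Ignore the error: the writer may not support full duplex
	// (e.g. an HTTP/2 request, or a wrapper without an Unwrap method).
	_ = http.NewResponseController(w).EnableFullDuplex()
}
```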
Release Note