Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

eguzki · 2023-06-26T14:00:49Z

What

As part of https://issues.redhat.com/browse/THREESCALE-9542 this is fixing the support for chunked requests when APIcast talks directly to upstream (not proxy between APIcast and upstream).

The fix is disabling request buffering, which means nginx will forward request body ASAP. Does this mean that when the downstream request in chunked (Transfer-Encoding: chunked), the request to upstream service will also be chunked? Well, it depends. If nginx can read the entire body in one chunk, the upstream request will not be chunked and Content-Length header will be added. On the other hand, if after reading the first chunk, nginx cannot know the body length, it will build an upstream request with transfer encoding chunked. This usually happens when the second chunk is delayed.

The changes to support chunked requests affect only gateway/src/apicast/configuration/service.lua and gateway/conf.d/apicast.conf. The additional files changed in this PR are only to create dev environments to reproduce use cases.

Dev Note: In order to implement chunked requests, request body should not be read by the gateway. If the body is read, buffering happens and the entire body will be read before sending request to upstream. When the request body is read, Nginx will know the request body length, hence will favor upstream request with the Content-Length header instead of the chunked one. However, APIcast reads the body looking for request params (for Content-Type: application/x-www-form-urlencoded requests) because 3scale mapping rules may match parameters in the body. Thus, for requests with Content-Type: application/x-www-form-urlencoded, the request body will be read (and buffered) before sending request to upstream and the request to upstream will always be with Content-Length header.

Verification steps

Git checkout this branch

❯ git checkout support-plain-upstream-chunked-requests

Builld local image

❯ make runtime-image IMAGE_NAME=apicast-test

Run environment with APIcast configured as gateway of an echo API (from httpbin.org) in plain HTTP 1.1

❯ make plain-upstream-gateway IMAGE_NAME=apicast-test

Send chunked request with one chunk body

❯ cat <<EOF >my-data.json
{
   "a": 1
}
EOF

# you need container name of the gateway (`docker ps` can tell)
❯ APICAST_IP=$(docker inspect apicast_build_0-gateway-run-3b16b962fa2a | yq e -P '.[0].NetworkSettings.Networks.apicast_build_0_default.IPAddress' -)

❯ curl -v -H "Transfer-Encoding: chunked" -H "Host: post" -H "Content-Type: application/json" -d @my-data.json http://${APICAST_IP}:8080/?user_key=foo
*   Trying 172.20.0.4:8080...
* Connected to 172.20.0.4 (172.20.0.4) port 8080 (#0)
> POST /?user_key=foo HTTP/1.1
> Host: post
> User-Agent: curl/7.81.0
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: openresty
< Date: Mon, 26 Jun 2023 15:04:36 GMT
< Content-Type: application/json
< Content-Length: 394
< Connection: keep-alive
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< 
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "{   \"a\": 1}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "11", 
    "Content-Type": "application/json", 
    "Host": "upstream-rely:8080", 
    "User-Agent": "curl/7.81.0"
  }, 
  "json": {
    "a": 1
  }, 
  "origin": "172.20.0.3", 
  "url": "http://upstream-rely:8080/post?user_key=foo"
}
* Connection #0 to host 172.20.0.4 left intact

That request should return 200 OK.

Note that upstream echo API is reporting that the request included Content-Length header and the expected body.

Send chunked request with few chunks in the body delayed in time. Python3 is required.

First add to /etc/hosts file post hostname having ${APICAST_IP} value. Resulting in:

❯ cat /etc/hosts | grep post
172.20.0.4      post

Then send request

❯ cat <<EOF >chunked-request.py
import time
import requests

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

# /etc/hosts having ${APICAST_IP} "post" (setting headers duplicates Host header)
resp = requests.post('http://post:8080/?user_key=foo', data=gen())
print(resp.text)
EOF

❯ python3 chunked-request.py
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "upstream-rely:8080", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "python-requests/2.25.1"
  }, 
  "json": null, 
  "origin": "172.20.0.3", 
  "url": "http://upstream-rely:8080/post?user_key=foo"
}

Note that the upstream service got transfer encoding chunked request.

Check http rely service installed in the environment to see the raw request to the upstream service and also the raw response.

❯ docker compose -p apicast_build_0 logs upstream-rely
upstream-rely  | POST /post?user_key=foo HTTP/1.1\r
upstream-rely  | X-Real-IP: 172.20.0.1\r
upstream-rely  | Host: upstream-rely:8080\r
upstream-rely  | Transfer-Encoding: chunked\r
upstream-rely  | User-Agent: python-requests/2.25.1\r
upstream-rely  | Accept-Encoding: gzip, deflate\r
upstream-rely  | Accept: */*\r
upstream-rely  | \r
upstream-rely  | 2\r
upstream-rely  | hi\r
upstream-rely  | > 2023/06/26 15:15:48.207817  length=10 from=201 to=210
upstream-rely  | 5\r
upstream-rely  | there\r
upstream-rely  | > 2023/06/26 15:15:50.209816  length=13 from=211 to=223
upstream-rely  | 3\r
upstream-rely  | bye\r
upstream-rely  | 0\r
upstream-rely  | \r
upstream-rely  | < 2023/06/26 15:15:50.210287  length=628 from=0 to=627
upstream-rely  | HTTP/1.1 200 OK\r
upstream-rely  | Server: gunicorn/19.9.0\r
upstream-rely  | Date: Mon, 26 Jun 2023 15:15:50 GMT\r
upstream-rely  | Connection: keep-alive\r
upstream-rely  | Content-Type: application/json\r
upstream-rely  | Content-Length: 398\r
upstream-rely  | Access-Control-Allow-Origin: *\r
upstream-rely  | Access-Control-Allow-Credentials: true\r
upstream-rely  | \r
upstream-rely  | {
upstream-rely  |   "args": {
upstream-rely  |     "user_key": "foo"
upstream-rely  |   }, 
upstream-rely  |   "data": "hitherebye", 
upstream-rely  |   "files": {}, 
upstream-rely  |   "form": {}, 
upstream-rely  |   "headers": {
upstream-rely  |     "Accept": "*/*", 
upstream-rely  |     "Accept-Encoding": "gzip, deflate", 
upstream-rely  |     "Host": "upstream-rely:8080", 
upstream-rely  |     "Transfer-Encoding": "chunked", 
upstream-rely  |     "User-Agent": "python-requests/2.25.1"
upstream-rely  |   }, 
upstream-rely  |   "json": null, 
upstream-rely  |   "origin": "172.20.0.3", 
upstream-rely  |   "url": "http://upstream-rely:8080/post?user_key=foo"
upstream-rely  | }

Note the chunked encoding of the request with the length bytes preceding each chunk.

It is required to not read request body for chunked requests to be propagated as chunked requests. Only read the request body when POST query arguments (of the MIME type application/x-www-form-urlencoded) and thus query arguments could be in the request body. Note: chunked requests are propagated to upstream with Transfer-Encoding chunked only when client request sends more than one chunk

tkan145 · 2023-07-03T07:58:22Z

gateway/conf.d/apicast.conf

@@ -95,6 +95,8 @@ location @upstream {
  proxy_pass $proxy_pass;

  proxy_http_version 1.1;
+  proxy_request_buffering off;


I'm not sure if turning this off globally is a good idea (Slowloris DDoS attack) May be we can have another location block @upstream-unbufferd instead?

There is an interesting discussion on GitLab https://gitlab.com/gitlab-org/gitlab/-/issues/325095

I am happy to discuss about potential issues with this approach.

Slowloris DDoS attack is about opening too many TCP connections with unfinished HTTP requests. You are suggesting that the APIcast gateway should protect upstream from opening connections against upstream until the entire request is read (and buffered). This what I understand, let me know if I am wrong. If that is true, then https://issues.redhat.com/browse/THREESCALE-9542 (requests with transfer encoding chunked) cannot be supported for all scenarios. For the current scenario (no proxy), when the request is buffered by nginx, I do not know how to tell nginx to stick with the chunked requests on the upstream HTTP session. Nginx has always added the Content-Length to the upstream request when it knows the request length. If we knew how to change Nginx behavior, request would be buffered (preventing from the DDoS attack) and chunked requests would be propagated to upstream with the same encoding.

I assumed we wanted to respect chunked requests with big size bodies (with Gb length) to avoid buffering (and writing to temp file)

if we have another location @upstream-unbufferd, what would be the routing logic to forward to @upstream or @upstream-unbuffered ? For requests with "small" body, no matter chunked or not, nginx will read and know the length of the body. Then it will send the request to upstream with the "content-length" header. Thus, it does not matter @upstream or @upstream-unbuffered

Sorry if I confused you, but what I suggest is to allow disable proxy_request_buffering per location instead of globally.
The reason are:

DDoS attack

This is a breaking change, so I think it's better to be able to configure this change per location than to go and upgrade and everything suddenly stops working.

Also, I'm just throwing ideas around, so please ignore me if I'm talking non-sense here 😄

This location is dynamic so by doing this via a policy you are not enabling it globally. I don't know what was the outcome of the last meeting we had but did we figure out if it was possible to just implement transfer-encoding as a policy? Leave existing behaviour for buffered requests as it is?

So from the last meeting, we concluded that Transfer-Encoding header is hop-by-hop header, so the gateway is not enforced to propagate that header or transfer encoding itself. What it can be configured (exactly the same as nginx does), is the request buffering. It can be on/off.

@eguzki thinks the issue with the policy can be the order where the "request buffering" policy exists. If it is in an incorrect order, it might not work as expected. And he also thinks that we should have a single location @upstream and control the proxy_request_buffering via the Lua code. However, something like proxy_request_buffering $proxy_request_buffering won't work as nginx complained on boot saying that proxy_request_buffering should be on or off.

My proposal is to have a second @upstream-unbuffered block inside https://github.com/3scale/APIcast/blob/master/gateway/conf.d/apicast.conf
and inside the policy we set the request context, something like request_buffered = false
and here https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/upstream.lua#L250 we can check the context and rewrite the location_name to @upstream-unbuffered

eguzki · 2023-11-07T09:08:55Z

superseded by #1408

eguzki added 4 commits June 22, 2023 13:58

docker compose env: plain HTTP 1.1 upstream

731344b

fix e2e tests

26b33c0

add unittests

3b24c8b

eguzki force-pushed the support-plain-upstream-chunked-requests branch from 45da7a9 to 656e26d Compare June 26, 2023 18:15

eguzki marked this pull request as ready for review June 26, 2023 18:24

eguzki requested a review from a team as a code owner June 26, 2023 18:24

eguzki requested a review from kevprice83 June 26, 2023 18:25

eguzki force-pushed the support-plain-upstream-chunked-requests branch from 656e26d to 87f67fc Compare June 26, 2023 21:17

eguzki changed the title ~~Support chunked requests when upstream is plain HTTP 1.1~~ Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) Jun 29, 2023

tkan145 reviewed Jul 3, 2023

View reviewed changes

eguzki added 2 commits July 3, 2023 17:01

update CHANGELOG.md

17e7ed1

support-tls-upstream-chunked-requests

73cd1a9

eguzki force-pushed the support-plain-upstream-chunked-requests branch from 909f6a0 to 73cd1a9 Compare July 3, 2023 15:01

enhanced docker compose env with proxy and TLS upstream

d681ed9

eguzki closed this Nov 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

eguzki commented Jun 26, 2023 •

edited

Loading

tkan145 Jul 3, 2023

eguzki Jul 3, 2023 •

edited

Loading

eguzki Jul 3, 2023 •

edited

Loading

tkan145 Jul 4, 2023

kevprice83 Jul 31, 2023

tkan145 Aug 7, 2023

eguzki commented Nov 7, 2023

Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

Conversation

eguzki commented Jun 26, 2023 • edited Loading

What

Verification steps

tkan145 Jul 3, 2023

Choose a reason for hiding this comment

eguzki Jul 3, 2023 • edited Loading

Choose a reason for hiding this comment

eguzki Jul 3, 2023 • edited Loading

Choose a reason for hiding this comment

tkan145 Jul 4, 2023

Choose a reason for hiding this comment

kevprice83 Jul 31, 2023

Choose a reason for hiding this comment

tkan145 Aug 7, 2023

Choose a reason for hiding this comment

eguzki commented Nov 7, 2023

eguzki commented Jun 26, 2023 •

edited

Loading

eguzki Jul 3, 2023 •

edited

Loading

eguzki Jul 3, 2023 •

edited

Loading