Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) #1405

Closed
wants to merge 7 commits into from

Conversation

eguzki
Copy link
Member

@eguzki eguzki commented Jun 26, 2023

What

As part of https://issues.redhat.com/browse/THREESCALE-9542 this is fixing the support for chunked requests when APIcast talks directly to upstream (not proxy between APIcast and upstream).

The fix is disabling request buffering, which means nginx will forward request body ASAP. Does this mean that when the downstream request in chunked (Transfer-Encoding: chunked), the request to upstream service will also be chunked? Well, it depends. If nginx can read the entire body in one chunk, the upstream request will not be chunked and Content-Length header will be added. On the other hand, if after reading the first chunk, nginx cannot know the body length, it will build an upstream request with transfer encoding chunked. This usually happens when the second chunk is delayed.

The changes to support chunked requests affect only gateway/src/apicast/configuration/service.lua and gateway/conf.d/apicast.conf. The additional files changed in this PR are only to create dev environments to reproduce use cases.

Dev Note: In order to implement chunked requests, request body should not be read by the gateway. If the body is read, buffering happens and the entire body will be read before sending request to upstream. When the request body is read, Nginx will know the request body length, hence will favor upstream request with the Content-Length header instead of the chunked one. However, APIcast reads the body looking for request params (for Content-Type: application/x-www-form-urlencoded requests) because 3scale mapping rules may match parameters in the body. Thus, for requests with Content-Type: application/x-www-form-urlencoded, the request body will be read (and buffered) before sending request to upstream and the request to upstream will always be with Content-Length header.

Verification steps

  • Git checkout this branch
❯ git checkout support-plain-upstream-chunked-requests
  • Builld local image
❯ make runtime-image IMAGE_NAME=apicast-test
  • Run environment with APIcast configured as gateway of an echo API (from httpbin.org) in plain HTTP 1.1
❯ make plain-upstream-gateway IMAGE_NAME=apicast-test
  • Send chunked request with one chunk body
❯ cat <<EOF >my-data.json
{
   "a": 1
}
EOF
# you need container name of the gateway (`docker ps` can tell)
❯ APICAST_IP=$(docker inspect apicast_build_0-gateway-run-3b16b962fa2a | yq e -P '.[0].NetworkSettings.Networks.apicast_build_0_default.IPAddress' -)

❯ curl -v -H "Transfer-Encoding: chunked" -H "Host: post" -H "Content-Type: application/json" -d @my-data.json http://${APICAST_IP}:8080/?user_key=foo
*   Trying 172.20.0.4:8080...
* Connected to 172.20.0.4 (172.20.0.4) port 8080 (#0)
> POST /?user_key=foo HTTP/1.1
> Host: post
> User-Agent: curl/7.81.0
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: openresty
< Date: Mon, 26 Jun 2023 15:04:36 GMT
< Content-Type: application/json
< Content-Length: 394
< Connection: keep-alive
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< 
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "{   \"a\": 1}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "11", 
    "Content-Type": "application/json", 
    "Host": "upstream-rely:8080", 
    "User-Agent": "curl/7.81.0"
  }, 
  "json": {
    "a": 1
  }, 
  "origin": "172.20.0.3", 
  "url": "http://upstream-rely:8080/post?user_key=foo"
}
* Connection #0 to host 172.20.0.4 left intact

That request should return 200 OK.

Note that upstream echo API is reporting that the request included Content-Length header and the expected body.

  • Send chunked request with few chunks in the body delayed in time. Python3 is required.

First add to /etc/hosts file post hostname having ${APICAST_IP} value. Resulting in:

❯ cat /etc/hosts | grep post
172.20.0.4      post

Then send request

cat <<EOF >chunked-request.py
import time
import requests

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

# /etc/hosts having ${APICAST_IP} "post" (setting headers duplicates Host header)
resp = requests.post('http://post:8080/?user_key=foo', data=gen())
print(resp.text)
EOF
❯ python3 chunked-request.py
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "upstream-rely:8080", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "python-requests/2.25.1"
  }, 
  "json": null, 
  "origin": "172.20.0.3", 
  "url": "http://upstream-rely:8080/post?user_key=foo"
}

Note that the upstream service got transfer encoding chunked request.

  • Check http rely service installed in the environment to see the raw request to the upstream service and also the raw response.
❯ docker compose -p apicast_build_0 logs upstream-rely
upstream-rely  | POST /post?user_key=foo HTTP/1.1\r
upstream-rely  | X-Real-IP: 172.20.0.1\r
upstream-rely  | Host: upstream-rely:8080\r
upstream-rely  | Transfer-Encoding: chunked\r
upstream-rely  | User-Agent: python-requests/2.25.1\r
upstream-rely  | Accept-Encoding: gzip, deflate\r
upstream-rely  | Accept: */*\r
upstream-rely  | \r
upstream-rely  | 2\r
upstream-rely  | hi\r
upstream-rely  | > 2023/06/26 15:15:48.207817  length=10 from=201 to=210
upstream-rely  | 5\r
upstream-rely  | there\r
upstream-rely  | > 2023/06/26 15:15:50.209816  length=13 from=211 to=223
upstream-rely  | 3\r
upstream-rely  | bye\r
upstream-rely  | 0\r
upstream-rely  | \r
upstream-rely  | < 2023/06/26 15:15:50.210287  length=628 from=0 to=627
upstream-rely  | HTTP/1.1 200 OK\r
upstream-rely  | Server: gunicorn/19.9.0\r
upstream-rely  | Date: Mon, 26 Jun 2023 15:15:50 GMT\r
upstream-rely  | Connection: keep-alive\r
upstream-rely  | Content-Type: application/json\r
upstream-rely  | Content-Length: 398\r
upstream-rely  | Access-Control-Allow-Origin: *\r
upstream-rely  | Access-Control-Allow-Credentials: true\r
upstream-rely  | \r
upstream-rely  | {
upstream-rely  |   "args": {
upstream-rely  |     "user_key": "foo"
upstream-rely  |   }, 
upstream-rely  |   "data": "hitherebye", 
upstream-rely  |   "files": {}, 
upstream-rely  |   "form": {}, 
upstream-rely  |   "headers": {
upstream-rely  |     "Accept": "*/*", 
upstream-rely  |     "Accept-Encoding": "gzip, deflate", 
upstream-rely  |     "Host": "upstream-rely:8080", 
upstream-rely  |     "Transfer-Encoding": "chunked", 
upstream-rely  |     "User-Agent": "python-requests/2.25.1"
upstream-rely  |   }, 
upstream-rely  |   "json": null, 
upstream-rely  |   "origin": "172.20.0.3", 
upstream-rely  |   "url": "http://upstream-rely:8080/post?user_key=foo"
upstream-rely  | }

Note the chunked encoding of the request with the length bytes preceding each chunk.

eguzki added 4 commits June 22, 2023 13:58
It is required to not read request body for chunked requests to be propagated as chunked requests.
Only read the request body when POST query arguments (of the MIME type application/x-www-form-urlencoded)
and thus query arguments could be in the request body.

Note: chunked requests are propagated to upstream with Transfer-Encoding chunked only when client request sends more than one chunk
@eguzki eguzki force-pushed the support-plain-upstream-chunked-requests branch from 45da7a9 to 656e26d Compare June 26, 2023 18:15
@eguzki eguzki marked this pull request as ready for review June 26, 2023 18:24
@eguzki eguzki requested a review from a team as a code owner June 26, 2023 18:24
@eguzki eguzki requested a review from kevprice83 June 26, 2023 18:25
@eguzki eguzki force-pushed the support-plain-upstream-chunked-requests branch from 656e26d to 87f67fc Compare June 26, 2023 21:17
@eguzki eguzki changed the title Support chunked requests when upstream is plain HTTP 1.1 Support chunked requests when APIcast talks directly to upstream (not proxy in the middle) Jun 29, 2023
@@ -95,6 +95,8 @@ location @upstream {
proxy_pass $proxy_pass;

proxy_http_version 1.1;
proxy_request_buffering off;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if turning this off globally is a good idea (Slowloris DDoS attack) May be we can have another location block @upstream-unbufferd instead?

There is an interesting discussion on GitLab https://gitlab.com/gitlab-org/gitlab/-/issues/325095

Copy link
Member Author

@eguzki eguzki Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to discuss about potential issues with this approach.

Slowloris DDoS attack is about opening too many TCP connections with unfinished HTTP requests. You are suggesting that the APIcast gateway should protect upstream from opening connections against upstream until the entire request is read (and buffered). This what I understand, let me know if I am wrong. If that is true, then https://issues.redhat.com/browse/THREESCALE-9542 (requests with transfer encoding chunked) cannot be supported for all scenarios. For the current scenario (no proxy), when the request is buffered by nginx, I do not know how to tell nginx to stick with the chunked requests on the upstream HTTP session. Nginx has always added the Content-Length to the upstream request when it knows the request length. If we knew how to change Nginx behavior, request would be buffered (preventing from the DDoS attack) and chunked requests would be propagated to upstream with the same encoding.

I assumed we wanted to respect chunked requests with big size bodies (with Gb length) to avoid buffering (and writing to temp file)

Copy link
Member Author

@eguzki eguzki Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have another location @upstream-unbufferd, what would be the routing logic to forward to @upstream or @upstream-unbuffered ? For requests with "small" body, no matter chunked or not, nginx will read and know the length of the body. Then it will send the request to upstream with the "content-length" header. Thus, it does not matter @upstream or @upstream-unbuffered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I confused you, but what I suggest is to allow disable proxy_request_buffering per location instead of globally.
The reason are:

  1. DDoS attack
  2. This is a breaking change, so I think it's better to be able to configure this change per location than to go and upgrade and everything suddenly stops working.

Also, I'm just throwing ideas around, so please ignore me if I'm talking non-sense here 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This location is dynamic so by doing this via a policy you are not enabling it globally. I don't know what was the outcome of the last meeting we had but did we figure out if it was possible to just implement transfer-encoding as a policy? Leave existing behaviour for buffered requests as it is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So from the last meeting, we concluded that Transfer-Encoding header is hop-by-hop header, so the gateway is not enforced to propagate that header or transfer encoding itself. What it can be configured (exactly the same as nginx does), is the request buffering. It can be on/off.

@eguzki thinks the issue with the policy can be the order where the "request buffering" policy exists. If it is in an incorrect order, it might not work as expected. And he also thinks that we should have a single location @upstream and control the proxy_request_buffering via the Lua code. However, something like proxy_request_buffering $proxy_request_buffering won't work as nginx complained on boot saying that proxy_request_buffering should be on or off.

My proposal is to have a second @upstream-unbuffered block inside https://github.com/3scale/APIcast/blob/master/gateway/conf.d/apicast.conf
and inside the policy we set the request context, something like request_buffered = false
and here https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/upstream.lua#L250 we can check the context and rewrite the location_name to @upstream-unbuffered

@eguzki eguzki force-pushed the support-plain-upstream-chunked-requests branch from 909f6a0 to 73cd1a9 Compare July 3, 2023 15:01
@eguzki
Copy link
Member Author

eguzki commented Nov 7, 2023

superseded by #1408

@eguzki eguzki closed this Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants