fix(1346): fix control frames flood #2108
base: master
Conversation
fw/http_frame.c
Outdated
@@ -280,6 +280,17 @@ __tfw_h2_send_frame(TfwH2Ctx *ctx, TfwFrameHdr *hdr, TfwStr *data,
	unsigned char buf[FRAME_HEADER_SIZE];
	TfwStr *hdr_str = TFW_STR_CHUNK(data, 0);
	TfwH2Conn *conn = container_of(ctx, TfwH2Conn, h2);
	bool is_control_frame = !hdr->stream_id || hdr->type == HTTP2_RST_STREAM;

	// If the peer is causing us to generate a lot of control frames,
We use another style of comments, like:
/*
 *
 */
I think it would be better to implement a new counter in frang and check the maximum number of control frames there, as we do for all other frang limits.
Maybe a global configuration option is better than frang? Just like "http_max_header_list_size", it should apply to all HTTP/2 connections regardless of virtual host, and it is not a request/response-related parameter.
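The guard under review can be modeled in plain userspace C. The sketch below is a deliberate simplification: `conn_model`, `queue_frame()` and `frame_transmitted()` are invented stand-ins for the Tempesta structures and hooks, and only the counting logic is shown. A frame is "control" when its stream id is 0 or its type is RST_STREAM, and the connection is reset once too many such frames pile up in the send queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative limit; the PR uses a "10000" default in configuration. */
#define MAX_QUEUED_CONTROL_FRAMES 10000

enum frame_type { FT_DATA, FT_HEADERS, FT_PING, FT_SETTINGS, FT_RST_STREAM };

/* Hypothetical per-connection state; not the real TfwH2Ctx. */
struct conn_model {
	unsigned int queued_control_frames;
};

static bool is_control_frame(unsigned int stream_id, enum frame_type t)
{
	return stream_id == 0 || t == FT_RST_STREAM;
}

/* Returns 0 on success, -1 if the connection must be reset. */
static int queue_frame(struct conn_model *c, unsigned int stream_id,
		       enum frame_type t)
{
	if (is_control_frame(stream_id, t)) {
		if (c->queued_control_frames > MAX_QUEUED_CONTROL_FRAMES)
			return -1; /* flood: close the connection */
		c->queued_control_frames++;
	}
	return 0;
}

/* Called when a control frame actually leaves the send queue. */
static void frame_transmitted(struct conn_model *c, unsigned int stream_id,
			      enum frame_type t)
{
	if (is_control_frame(stream_id, t))
		c->queued_control_frames--;
}
```

The point of the counter is that it only grows when the peer makes us queue replies faster than TCP drains them; ordinary DATA/HEADERS traffic never touches it.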
Force-pushed from 1effa31 to 9153825
I don't see much sense in moving it to frang: 1. There is no appropriate frang handler. 2. We don't need history. 3. We have
LGTM
fw/http_frame.c
Outdated
	 */
	if (is_control_frame &&
	    atomic_read(&ctx->queued_control_frames)
	    > max_queued_control_frames) {
Please move the brace to the next line.
fw/http_frame.c
Outdated
	if (is_control_frame &&
	    atomic_read(&ctx->queued_control_frames)
	    > max_queued_control_frames) {
		T_ERR("Too many control frames in send queue, closing connection");
Usually we use T_WARN in such cases; see frang as an example.
fw/http_frame.c
Outdated
	 * run out of memory.
	 */
	if (is_control_frame &&
	    atomic_read(&ctx->queued_control_frames)
I think we don't need an atomic here. Both functions where we read/increment/decrement queued_control_frames are called under the socket lock.
However, there is also one disadvantage of using it outside frang: we can't configure
I can't approve the PR since I don't understand what we fix and why in this way.
Please, in this PR, further development, and other pull requests, write good commit messages explaining what the commit does and why. In this case an exact enumeration of the CVEs and a description of the approach to close the vulnerabilities is required. In #1346 for now we just have a bunch of CVE references, so it's not helpful in the review.
fw/http_frame.h
Outdated
 * SETTINGS, PING and RST_STREAM that will be queued for writing before
 * the connection is closed to prevent memory exhaustion attacks.
 */
#define MAX_QUEUED_CONTROL_FRAMES 10000
The constant isn't used anywhere; only the string default value "10000" is used in the http.c configuration.
fw/sock_clnt.c
Outdated
@@ -460,6 +460,9 @@ tfw_sk_write_xmit(struct sock *sk, struct sk_buff *skb, unsigned int mss_now,
	if (h2_mode) {
		h2 = tfw_h2_context(conn);
		tbl = &h2->hpack.enc_tbl;
		if (flags & SS_F_HTTT2_FRAME_CONTROL) {
Suggested change:
-		if (flags & SS_F_HTTT2_FRAME_CONTROL) {
+		if (unlikely(flags & SS_F_HTTT2_FRAME_CONTROL)) {
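For context, in the kernel `unlikely()` expands to `__builtin_expect(!!(x), 0)`, which only biases code layout toward the common (flag not set) path; the branch semantics are unchanged. A userspace sketch, where the flag value is invented for illustration:

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's branch-prediction hints. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Illustrative flag value, not the real Tempesta definition. */
#define SS_F_HTTT2_FRAME_CONTROL 0x04

static int control_frame_flagged(unsigned int flags)
{
	/* Identical result to a plain if; the hint only tells the compiler
	 * that the flag is rarely set, so the fallthrough path stays hot. */
	if (unlikely(flags & SS_F_HTTT2_FRAME_CONTROL))
		return 1;
	return 0;
}
```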
fw/http_frame.h
Outdated
@@ -209,6 +223,7 @@ typedef struct {
 */
typedef struct tfw_h2_ctx_t {
	spinlock_t lock;
	unsigned int queued_control_frames;
Probably there is a better place for the 4-byte field in the data structure, to avoid alignment holes.
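To illustrate the alignment concern with a hypothetical layout (a typical LP64 ABI is assumed; these are not the real tfw_h2_ctx_t fields): putting a small field immediately before a pointer forces a padding hole that reordering removes.

```c
#include <assert.h>
#include <stddef.h>

/* A small field before a pointer leaves a hole on LP64. */
struct holey {
	unsigned short small;  /* 2 bytes, then 6 bytes of padding */
	void *ptr;             /* 8-byte alignment forces the hole */
	unsigned int counter;  /* 4 bytes + 4 bytes tail padding */
};

/* Largest-alignment-first ordering fills the would-be holes. */
struct packed_by_hand {
	void *ptr;             /* 8 bytes */
	unsigned int counter;  /* fills what would otherwise be a hole */
	unsigned short small;  /* 2 bytes + 2 bytes tail padding */
};
```

This is why reviewers often ask where a new 4-byte counter lands: next to another 4-byte field it is free, next to a pointer it can cost 8 bytes per object.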
		T_WARN("Too many control frames in send queue, closing connection\n");
		r = SS_BLOCK_WITH_RST;
		goto err;
	}
Why do we limit egress control frames, not ingress frames from clients?
The comment above says the control frames are PING, SETTINGS or RST_STREAM, so what's the point of limiting them on the _egress_ path? In general, a (D)DoS attack implies asymmetric conditions, where an attacker spends fewer resources than the victim server. It seems all the control frames sent from our side are triggered by some frame from a client; RST_STREAM even terminates a stream. E.g. we already have request_rate, which blocks HTTP/2 Rapid Reset: doesn't it solve the problem? Do we have a test using the rate limit and exhibiting a (D)DoS attack? @RomanBelozerov FYI
P.S. Even with this comment tempesta-tech/tempesta-test#612 (comment), it seems request_rate will still do its job to block/mitigate the attack.
P.P.S. Is this HTTP/2 slow read prevention? If so, why don't we account the number and size of HEADERS and DATA frames?
- According to the h2 spec, we should ack some control frames, e.g. PING, SETTINGS, RST_STREAM, and the spec places no limit on the number of incoming frames. However, it's necessary to limit them according to the responsive capability of the server: when the server is under pressure, or facing a malicious client (zero TCP window), we should not ack the client in case of OOM or DDoS. If we limited the ingress path instead, we could not provision a reasonable threshold because the runtime conditions always change.
- request_rate is only used to limit HTTP/1 or HTTP/2 requests, not control frames, so unfortunately it's useless for this purpose.
- For generic slow read prevention, there are two other PRs: test OOM by header leak tempesta-test#616 and test OOM by slow read attack tempesta-test#618. All slow-read-related issues are tracked in http2: tests for CVE-2019-9512/9517 tempesta-test#612.
> In this case exact enumeration of CVEs and description of the approach to close the vulnerabilities is required. In #1346 for now we just have a bunch of CVE references, so it's not helpful in the review.

Yes, but 1346 refers to 612. And Roman lists the exact CVEs we need to fix in 1346; this PR, as part of the PRs for 1346, solves:
GHSA-hgr8-6h9x-f7q9 “Ping Flood”
GHSA-9259-5376-vjcj “Settings Flood”
@krizhanovsky The inspiration for this fix comes from Nginx and Go's built-in http2 server.
> Yes. But 1346 refers to 612. And Roman lists in #1346 (comment) the exact CVEs we need to fix in 1346, and this PR, as part of PRs of 1346, solves:
> GHSA-hgr8-6h9x-f7q9 “Ping Flood”
> GHSA-9259-5376-vjcj “Settings Flood”
I know. The idea is that we should have a good commit message history, so anyone can quickly understand the change history from git alone, without digging on GitHub: this is faster, it's sometimes non-trivial to find the required task and pull request on GitHub, and reducing the dependency on GitHub (a 3rd-party product and company) is also good.
Why can't we solve the problem with
- a rate limit for ingress h2 frames
- a limit on the number of pending bytes (aka a TCP send buffer)
Both limits are easy for a user to understand and they follow the drop-early strategy. With the current #2108 we accept control frames, process them, and only after that terminate the connection. If we configure that a client connection can have no more than N bytes queued for transmission to the client (we can actually just take the current sysctl send buffer value minus data in flight; we know everything about the TCP state), we already have N pending bytes to the client, and we receive a control frame which must be replied to, then we know upfront that we can't send the reply and can just advertise a zero TCP window. This is part of #498 and seems not so hard to do. Definitely more work than #2108, but I'd prefer to make a good iteration on the important task (scheduled for the next milestone) now rather than pull tasks from 1.0.
I'm not sure, but it looks like we will break #1973 with the send buffer. The main goal of 1973 is to have a send buffer with max size = CONG_WND.
If we have a non-zero congestion window, then we don't have the slow read problem. We should transmit up to the congestion window size, but not keep more than the send buffer size of pending data (waiting to be pushed down to the stack by TCP). I.e. the 'send buffer' limits the size of data queued for transmission to the TCP client connection. This is in addition to the data which is transmitted in #1973.
With #1973 we push up to congestion-window-size bytes for TCP transmission (h2 encode, encrypt and send to the IP layer). But we may also have some (maybe a lot of) data for transmission in the TCP send queue.
With #2108 we aim to catch a slow HTTP/2 attack, but each control frame sent by us is actually triggered by some frame received from a client. We also aim to drop malicious data early, so we should drop the malicious client frames early (this is where Nginx and Go's http2 do a bad job). Another problem with 2108, also inherited from Nginx: why 10,000? The slow read attack targets memory exhaustion, not "frame count" exhaustion, so we should account memory, not frames. Is 10K too much for the default 100 streams? Is it too small if a user defines 1000 streams?
My proposal is to advertise a zero receive TCP window to a client if our send buffer for the client is full, i.e. do not accept new data from a client, and notify them that we can't accept data if we can't reply to it:
- receive a control frame (do not send a TCP ACK!)
- determine whether we have send buffer space for a reply to the frame
- if yes, send the ACK; if not, send a zero TCP window and drop the TCP data
This is a partial version of #498. We do not implement backpressure, at least. Probably a lot of the remaining logic isn't in this scope.
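The steps above can be sketched as a toy model of the send-buffer check. `tcp_send_state` and `advertise_window()` are invented names under stated assumptions; real TCP window management involves much more (SWS avoidance, window scaling, retransmission state), so this only captures the accept/refuse decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection transmit accounting. */
struct tcp_send_state {
	size_t sndbuf;  /* configured send buffer size, bytes */
	size_t queued;  /* bytes already queued for transmission */
};

/* Decide the receive window to advertise after getting a control frame
 * that requires a reply of reply_len bytes: if the reply would not fit
 * in the send buffer, advertise a zero window (refuse new data early)
 * instead of queueing a reply we cannot push out. */
static size_t advertise_window(const struct tcp_send_state *s,
			       size_t reply_len, size_t normal_window)
{
	if (s->queued + reply_len > s->sndbuf)
		return 0; /* send buffer full: zero window, drop the frame */
	return normal_window;
}
```

The drop-early property is visible here: the refusal happens before any reply frame is built, so a flooding peer consumes no reply memory on our side.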
@kingluo After discussion with @krizhanovsky we decided to implement a rate limit for PING, SETTINGS and PRIORITY frames. This limit must be as simple as possible: just reset the connection when frames are received too fast.
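A minimal sketch of such a per-connection limit, assuming a fixed one-second window and invented names (`frame_rate`, `frame_rate_check`); the eventual frang implementation may well use a different accounting scheme.

```c
#include <assert.h>

/* Hypothetical per-connection rate state for PING/SETTINGS/PRIORITY. */
struct frame_rate {
	unsigned long window_start; /* current one-second window */
	unsigned int count;         /* frames seen in this window */
	unsigned int limit;         /* max frames per second */
};

/* Call on every ingress control frame; returns 0 to accept,
 * -1 when the connection should be reset for exceeding the rate. */
static int frame_rate_check(struct frame_rate *r, unsigned long now_sec)
{
	if (now_sec != r->window_start) {
		/* New second: restart the counter. */
		r->window_start = now_sec;
		r->count = 0;
	}
	if (++r->count > r->limit)
		return -1; /* too fast: reset the connection */
	return 0;
}
```

This matches the "as simple as possible" requirement: one comparison and one increment per frame, no history kept across windows.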
@kingluo @const-t @EvgeniiMekhanik we also agreed that the task should be part of #498, i.e. also limit the memory consumption for slow read attacks. I.e. we should have the rate limiting as @const-t mentioned above plus per-client memory accounting, just like Nginx provides 2-layer protection.
fw/sock_clnt.c
Outdated
@@ -460,6 +460,9 @@ tfw_sk_write_xmit(struct sock *sk, struct sk_buff *skb, unsigned int mss_now,
	if (h2_mode) {
		h2 = tfw_h2_context(conn);
		tbl = &h2->hpack.enc_tbl;
		if (flags & SS_F_HTTT2_FRAME_CONTROL) {
			--h2->queued_control_frames;
I found that we can't count control frames here. Several skbs can be aggregated into one TLS record, and in that case we don't decrement the count of control frames.
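The leak described can be reproduced in a toy model (all names below are illustrative, not the real Tempesta code): the counter goes up once per queued control frame, but the transmit callback runs once per TLS record, so aggregating several control-frame skbs into one record leaves the counter permanently inflated.

```c
#include <assert.h>

/* Hypothetical stand-in for the h2 context's counter. */
struct h2_model {
	unsigned int queued_control_frames;
};

/* Increment once per control frame put on the send queue. */
static void queue_control_frames(struct h2_model *h2, unsigned int n)
{
	h2->queued_control_frames += n;
}

/* Per-TLS-record transmit callback that decrements only once,
 * mirroring the reviewed tfw_sk_write_xmit() hunk. */
static void xmit_one_record(struct h2_model *h2)
{
	if (h2->queued_control_frames)
		h2->queued_control_frames--;
}
```

Once the counter never returns to zero, the flood threshold is reached by legitimate traffic over time, which is why per-record accounting can't back a per-frame limit.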
Force-pushed from c3d466b to 902025a
part of #1346