-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize Content-Type for multipart/form-data requests #1139
Conversation
TFW_STR_FROM("Content-Type"), | ||
TFW_STR_FROM(": "), | ||
TFW_STR_FROM("multipart/form-data; boundary="), | ||
req->multipart_boundary_raw, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req->multipart_boundary_raw
can't be compound string according to the usage. This should be asserted in unit tests, but i didn't found such check. But as I see usage of the variable in the http_parser, it actually is compound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, req->multipart_boundary_raw
can be a compound string, and I totally forgot about that. Fixed by explicitly handling both cases of compound and plain types of req->multipart_boundary_raw
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved back to the code version that expected req->multipart_boundary_raw
to be a single-chunk string.
Since this data is then used to recreate Content-Type
field of a request header, there is a huge possibility that buffer containing boundary string will overlap with the buffer it is copied into. Handling all the special cases looks infeasible at the moment, and the request parser was changed to make a copy of req->multipart_boundary_raw
. So it's always linear at this point.
fb71728
to
8e28bfc
Compare
There was a NULL-dereference due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same work in __req_parse_content_type()
is done twice, this is not acceptable for fast parser. Also tfw_http_recreate_content_type_multipart_hdr()
seems to be dangerous. I would like to ask you to provide a simple functional test just to test that the header is recreated successfully.
tempesta_fw/http.c
Outdated
if (req->method == TFW_HTTP_METH_POST | ||
&& (req->location->validate_post_req | ||
|| (req->vhost->loc_dflt | ||
&& req->vhost->loc_dflt->validate_post_req)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This location search is incomplete, need to search in the following order:
- current location of requests:
req->location
- default location of current vhost:
req->vhost->loc_dflt
- default vhost global configuration:
req->vhost->vhost_dflt
There is a task to simplify the search and place all effective configuration intoreq->location
(a part of Unified location processing #862) but it's not implemented yet.
Such search can look complicated but it because of inheriting of configuration directives from global objects to private sections. E.g. validate_post_req
option may be defined globally, so all vhosts will use the settings. Take a look at tfw_http_req_mark_nip()
function it's a good example how to search for the effective settings beneath locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved selection code to tfw_http_should_validate_post_req()
to make it readable. It also now checks all three possible places as proposed.
@@ -2155,6 +2176,14 @@ static TfwCfgSpec tfw_vhost_internal_specs[] = { | |||
.allow_repeat = true, | |||
.allow_reconfig = true, | |||
}, | |||
{ | |||
.name = "http_post_validate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option is allowed to be listed inside vhost
directive, inside location
directive, but not allowed to setup globally (not listed in tfw_vhost_specs[]
). Why? Why it's not allowed to be configured globally for all vhosts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about global configuration at first. Added the option to tfw_vhost_specs[]
too. Thanks.
tempesta_fw/http_parser.c
Outdated
__FSM_STATE(I_ContTypeParamOWS) { | ||
switch (c) { | ||
case ' ': | ||
case '\t': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a macroses for this: IS_WS(c)
, IS_CRLF(c)
. I see, why you don't use them, swich-case looks to be a better option to chose execution branch according to the current character, but such approach is not used in the http_parser.c at all. Instead conditions IS_WS(c)
, IS_CRLF(c)
are used one-by-one in order of expectancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved back to IS_WS()
, IS_CRLF()
macros across the patch.
tempesta_fw/http_parser.c
Outdated
__FSM_I_MOVE(I_ContTypeParamOWS); | ||
case '\r': | ||
case '\n': | ||
__FSM_I_MOVE_n(I_ContTypeParamOWS, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why jump to I_ContTypeParamOWS
and not to I_EoL
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lost the exact patch version that this comment was written at. But I believe it is fixed, as I removed at least one such excessive state change. And also none of IS_CRLF()
occurrences move to another state in the current patch version. Instead they rush to completion in one way or another.
tempesta_fw/http_parser.c
Outdated
__FSM_I_MOVE(I_ContTypeMultipart); | ||
case ' ': | ||
case '\t': | ||
__FSM_I_MOVE(I_ContTypeMultipartOWS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, as I understood, I_ContTypeMultipartOWS
exists only because TRY_STR_LAMBDA
tries only start of the token. With #1128 implemented, probably the FSM would become cleaner here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly why this state is needed. However, I think it will be needed anyway, as a fallback for the case there received buffer ends exactly at /form-data
and we have no data beyond that at the moment of parsing. There are similar cases in the main parser where field names are matched: fast and slow paths.
tempesta_fw/http_parser.c
Outdated
} | ||
|
||
__FSM_STATE(I_ContTypeMultipart) { | ||
req->flags |= TFW_HTTP_F_CT_MULTIPART; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to set the flag in I_ContTypeMaybeMultipart
and I_ContTypeMultipartOWS
and avoid creation of a new state just for setting the flag. But it's a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This state was eliminated. Required code was moved upwards to the previous states.
tempesta_fw/http_parser.c
Outdated
|
||
TRY_STR_LAMBDA("boundary=", { | ||
if ((req->flags & TFW_HTTP_F_CT_MULTIPART_HAS_BOUNDARY) | ||
&& (req->flags & TFW_HTTP_F_CT_MULTIPART)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condition req->flags & TFW_HTTP_F_CT_MULTIPART
seems to be redundant there. TFW_HTTP_F_CT_MULTIPART_HAS_BOUNDARY
flag can't be set without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's redundant. Removed the check. Thanks.
tempesta_fw/http_parser.c
Outdated
|
||
__FSM_STATE(I_ContTypeParamOther) { | ||
__fsm_n = __data_remain(p); | ||
__fsm_sz = tfw_match_token(p, __fsm_n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not __FSM_MATCH_MOVE(token, I_ContTypeParamOther);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to __FSM_I_MATCH_MOVE_fixup()
, as it was the most appropriate here. I believe there was something that I wanted to add, but it turned out to unnecessary.
However, in couple of other places I needed to "unpack" __FSM_I_MATCH_MOVE_fixup()
to add statements inside. Existing macros were not enough.
tempesta_fw/http.c
Outdated
replacement.len = c[0].len + c[1].len + c[2].len + | ||
req->multipart_boundary_raw.len; | ||
|
||
r = tfw_http_msg_hdr_xfrm_str((TfwHttpMsg *)req, &replacement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that it's a safe operation. If original header is longer and it mostly so, we cut off skb segment to comply new data length. Copy operation must be failed, if I'm not mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now I see that this is not a safe operation.
I've changed the code to make a copy of data of req->multipart_boundary_raw
, so it survives. Now the string data and the buffer it is copied into are guaranteed to be separate, so there should be no issues with memcpy'ing intersecting memory regions.
} | ||
|
||
static int | ||
__req_parse_content_type(TfwHttpMsg *hm, unsigned char *data, size_t len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole function doesn't follow approach used all across http_parser. As a result of this function we got the following:
req->h_tbl[TFW_HTTP_HDR_CONTENT_TYPE]
-> TfwStr with no special limiting by chunks- newly allocated
req->multipart_boundary_raw
which points to boundary value with its own splitting by chunks - newly allocated
req->multipart_boundary
which points to parts of boundary value with its own splitting by chunks - copy operation from
req->multipart_boundary_raw
toreq->multipart_boundary
which is done char-by-char. And actually the same work is done twice: first we scan for escapes in fsm, and the second time inunescape_multipart_boundary()
.
To follow the approach used in the parser module, the function should:
-
Call
__TFW_HTTP_PARSE_SPECHDR_VAL(......, 0)
macro. This macro doesn't fill parsed header, instead parsed header is filled manually chunck-by-chunk, each chunk can have very specific flags. In this case there is no need to allocate additional space forreq->multipart_boundary_raw
or make full symbols scans inunescape_multipart_boundary()
. See__req_parse_cookie()
as an example. -
Parse the header until start of boundary value. Save all parsed chunks into
parser->hdr
(see__FSM_I_MOVE_fixup()
macro) -
Next chunk is start of boundary value, set
req->multipart_boundary_raw->ptr
toTFW_STR_CURR(parser->hdr)
-
If the currently parsed symbol should appear in unescaped
req->multipart_boundary
, add it to current chunk and setTFW_STR_VALUE
flag for it. -
If current character is escape character and should not appear in
req->multipart_boundary
, put it in it's own chunk withoutTFW_STR_VALUE
flag. -
Loop 4-5 until boundary value end.
-
Once boundary value ended, collect
req->multipart_boundary
andreq->multipart_boundary_raw
. Number of chunks forreq->multipart_boundary_raw
can be easily calculated:n = TFW_STR_CURR(parser->hdr) -req->multipart_boundary_raw->ptr. And then run through
req->multipart_boundary_raw->ptrand copy TfwStr headers to
req->multipart_boundaryif
TFW_STR_VALUE` flag set. This copy operation is faster, since it scans by chanks, not by symbols. -
Save the rest of data into
parser->hdr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed instructions!
I've changed the patch to follow them, but rather than using a pointer from TFW_STR_CURR(parser->hdr)
, used a current chunk number. It turned out, that during parser->hdr
growth, parser->hdr.ptr
is reallocated multiple times. At some point, allocator may change parser->hdr.ptr
.
As I understood, unescaped |
39c466a
to
5de5ddd
Compare
The patch now makes deep copies of both |
tempesta_fw/http_parser.c
Outdated
|
||
__FSM_STATE(I_ContTypeMaybeMultipart) { | ||
if (c == ';') { | ||
set_bit(TFW_HTTP_F_CT_MULTIPART, req->flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect usage of set_bit()
function. First parameter must be bit number, not the flag value. Correct expression is to be set_bit(TFW_HTTP_B_CT_MULTIPART, req->flags)
and TFW_HTTP_F_CT_MULTIPART
macro is not needed at all. You can't see errors, because the issue is happening at every test_bit()/set_bit()
operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed both TFW_HTTP_F_CT_MULTIPART
and TFW_HTTP_F_CT_MULTIPART_HAS_BOUNDARY
, and changed to TFW_HTTP_B_*
.
15f6b41
to
add646e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but a small cleanup is required.
@@ -2114,6 +2114,45 @@ tfw_http_set_loc_hdrs(TfwHttpMsg *hm, TfwHttpReq *req) | |||
return 0; | |||
} | |||
|
|||
static int | |||
tfw_http_recreate_content_type_multipart_hdr(TfwHttpReq *req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR has a good comment, why the header is recreated from parsed value. Let's keep it with the function, since the backend compatibility issue is not a trivial case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to the function. Text is mostly from the pull request descriptions, with references to the pull request removed.
@@ -2378,6 +2414,14 @@ static TfwCfgSpec tfw_vhost_specs[] = { | |||
.allow_repeat = true, | |||
.allow_reconfig = true, | |||
}, | |||
{ | |||
.name = "http_post_validate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add the new directive to tempesta_fw.conf
, note, that it should be also listed inside vhost
and location
directives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
__FSM_STATE(I_ContType) { | ||
if (req->method != TFW_HTTP_METH_POST) | ||
__FSM_I_JMP(I_EoL); | ||
__FSM_I_JMP(I_ContTypeMediaType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, do we need to strictly analyse the header if directive http_post_validate
is not set? Or wee can simply skip it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, we are selecting vhost and location after all request is parsed completely. So when we get here, there is no enough information to decide yet.
There may be enough information, if we would've seen Host field when we get here. We still need to move vhost and location selection earlier to make this possible though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, forgot about per-vhost directive scope.
__FSM_I_MOVE_fixup(I_ContTypeParamOWS, 1, 0); | ||
} | ||
if (IS_WS(c)) | ||
__FSM_I_MOVE_fixup(I_ContTypeMultipartOWS, 1, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to stay in the same state and remove extra I_ContTypeMultipartOWS
state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not possible.
There are minor differences in how I_ContTypeMaybeMultipart and I_ContTypeMultipartOWS behave when none of the checks are true. In other words, for anything not ;
, whitespace, CR, or LF. If we merge these two states, we must either fail (CSTR_NEQ) or move to I_ContTypeOtherSubtype as a default action.
In the first case (CSTR_NEQ) we'll fail to process Content-Type: multipart/form-data2
which is correct according to the description from the RFC. In the latter case we'll accept Content-Type: multipart/form-data aaa; boundary=123
which is not correct according to the description from the RFC.
tempesta_fw/http_parser.c
Outdated
* Requests with multipart/form-data payload should have | ||
* only one boundary parameter. | ||
*/ | ||
if (test_bit(TFW_HTTP_B_CT_MULTIPART_HAS_BOUNDARY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One single function can be used here:
if (__test_and_set_bit(TFW_HTTP_B_CT_MULTIPART_HAS_BOUNDARY, req->flags))
return CSTR_NEQ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use __test_and_set_bit
.
tempesta_fw/http_parser.c
Outdated
if (test_bit(TFW_HTTP_B_CT_MULTIPART_HAS_BOUNDARY, | ||
req->flags)) | ||
return CSTR_NEQ; | ||
set_bit(TFW_HTTP_B_CT_MULTIPART_HAS_BOUNDARY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The request is currently parsed and no multiple access may happen, atomic operation shoud be replaced with non-atomic variant __set_bit()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced to __set_bit()
everywhere in the patch.
tempesta_fw/http_parser.c
Outdated
* Content-Type field composing. So to prevent memcpy'ing | ||
* intersecting buffers, we have to make a separate copy. | ||
*/ | ||
__strdup_multipart_boundaries(req); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function can return non-NULL exit code, which is ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check for the return value.
tempesta_fw/http_parser.c
Outdated
} | ||
} | ||
|
||
BUG_ON(ptr_raw != data_raw + req->multipart_boundary_raw.len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use TFW_WARN here and return error code to the caller. If __req_parse_content_type()
has a bug, the currently parsed request will be dropped, bu the server will remain alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, changed to use TFW_WARN. I haven't added any information about lengths, as it seem to be insufficient for debugging anyway.
add646e
to
c00cf2c
Compare
# 'nonidempotent' or 'hdr_add' directives (see the corresponding directives' | ||
# description). | ||
# 'nonidempotent', 'hdr_add' or 'http_post_validate' directives (see the | ||
# corresponding directives' description). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra asterisk after directives
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that was a "possessive 's". Since "directives" is plural, "'s" reduces to a sole apostrophe.
return true; | ||
|
||
if (!req->vhost) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of this function (as well as the whole tfw_http_adjust_req()
procedure) implies that req->vhost
should be already known at the moment the function is used. If req->vhost == NULL
it means that the function is used in wrong place. So, it should be better to use
if (WARN_ON_ONCE(!req->vhost))
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, if req->vhost
is NULL, this function is not called at all. Added WARN_ON_ONCE in #1154.
@@ -229,6 +229,10 @@ enum { | |||
TFW_HTTP_B_CONN_EXTRA, | |||
/* Chunked transfer encoding. */ | |||
TFW_HTTP_B_CHUNKED, | |||
/* Media type is multipart/form-data. */ | |||
TFW_HTTP_B_CT_MULTIPART, | |||
/* Multipart/form-data request have boundary parameter. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have
-> has
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #1154. Thanks!
.deflt = NULL, | ||
.handler = tfw_cfgop_out_http_post_validate, | ||
.allow_none = true, | ||
.allow_repeat = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why allow_repeat = true
here, while in vhost and location specs (above) - repeat is not allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a copy-paste error. There is no sense in allowing multiple instances of the statement, so the flag was changed to false
in #1154.
* req->vhost is expected to be non-NULL, as it was checked before; * disallow multiple instances of http_post_validate at topmost level; * a typo.
Some application servers parse
Content-Type
header field in a non-standard way, which is used to bypass web application firewalls. For example, PHP checks whenever parameter name contains substringboundary
, and that way matchesxxboundaryxx
and the like.To make life easier for backend servers, this patch makes Tempesta to validate
Content-Type
header field format by parsing it in a strict way. More over it makes possible to recompose its value from parsed data, with any other unrelated parameter dropped. Behavior is controlled byhttp_post_validate
configuration parameter; valid in vhost and location context.(partially addresses #902)