POST body processing #902

krizhanovsky · 2018-02-04T22:32:16Z

POST body processing must be configurable per-location and per-vhost through a new configuration option

http_post_validate=N

Where N is the maximum content length; 0, the default, means all POST requests are processed without any upper limit on the length. If the option is missing from the configuration file, we should not perform any POST validation.

We have to process the POST body, at least the boundary. Empty and duplicated boundaries must be handled correctly, e.g.:

POST /vuln.php HTTP/1.1
Host: test.com
Connection: close
Content-Type: multipart/form-data; boundary=
Content-Length: 192

--
Content-Disposition: form-data; name="id"

123' or sleep(20)# 
----

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; boundary=FIRST;
Content-Type: multipart/form-data; boundary=SECOND;
Content-Type: multipart/form-data; boundary=THIRD;

--THIRD 
Content-Disposition: form-data; name=param 

UNION SELECT version()
--THIRD--

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; xxxboundaryxxx=FIRST; boundary=SECOND;

--FIRST 
Content-Disposition: form-data; name=param 

UNION SELECT version()
--FIRST--

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; boundary=FOO;

--FOO
Content-Disposition: form-data; name=param1

--FOO
Content-Disposition: form-data; name=param2

UNION SELECT version()
--FOO--

See more cases in https://www.slideshare.net/ssusera0a306/zeronights-2016-a-blow-under-the-belt-how-to-avoid-wafipsdlp-wafipsdlp and https://blog.qualys.com/wp-content/uploads/2012/07/Protocol-Level%20Evasion%20of%20Web%20Application%20Firewalls%20v1.1%20(18%20July%202012).pdf
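The header-level cases above (empty boundary, repeated Content-Type header, look-alike parameter names) can be sketched as a small check. This is only an illustration in Python, not Tempesta FW code: `boundary_problems` and `_PARAM_RE` are hypothetical names, and the parameter grammar is simplified.

```python
import re

# Hypothetical helper names; a simplified parameter grammar, not Tempesta FW code.
# Matches `; name=value`, where value is either a quoted-string or a bare token.
_PARAM_RE = re.compile(r';\s*([^=;\s]+)\s*=\s*("(?:\\.|[^"\\])*"|[^;]*)')


def boundary_problems(content_type_headers):
    """Return a list of problems found in the given Content-Type header values."""
    problems = []
    if len(content_type_headers) > 1:
        problems.append("multiple Content-Type headers")
    for value in content_type_headers:
        # Only a parameter named exactly "boundary" counts, so that
        # "xxxboundaryxxx=FIRST" is not mistaken for the boundary.
        boundaries = [v.strip() for name, v in _PARAM_RE.findall(value)
                      if name.lower() == "boundary"]
        if not boundaries:
            problems.append("no boundary parameter")
        if len(boundaries) > 1:
            problems.append("duplicated boundary parameter")
        if any(b in ("", '""') for b in boundaries):
            problems.append("empty boundary")
    return problems
```

The empty part in the last request (`name=param1` with no body) is a body-level case and is not covered by this header check.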

A new configuration option must be introduced

content_security_mode <strict|transform|log>

The option governs how we handle all HTTP content mutations that the RFCs leave undefined and that may cause security issues. strict drops the request: if the RFC doesn't allow a parameter to be duplicated, drop the request and write a warning to the log. transform takes the first occurrence, ignores all the following ones, and writes a warning to the log. log only writes a warning, as the other two modes do, and leaves everything as is. It is very important to list all the cases affected by the option in the Wiki, since that's crucial for debugging possible issues with web applications. Also add descriptions of the attacks, with examples and use cases, to the Web security wiki.
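The three modes can be illustrated on a list of duplicated values. This is a rough sketch in Python under the semantics described above; `apply_mode` is a hypothetical name and the logger is only a stand-in for the warning log.

```python
import logging

log = logging.getLogger("content_security")


def apply_mode(mode, values, what):
    """Resolve duplicated values according to content_security_mode.

    Returns the values to keep, or None when the request must be dropped.
    """
    if len(values) <= 1:
        return values                      # nothing is duplicated
    log.warning("duplicated %s: %r", what, values)  # every mode logs a warning
    if mode == "strict":
        return None                        # drop the request
    if mode == "transform":
        return values[:1]                  # keep the first occurrence only
    if mode == "log":
        return values                      # leave everything as is
    raise ValueError(f"unknown content_security_mode: {mode}")
```

For example, `apply_mode("transform", ["FIRST", "SECOND"], "boundary")` keeps only `FIRST`, while the same call in `strict` mode returns `None` to signal a drop.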

Tempesta must not allow multiple parameters with the same name (duplication) in a Content-Disposition part header.

POSTs can be pretty large and have many parameters, so we need a good string search algorithm: Boyer-Moore (BM) or an AVX2 matcher. We need to test the algorithm's performance on different part sizes (it must be fast for small parts as well). The matched strings can be split across chunks at the network or HTTP transfer encoding layers, e.g.:

POST / HTTP/1.1
Host: destinationBucket.s3.amazonaws.com
Content-Type: multipart/form-data; boundary=9431149156168
Transfer-Encoding: chunked

70\r\n
--9431149156168
Content-Disposition: form-data; name="key"

acl
--943
161\r\n
1149156168
Content-Disposition: form-data; name="tagging"

<Tagging><TagSet><Tag><Key>Tag Name</Key><Value>Tag Value</Value></Tag></TagSet></Tagging>
--9431149156168--

There are 2 chunks of sizes 70 and 161, and the chunk boundary falls inside the multipart boundary identifier. The search algorithm must save its current state at a chunk boundary and continue in the next chunk.
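A minimal sketch of a search that survives chunk boundaries, written in Python for illustration. It uses Knuth-Morris-Pratt rather than the BM or AVX2 matchers proposed above, only because the KMP state is a single integer and is trivial to carry from one chunk to the next. `StreamMatcher` is a hypothetical name.

```python
class StreamMatcher:
    """Find a fixed byte pattern in data that arrives in arbitrary chunks."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("empty pattern")
        self.pattern = pattern
        self.fail = self._failure_table(pattern)
        self.state = 0    # number of pattern bytes matched so far
        self.offset = 0   # total bytes consumed across all chunks

    @staticmethod
    def _failure_table(p):
        # fail[i] is the length of the longest proper prefix of p[:i + 1]
        # that is also a suffix of it.
        fail = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k and p[i] != p[k]:
                k = fail[k - 1]
            if p[i] == p[k]:
                k += 1
            fail[i] = k
        return fail

    def feed(self, chunk: bytes):
        """Return absolute start offsets of matches completed in this chunk."""
        found = []
        p, fail = self.pattern, self.fail
        for byte in chunk:
            while self.state and byte != p[self.state]:
                self.state = fail[self.state - 1]
            if byte == p[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(p):
                found.append(self.offset - len(p))
                self.state = fail[self.state - 1]
        return found
```

Feeding `b"...--943"` and then `b"1149156168..."` to a matcher built for `b"--9431149156168"` reports the match in the second call, because `state` and `offset` persist between calls to `feed()`.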

Please check the standards and implementations for boundary usage. Apparently it's only used for Content-Type: multipart/form-data, i.e. web forms, which aren't so large and also have many attack vectors. So in the normal case we should never end up scanning a large blob (e.g. a DVD image) for a boundary. Passing wrong content is addressed by #1119.

Required for #2. Both issues care only about the strict (just drop bad requests) and transform (rewrite a request using the first occurrence to save resources) modes. We do not care about the personalities of particular end-point applications, e.g. differences between parsing by ASP or PHP.

The functional test is described in #843; please implement the appropriate checkbox by running all the relevant POST attacks from both links above.

Testing

The appropriate test issue for the feature is tempesta-tech/tempesta-test#108

krizhanovsky · 2018-04-04T16:53:25Z

While #628 implements alphabet checking, for now we do not parse the POST body and just skip it, so this issue must use the alphabet checking from #628 and implement the POST validation.

Please update the Custom character sets wiki page for the new POST processing functionality.

i-rinat · 2019-01-16T14:35:05Z

There is an evasion attack based on character escaping in field parameters. According to RFC 7231, the boundary parameter may be a "quoted-string". Something like "aaa\\bbb" should be decoded to aaa\bbb, but not all application servers do that; PHP, for example, does not.

In the request below, PHP will see "Hello, PHP!", while an application with a conformant parser will see "Hello, stranger!":

POST /endpoint HTTP/1.1
Host: loc
Content-Type: multipart/form-data; boundary="aaa\\bbb"
Content-Length: 177

--aaa\bbb
Content-Disposition: form-data; name="param"

Hello, stranger!
--aaa\bbb--
--aaa\\bbb
Content-Disposition: form-data; name="param"

Hello, PHP!
--aaa\\bbb--
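The difference between the two interpretations of the boundary in the request above can be reproduced with two small decoders. This is an illustrative Python sketch: `unquote` follows the quoted-string rules (a backslash escapes the next character), while `strip_quotes_only` is a hypothetical stand-in for a parser that removes the quotes but keeps the escapes.

```python
def unquote(value: str) -> str:
    """Decode an HTTP quoted-string; unquoted tokens are returned unchanged."""
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return value
    out = []
    chars = iter(value[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: take the next character literally
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)


def strip_quotes_only(value: str) -> str:
    """Non-conformant behaviour: drop the quotes, keep the escapes."""
    return value.strip('"')


raw = r'"aaa\\bbb"'                  # the parameter value as sent on the wire
conformant = unquote(raw)            # boundary with a single backslash
lenient = strip_quotes_only(raw)     # boundary with two backslashes
```

A proxy that decodes the boundary one way while the backend decodes it the other way validates a different part than the one the application eventually reads.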

krizhanovsky · 2019-01-28T23:14:03Z

Parts of the issue are implemented in pull requests #1139 and #1154. The short summary for TODO:

- ~~http_post_validate must be an integer instead of the current boolean: we need a limit for maximum POST scanning.~~ There is the http_body_len configuration option, which limits the maximum POST body size. It makes no sense to keep two size limits, since this might lead to a security hole where the tail of a POST isn't validated. Let's leave http_post_validate binary. Please write an example and the allowed value range in etc/tempesta_fw.conf and a description in the Wiki. Describe (preferably with a configuration example) how http_post_validate is linked with the maximum POST size, so that we can limit the maximum POST size and always analyze the whole POST.
- everything around content_security_mode (also keep in mind the documentation requirements above)
- ~~all the sanitization logic.~~ Let's skip the transform mode. At the moment we rewrite Content-Type and that's enough. Rewriting the body of POST parameters may cause invisible application errors and raises a lot of ambiguous replacements. Let's postpone this until at least one real user request.

dsvi · 2021-12-16T01:47:22Z

For multipart data encoding see RFC 2046
For multipart form data see RFC 1867

krizhanovsky · 2021-12-16T22:06:16Z

RFC 1867 is obsoleted by RFC 2854, but the latter is about text/html and seems to use MIME only in an example. The real references for us must be RFC 2045, 2046 and maybe 2049.

The following implementations can be used as a reference for a multipart parser:

The second one deals with nested multipart bodies. I don't think that HTTP ever uses this. It'd be good to check some RFC for this, or the source code of some mature web project like Django, Node.js or PHP, to see whether they parse nested parts (I checked Nginx Unit and it seems it leaves POST multipart processing to the application logic).

As noted above, we should not transform the POST body, just parse it and log warnings or block the request (in strict mode) if it has parameter pollution or a broken structure.

With this task we only need to parse the POST multipart body and (1) validate it and (2) build an internal data structure representing all the parameters. See for example the AWS S3 POST structure.

The data structure representing the multipart body is TBD, but consider the following options:

- We can keep all the parts in an array sorted by e.g. the first 8 bytes of the name parameter. The array can grow exponentially, say from size 8. Also consider hashing if it can be faster than binary search.
- It seems there are only 3 possible headers: Content-Type, Content-Disposition and Content-Transfer-Encoding, which it seems can just be parsed with the current __parse_transfer_encoding() (BTW please update the comment for the function since RFC 2616 is obsolete now). Please use the existing parsing functions to parse the headers.
- Since there are only 3 possible headers, we should keep them in a header-id indexed array of size 3 (just as we do now for special headers).
- So we should have an array of pointers to data structures, each with 3 headers and a TfwStr for the body.
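The options above can be sketched as follows. This is a Python illustration of the layout only, not the C structures Tempesta FW would use: `Part`, `PartIndex` and the `HDR_*` constants are hypothetical names, and `bytes` stands in for TfwStr.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional

# Header ids index a fixed-size per-part array of 3 slots.
HDR_CONTENT_TYPE, HDR_CONTENT_DISPOSITION, HDR_CONTENT_TRANSFER_ENCODING = range(3)


@dataclass
class Part:
    name: bytes
    body: bytes = b""
    headers: List[Optional[bytes]] = field(default_factory=lambda: [None] * 3)


class PartIndex:
    """Parts kept sorted by the first 8 bytes of the `name` parameter."""

    def __init__(self):
        self._keys = []    # 8-byte name prefixes, sorted
        self._parts = []   # parts, parallel to _keys

    def add(self, part: Part):
        key = part.name[:8]
        # bisect_right keeps parts with equal keys in arrival order.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._parts.insert(pos, part)

    def find(self, name: bytes):
        """Return all parts named `name`; prefix collisions are filtered out."""
        key = name[:8]
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return [p for p in self._parts[lo:hi] if p.name == name]
```

Because only an 8-byte prefix is used as the sort key, `find()` still has to compare full names within the matching range. The same lookup also returns every part when a name is repeated, which makes duplicates easy to detect.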

Dmitry-Gouriev · 2022-08-29T17:11:58Z

@krizhanovsky RFC 7578, sect. 4.3 and 5.2, explicitly allows multiple form fields with identical field names.

Dmitry-Gouriev · 2022-08-30T10:04:47Z

@krizhanovsky Also, Content-Transfer-Encoding is deprecated (RFC 7578 sect. 4.7)

krizhanovsky added enhancement security labels Feb 4, 2018

krizhanovsky added this to the 0.7 HTTP/2 milestone Feb 4, 2018

This was referenced Feb 4, 2018

HTTP normalization #2

Open

WAF test suite #843

Open

krizhanovsky assigned vankoven Mar 22, 2018

krizhanovsky modified the milestones: 0.7 HTTP/2, 0.6 KTLS Mar 22, 2018

krizhanovsky mentioned this issue Mar 22, 2018

HTTP message buffering and streaming #498

Open

6 tasks

krizhanovsky mentioned this issue Apr 4, 2018

Custom characters set for URI and HTTP headers #628

Closed

krizhanovsky assigned i-rinat and unassigned vankoven Nov 26, 2018

This was referenced Nov 27, 2018

Protection against malicious file uploads #1119

Open

Kernel-User Space Transport #77

Open

i-rinat mentioned this issue Dec 18, 2018

Sanitize Content-Type for multipart/form-data requests #1139

Merged

krizhanovsky modified the milestones: 0.6 Tempesta TLS, 0.7 HTTP/2 & TLS 1.3, 0.8 HTTP/2 & TLS 1.3, 0.7 TLS v0.3 Jan 28, 2019

krizhanovsky added the crucial label Jun 8, 2019

krizhanovsky unassigned i-rinat Nov 20, 2019

krizhanovsky modified the milestones: 0.7 HTTP/2, 0.8 TLS 1.3 & TDBv0.2 Mar 5, 2020

krizhanovsky mentioned this issue Mar 5, 2020

Server PUSH #1194

Open

krizhanovsky mentioned this issue Dec 5, 2021

Process HTTP abs_path and query separately #1276

Open

4 tasks

krizhanovsky modified the milestones: 0.8 TLS 1.3 & TDBv0.2 - Beta, 0.7 HTTP/2 & TLS performance Dec 5, 2021

krizhanovsky assigned dsvi Dec 16, 2021

krizhanovsky assigned Dmitry-Gouriev and unassigned dsvi and Dmitry-Gouriev Dec 22, 2021

krizhanovsky self-assigned this Jan 3, 2022

krizhanovsky modified the milestones: 0.7 - HTTP/2, fast in-kernel TLS, 0.9 - TDB Jan 3, 2022

krizhanovsky modified the milestones: 0.9 - POST processing, 1.1: TBD Jan 12, 2023
