Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http.cookies.SimpleCookie.load() fails to consistently handle malformed cookies #127195

Open
moonsikpark opened this issue Nov 23, 2024 · 2 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@moonsikpark
Copy link
Contributor

moonsikpark commented Nov 23, 2024

Bug report

Bug description:

There are several issues with http.cookies.SimpleCookie.load() that deviate from current browser behavior:

  1. Malformed cookies are not processed at all

Consider the cookie a=b;c=d\x09d;e=f. The e value contains \x09, which is not allowed per RFC 6265, Section 4.1.1.

When this is sent to a browser (Chrome 130), the browser processes all valid cookies and filters out invalid ones:

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: a=b;
Set-Cookie: c=d	d;
Set-Cookie: e=f

Resulting behavior:

> document.cookie
< 'a=b; e=f'

However, http.cookies.SimpleCookie.load() ignores the entire cookie string:

>>> from http import cookies
>>> C = cookies.SimpleCookie()
>>> C.load("a=b;c=d\x09d;e=f")
>>> C.output()
''
  1. Malformed cookies are inconsistently processed

Consider the cookie a=b;c={"d":"e"};f=g. The c value is invalid per RFC 6265, Section 4.1.1.

Browsers process this cookie without an issue:

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: a=b;
Set-Cookie: c={"d":"e"};
Set-Cookie: f=g

Resulting behavior:

> document.cookie
< 'a=b; c={"d":"e"}; f=g'

However, http.cookies.SimpleCookie.load() processes only the valid portion before the malformed cookie and stops entirely:

>>> from http import cookies
>>> C = cookies.SimpleCookie()
>>> C.load('a=b; c={"d":"e"}; f=g')
>>> C.output()
'Set-Cookie: a=b'

It seems we should ensure consistent handling by (a) processing all valid cookies and discarding only invalid ones, or
(b) rejecting the entire cookie string if any invalid cookie is present.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

@moonsikpark moonsikpark added the type-bug An unexpected behavior, bug, or error label Nov 23, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Nov 23, 2024
@moonsikpark
Copy link
Contributor Author

After further investigation, it appears the module exhibits inconsistent behavior when encountering unexpected inputs and it is likely due to its regex-based parsing, which assumes very clean input.

cpython/Lib/http/cookies.py

Lines 420 to 437 in a4d4c1e

_CookiePattern = re.compile(r"""
\s* # Optional whitespace at start of cookie
(?P<key> # Start of group 'key'
[""" + _LegalKeyChars + r"""]+? # Any word of at least one letter
) # End of group 'key'
( # Optional group: there may not be a value.
\s*=\s* # Equal Sign
(?P<val> # Start of group 'val'
"(?:[^\\"]|\\.)*" # Any doublequoted string
| # or
\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT # Special case for "expires" attr
| # or
[""" + _LegalValueChars + r"""]* # Any word or empty string
) # End of group 'val'
)? # End of optional value group
\s* # Any number of spaces.
(\s+|;|$) # Ending either at space, semicolon, or EOS.
""", re.ASCII | re.VERBOSE) # re.ASCII may be removed if safe.

If the maintainers are in agreement, I’d like to propose a patch to improve cookie parsing by following these steps:

  1. Split the cookie into parts.
  2. For each part:
    (a) Validate the key. If invalid, issue a warning and skip the part.
    (b) Validate the value. If invalid, issue a warning and skip the part.
  3. Store only valid parts in the cookie jar.

Additionally, I’d like to gather opinions on whether we should allow invalid—but commonly occurring—characters in cookie values, such as those used in JSON. While the RFC advises against accepting these characters, they are widely accepted by major browsers and have become a common practice among web developers.

@picnixz, it seems you triaged my issue. If possible, could you notify the appropriate core maintainers?

@picnixz
Copy link
Contributor

picnixz commented Nov 27, 2024

cc @serhiy-storchaka

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants