diff --git a/Makefile b/Makefile
index 0f73b3f..70123ab 100644
--- a/Makefile
+++ b/Makefile
@@ -21,7 +21,11 @@ coverage: venv
 
 .PHONY: docs
 docs: venv
-	cd docs; ../$(VENV)/bin/sphinx-build -M html . ../build/docs
+	$(VENV)/bin/sphinx-build -M html docs build/docs
+
+.PHONY: watchdocs
+watchdocs: venv
+	$(VENV)/bin/sphinx-autobuild -a --watch . -b html docs build/docs/watch/
 
 upload: build
 	$(VENV)/bin/python3 -m twine upload --skip-existing dist/multipart-*
diff --git a/README.rst b/README.rst
index 7be9eaa..0656456 100644
--- a/README.rst
+++ b/README.rst
@@ -21,172 +21,43 @@ Python multipart/form-data parser
 .. _SansIO: https://sans-io.readthedocs.io/
 .. _asyncio: https://docs.python.org/3/library/asyncio.html
 
-This module provides a fast incremental non-blocking parser for RFC7578_
-``multipart/form-data``, as well as blocking alternatives for easier use in
-WSGI_ or CGI applications:
-
-* ``PushMultipartParser``: Incremental and non-blocking (SansIO_) parser
-  suitable for ASGI_, asyncio_ and other time or memory constrained environments.
-* ``MultipartParser``: Streaming parser that yields memory- or disk-buffered
-  ``MultipartPart`` instances.
-* ``parse_form_data(environ)`` and ``is_form_request(environ)``: Convenience
-  functions for WSGI_ applications with support for both ``multipart/form-data``
-  and ``application/x-www-form-urlencoded`` form submissions.
-
-
-Installation
-============
-
-``pip install multipart``
+This module provides a fast incremental non-blocking parser for
+``multipart/form-data`` [HTML5_, RFC7578_], as well as blocking alternatives for
+easier use in WSGI_ or CGI applications:
+
+* **PushMultipartParser**: Fast SansIO_ (incremental, non-blocking) parser suitable
+  for ASGI_, asyncio_ and other IO, time or memory constrained environments.
+* **MultipartParser**: Streaming parser that reads from a byte stream and yields
+  memory- or disk-buffered `MultipartPart` instances.
+* **WSGI Helper**: High-level functions and containers for WSGI_ or CGI applications with support
+  for both `multipart` and `urlencoded` form submissions.
 
 Features
 ========
 
 * Pure python single file module with no dependencies.
-* Well tested with inputs from actual browsers and HTTP clients. 100% test coverage.
-* Parses multiple GB/s on modern hardware (see `benchmarks `_).
-* Quickly rejects malicious or broken inputs and emits useful error messages.
-* Enforces configurable memory and disk resource limits to prevent DoS attacks.
-
-**Scope:** This parser implements ``multipart/form-data`` as defined by HTML5_
-and RFC7578_ and aims to support all browsers or HTTP clients in use today.
-Legacy browsers are supported to some degree, but only if those workarounds do
-not impact performance or security. In detail this means:
-
-* Just ``multipart/form-data``, not suitable for email parsing.
-* No ``multipart/mixed`` support (deprecated in RFC7578_).
-* No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC7578_).
-* No ``encoded-word`` or ``name=_charset_`` encoding markers (deprecated in HTML5_).
-* No support for clearly broken clients (e.g. invalid line breaks or headers).
-
-Usage and Examples
-==================
-
-Here are some basic examples for the most common use cases. There are more
-parameters and features available than shown here, so check out the docstrings
-(or your IDEs built-in help) to get a full picture.
-
-
-Helper function for WSGI or CGI
--------------------------------
-
-For WSGI application developers we strongly suggest using the ``parse_form_data``
-helper function. It accepts a WSGI ``environ`` dictionary and parses both types
-of form submission (``multipart/form-data`` and ``application/x-www-form-urlencoded``)
-based on the actual content type of the request. You'll get two ``MultiDict``
-instances in return, one for text fields and the other for file uploads:
-
-.. code-block:: python
-
-    from multipart import parse_form_data, is_form_request
-
-    def wsgi(environ, start_response):
-        if is_form_request(environ):
-            forms, files = parse_form_data(environ)
-
-            title = forms["title"]    # type: string
-            upload = files["upload"]  # type: MultipartPart
-            upload.save_as(...)
-
-Note that form fields that are too large to fit into memory will end up as
-``MultipartPart`` instances in the ``files`` dict instead. This is to protect
-your app from running out of memory or crashing. ``MultipartPart`` instances are
-buffered to temporary files on disk if they exceed a certain size. The default
-limits should be fine for most use cases, but can be configured if you need to.
-See ``MultipartParser`` for details.
-
-Flask, Bottle & Co
-^^^^^^^^^^^^^^^^^^
-
-Most WSGI web frameworks already have multipart functionality built in, but
-you may still get better throughput for large files (or better limits control)
-by switching parsers:
-
-.. code-block:: python
+* Optimized for both blocking and non-blocking applications.
+* 100% test coverage with test data from actual browsers and HTTP clients.
+* High throughput and low latency (see `benchmarks `_).
+* Predictable memory and disk resource consumption via fine grained limits.
+* Strict mode: Spend less time parsing malicious or broken inputs.
+
+Scope and compatibility
+=======================
+
+All parsers in this module implement ``multipart/form-data`` as defined by HTML5_
+and RFC7578_, supporting all modern browsers or HTTP clients in use today.
+Legacy browsers (e.g. IE6) are supported to some degree, but only if the
+required workarounds do not impact performance or security.
-
-    forms, files = multipart.parse_form_data(flask.request.environ)
-
-Legacy CGI
-^^^^^^^^^^
-
-If you are in the unfortunate position to have to rely on CGI, but can't use
-``cgi.FieldStorage`` anymore, it's possible to build a minimal WSGI environment
-from a CGI environment and use that with ``parse_form_data``. This is not a real
-WSGI environment, but it contains enough information for ``parse_form_data``
-to do its job. Do not forget to add proper error handling.
-
-.. code-block:: python
-
-    import sys, os, multipart
-
-    environ = dict(os.environ.items())
-    environ['wsgi.input'] = sys.stdin.buffer
-    forms, files = multipart.parse_form_data(environ)
-
-
-Stream parser: ``MultipartParser``
-----------------------------------
-
-The ``parse_form_data`` helper may be convenient, but it expects a WSGI
-environment and parses the entire request in one go before it returns any
-results. Using ``MultipartParser`` directly gives you more control and also
-allows you to process ``MultipartPart`` instances as soon as they arrive:
-
-.. code-block:: python
-
-    from multipart import parse_options_header, MultipartParser
-
-    def wsgi(environ, start_response):
-        content_type, params = parse_options_header(environ["CONTENT_TYPE"])
-
-        if content_type == "multipart/form-data":
-            stream = environ["wsgi.input"]
-            boundary = params["boundary"]
-            charset = params.get("charset", "utf8")
-
-            parser = MultipartParser(stream, boundary, charset)
-            for part in parser:
-                if part.filename:
-                    print(f"{part.name}: File upload ({part.size} bytes)")
-                    part.save_as(...)
-                elif part.size < 1024:
-                    print(f"{part.name}: Text field ({part.value!r})")
-                else:
-                    print(f"{part.name}: Test field, but too big to print :/")
-
-
-Non-blocking parser: ``PushMultipartParser``
---------------------------------------------
-
-The ``MultipartParser`` handles IO and file buffering for you, but relies on
-blocking APIs. If you need absolute control over the parsing process and want to
-avoid blocking IO at all cost, then have a look at ``PushMultipartParser``, the
-low-level non-blocking incremental ``multipart/form-data`` parser that powers
-all the other parsers in this library:
-
-.. code-block:: python
-
-    from multipart import PushMultipartParser, MultipartSegment
-
-    async def process_multipart(reader: asyncio.StreamReader, boundary: str):
-        with PushMultipartParser(boundary) as parser:
-            while not parser.closed:
+Installation
+============
-
-                chunk = await reader.read(1024*64)
-                for result in parser.parse(chunk):
+
+``pip install multipart``
-
-                    if isinstance(result, MultipartSegment):
-                        print(f"== Start of segment: {result.name}")
-                        if result.filename:
-                            print(f"== Client-side filename: {result.filename}")
-                        for header, value in result.headerlist:
-                            print(f"{header}: {value}")
-                    elif result:  # Result is a non-empty bytearray
-                        print(f"[received {len(result)} bytes of data]")
-                    else:  # Result is None
-                        print(f"== End of segment")
+
+Documentation
+=============
+
+Examples and API documentation can be found at: https://multipart.readthedocs.io/
 
 License
 =======
diff --git a/docs/api.rst b/docs/api.rst
index bb48ee6..4a0f456 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -4,6 +4,8 @@ API Reference
 
 .. py:currentmodule:: multipart
 
+.. automodule:: multipart
+
 SansIO Parser
 =============
 
@@ -12,12 +14,16 @@ SansIO Parser
 
 .. autoclass:: MultipartSegment
    :members:
+   :special-members: __getitem__
 
 Stream Parser
 =============
 
+
 .. autoclass:: MultipartParser
    :members:
+   :special-members: __iter__, __getitem__
+
 
 .. autoclass:: MultipartPart
    :members:
 
@@ -28,6 +34,9 @@ WSGI Helper
 .. autofunction:: is_form_request
 .. autofunction:: parse_form_data
 
+.. autoclass:: MultiDict
+   :members:
+
 Header utils
 ============
diff --git a/docs/index.rst b/docs/index.rst
index 69c29e9..861d290 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,10 +1,77 @@
 .. py:currentmodule:: multipart
-.. include:: ../README.rst
+
+=================================
+Python multipart/form-data parser
+=================================
+
+.. image:: https://github.com/defnull/multipart/actions/workflows/test.yaml/badge.svg
+  :target: https://github.com/defnull/multipart/actions/workflows/test.yaml
+  :alt: Tests Status
+
+.. image:: https://img.shields.io/pypi/v/multipart.svg
+  :target: https://pypi.python.org/pypi/multipart/
+  :alt: Latest Version
+
+.. image:: https://img.shields.io/pypi/l/multipart.svg
+  :target: https://pypi.python.org/pypi/multipart/
+  :alt: License
+
+.. _HTML5: https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data
+.. _RFC7578: https://www.rfc-editor.org/rfc/rfc7578
+.. _WSGI: https://peps.python.org/pep-3333
+.. _ASGI: https://asgi.readthedocs.io/en/latest/
+.. _SansIO: https://sans-io.readthedocs.io/
+.. _asyncio: https://docs.python.org/3/library/asyncio.html
+
+This module provides a fast incremental non-blocking parser for
+``multipart/form-data`` [HTML5_, RFC7578_], as well as blocking alternatives for
+easier use in WSGI_ or CGI applications:
+
+* :ref:`push-example`: Fast SansIO_ (incremental, non-blocking) parser suitable
+  for ASGI_, asyncio_ and other IO, time or memory constrained environments.
+* :ref:`stream-example`: Blocking parser that reads from a stream and yields
+  memory- or disk-buffered :class:`MultipartPart` instances.
+* :ref:`wsgi-example`: High-level functions and containers for WSGI_ or CGI
+  applications with support for both `multipart` and `urlencoded` form submissions.
+
+Features and Scope
+==================
+
+* Pure python single file module with no dependencies.
+* Optimized for both blocking and non-blocking applications.
+* 100% test coverage with test data from actual browsers and HTTP clients.
+* High throughput and low latency (see `benchmarks `_).
+* Predictable memory and disk resource consumption via fine grained limits.
+* Strict mode: Spend less time parsing malicious or broken inputs.
+
+**Scope:** All parsers in this module implement ``multipart/form-data`` as defined by HTML5_
+and RFC7578_, supporting all modern browsers or HTTP clients in use today.
+Legacy browsers (e.g. IE6) are supported to some degree, but only if the
+required workarounds do not impact performance or security. In detail this means:
+
+* Just ``multipart/form-data``, not suitable for email parsing.
+* No ``multipart/mixed`` support (deprecated in RFC7578_).
+* No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC7578_).
+* No ``encoded-word`` or ``name=_charset_`` encoding markers (deprecated in HTML5_).
+* No support for clearly broken clients (e.g. invalid line breaks or headers).
+
+Installation
+============
+
+``pip install multipart``
+
+Table of Contents
+=================
 
 .. toctree::
    :maxdepth: 2
-   :hidden:
 
    Home
+   usage
    api
-   changelog
\ No newline at end of file
+   changelog
+
+License
+=======
+
+.. include:: ../LICENSE
diff --git a/docs/usage.rst b/docs/usage.rst
new file mode 100644
index 0000000..7389886
--- /dev/null
+++ b/docs/usage.rst
@@ -0,0 +1,172 @@
+.. py:currentmodule:: multipart
+
+.. _HTML5: https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data
+.. _RFC7578: https://www.rfc-editor.org/rfc/rfc7578
+.. _WSGI: https://peps.python.org/pep-3333
+.. _ASGI: https://asgi.readthedocs.io/en/latest/
+.. _SansIO: https://sans-io.readthedocs.io/
+.. _asyncio: https://docs.python.org/3/library/asyncio.html
+
+==================
+Usage and Examples
+==================
+
+Here are some basic examples for the most common use cases. There are more
+parameters and features available than shown here, so check out the docstrings
+(or your IDE's built-in help) to get a full picture.
+
+
+.. _wsgi-example:
+
+WSGI helper
+===========
+
+The WSGI helper functions :func:`is_form_request` and :func:`parse_form_data`
+accept a `WSGI environ` dictionary and support both types of form submission
+(``multipart/form-data`` and ``application/x-www-form-urlencoded``) at once.
+You'll get two fully populated :class:`MultiDict` instances in return, one for
+text fields and the other for file uploads:
+
+.. code-block:: python
+
+    from multipart import parse_form_data, is_form_request
+
+    def wsgi(environ, start_response):
+        if is_form_request(environ):
+            forms, files = parse_form_data(environ)
+
+            title = forms["title"]    # type: string
+            upload = files["upload"]  # type: MultipartPart
+            upload.save_as(...)
+
+Note that form fields that are too large to fit into memory will end up as
+:class:`MultipartPart` instances in the ``files`` dict instead. This is to protect
+your app from running out of memory or crashing. :class:`MultipartPart` instances are
+buffered to temporary files on disk if they exceed a certain size. The default
+limits should be fine for most use cases, but can be configured if you need to.
+See :class:`MultipartParser` for configurable limits.
+
+Flask, Bottle & Co
+------------------
+
+Most WSGI web frameworks already have multipart functionality built in, but
+you may still get better throughput for large files (or better limits control)
+by switching parsers:
+
+.. code-block:: python
+
+    forms, files = multipart.parse_form_data(flask.request.environ)
+
+Legacy CGI
+----------
+
+If you are in the unfortunate position to have to rely on CGI, but can't use
+:class:`cgi.FieldStorage` anymore, it's possible to build a minimal WSGI environment
+from a CGI environment and use that with :func:`parse_form_data`. This is not a real
+WSGI environment, but it contains enough information for :func:`parse_form_data`
+to do its job. Do not forget to add proper error handling.
+
+.. code-block:: python
+
+    import sys, os, multipart
+
+    environ = dict(os.environ.items())
+    environ['wsgi.input'] = sys.stdin.buffer
+    forms, files = multipart.parse_form_data(environ)
+
+
+.. _stream-example:
+
+Streaming parser
+================
+
+The WSGI helper functions may be convenient, but they expect a WSGI environment
+and parse the entire request in one go. If you need more control, you can use
+:class:`MultipartParser` directly. This streaming parser reads from any blocking
+byte stream (e.g. ``environ["wsgi.input"]``) and emits :class:`MultipartPart`
+instances that are either memory- or disk-buffered depending on size. If used as
+an iterator, the parser will yield parts as soon as they are complete and not
+wait for the entire request to be parsed. This allows applications to process
+parts (or abort the request) before the request is fully transmitted.
+
+.. code-block:: python
+
+    from multipart import parse_options_header, MultipartParser
+
+    def wsgi(environ, start_response):
+        content_type, options = parse_options_header(environ["CONTENT_TYPE"])
+
+        if content_type == "multipart/form-data" and 'boundary' in options:
+            stream = environ["wsgi.input"]
+            boundary = options["boundary"]
+            parser = MultipartParser(stream, boundary)
+
+            for part in parser:
+                if part.filename:
+                    print(f"{part.name}: File upload ({part.size} bytes)")
+                    part.save_as(...)
+                elif part.size < 1024:
+                    print(f"{part.name}: Text field ({part.value!r})")
+                else:
+                    print(f"{part.name}: Text field, but too big to print :/")
+
+            # Free up resources after use
+            for part in parser.parts():
+                part.close()
+
+Results are cached, so you can iterate or call
+:meth:`MultipartParser.get` or :meth:`MultipartParser.parts` multiple times
+without triggering any extra work. Do not forget to :meth:`close <MultipartPart.close>`
+all parts after use to free up resources and avoid :exc:`ResourceWarning`.
+Framework developers may want to add logic that automatically frees up resources
+after the request has ended.
+
+.. _push-example:
+
+SansIO parser
+=============
+
+All parsers in this library are based on :class:`PushMultipartParser`, a fast
+and secure SansIO_ (non-blocking, incremental) parser targeted at framework or
+application developers who need a high level of control. `SansIO` means that
+the parser itself does not make any assumptions about the IO or concurrency model
+and can be used in any environment, including coroutines, greenlets, callbacks
+or threads. But it also means that you have to deal with IO yourself. Here is
+an example that shows how it can be used in an asyncio_ based application:
+
+.. code-block:: python
+
+    from multipart import PushMultipartParser, MultipartSegment
+
+    async def process_multipart(reader: asyncio.StreamReader, boundary: str):
+
+        with PushMultipartParser(boundary) as parser:
+            while not parser.closed:
+                chunk = await reader.read(1024*64)
+
+                for result in parser.parse(chunk):
+                    if isinstance(result, MultipartSegment):
+                        print(f"== Start of segment: {result.name}")
+                        if result.filename:
+                            print(f"== Client-side filename: {result.filename}")
+                        for header, value in result.headerlist:
+                            print(f"{header}: {value}")
+                    elif result:  # Non-empty bytearray
+                        print(f"[received {len(result)} bytes of data]")
+                    else:  # None
+                        print(f"== End of segment")
+
+Once the parser is set up, you feed it bits of data and receive zero or more
+result events in return. For each part in a valid multipart stream, the parser
+will emit a single :class:`MultipartSegment` instance, followed by zero or more
+non-empty content chunks (:class:`bytearray`), followed by a single :data:`None`
+to signal the end of the current part. The generator returned by
+:meth:`PushMultipartParser.parse` will stop if more data is needed, or raise
+:exc:`MultipartError` if it encounters invalid data. Once the parser detects the
+end of the multipart stream, :attr:`PushMultipartParser.closed` will be true and
+you can stop parsing.
+
+Note that the parser is a context manager. This ensures that the parser actually
+reached the end of input and found the final multipart delimiter. Calling
+:meth:`PushMultipartParser.close` or exiting the context manager will raise
+:exc:`MultipartError` if the parser is still expecting more data.
diff --git a/multipart.py b/multipart.py
index 8eac62e..f19c004 100644
--- a/multipart.py
+++ b/multipart.py
@@ -26,6 +26,7 @@
 from collections.abc import MutableMapping as DictMixin
 import tempfile
 import functools
+from math import inf
 
 
 ##
@@ -111,22 +112,26 @@ def __setitem__(self, key, value):
         self.append(key, value)
 
     def append(self, key, value):
+        """ Add an additional value to a key. """
         self.dict.setdefault(key, []).append(value)
 
     def replace(self, key, value):
+        """ Replace all values for a key with a single value. """
         self.dict[key] = [value]
 
     def getall(self, key):
+        """ Return a list with all values for a key. The list may be empty. """
         return self.dict.get(key) or []
 
     def get(self, key, default=None, index=-1):
+        # Not documented because it's likely to change.
         if key not in self.dict and default != KeyError:
            return [default][index]
 
         return self.dict[key][index]
 
     def iterallitems(self):
-        """ Yield (key, value) keys, but for all values. """
+        """ Yield (key, value) pairs with repeating keys for each value. """
         for key, values in self.dict.items():
             for value in values:
                 yield key, value
@@ -283,8 +288,8 @@ def __init__(
         content_length=-1,
         max_header_size=4096 + 128,  # 4KB should be enough for everyone
         max_header_count=8,  # RFC 7578 allows just 3
-        max_segment_size=2**64,  # Practically unlimited
-        max_segment_count=2**64,  # Practically unlimited
+        max_segment_size=inf,  # unlimited
+        max_segment_count=inf,  # unlimited
         header_charset="utf8",
         strict=False,
     ):
@@ -299,10 +304,10 @@ def __init__(
         limit will trigger a :exc:`ParserLimitReached` exception.
 
         :param boundary: The multipart boundary as found in the Content-Type header.
-        :param content_length: Maximum number of bytes to parse, or -1 for no limit.
-        :param max_header_size: Maximum size of a single header (name+value).
+        :param content_length: Expected input size in bytes, or -1 if unknown.
+        :param max_header_size: Maximum length of a single header line (name and value).
         :param max_header_count: Maximum number of headers per segment.
-        :param max_segment_size: Maximum size of a single segment.
+        :param max_segment_size: Maximum size of a single segment body.
         :param max_segment_count: Maximum number of segments.
         :param header_charset: Charset for header names and values.
         :param strict: Enables additional format and sanity checks.
@@ -325,10 +330,11 @@ def __init__(
         self._current = None
         self._state = _PREAMBLE
 
-        #: True if the parser was closed.
+        #: True if the parser reached the end of the multipart stream, stopped
+        #: parsing due to an :attr:`error`, or :meth:`close` was called.
         self.closed = False
-        #: The last error
-        self.error = None
+        #: A :exc:`MultipartError` instance if parsing failed.
+        self.error: Optional[MultipartError] = None
 
     def __enter__(self):
         return self
@@ -346,7 +352,7 @@ def parse(
         of :class:`MultipartSegment` with all headers already present,
         followed by zero or more non-empty `bytearray` instances containing
         parts of the segment body, followed by a single `None` signaling the
-        end of the segment.
+        end of the current segment.
 
         The returned iterator will stop if more data is required or if the end
         of the multipart stream was detected. The iterator must be fully consumed
@@ -499,7 +505,7 @@ def close(self, check_complete=True):
         """
         Close this parser if not already closed.
 
-        :param check_complete: Raise MultipartError if the parser did not
+        :param check_complete: Raise :exc:`ParserError` if the parser did not
             reach the end of the multipart stream yet.
         """
 
@@ -515,33 +521,35 @@ def close(self, check_complete=True):
 
 
 class MultipartSegment:
+    """ A :class:`MultipartSegment` represents the header section of a single
+    multipart part and provides convenient access to part headers and other
+    details (e.g. :attr:`name` and :attr:`filename`). Each segment also
+    tracks its own content :attr:`size` while the :class:`PushMultipartParser`
+    processes more data, and is marked as :attr:`complete` as soon as the
+    next multipart border is found. Segments do not store or buffer any of
+    their content data, though.
+    """
 
     #: List of headers as name/value pairs with normalized (Title-Case) names.
     headerlist: List[Tuple[str, str]]
-    #: The 'name' option of the Content-Disposition header. Always a string,
+    #: The 'name' option of the `Content-Disposition` header. Always a string,
     #: but may be empty.
     name: str
-    #: The optional 'filename' option of the Content-Disposition header.
+    #: The optional 'filename' option of the `Content-Disposition` header.
     filename: Optional[str]
-    #: The Content-Type of this segment, if the header was present.
-    #: Not the entire header, just the actual content type without options.
+    #: The cleaned up `Content-Type` segment header, if present. The value is
+    #: lower-cased and header options (e.g. charset) are removed.
     content_type: Optional[str]
-    #: The 'charset' option of the Content-Type header, if present.
+    #: The 'charset' option of the `Content-Type` header, if present.
     charset: Optional[str]
 
     #: Segment body size (so far). Will be updated during parsing.
     size: int
-    #: If true, the last chunk of segment body data was parsed and the size
-    #: value is final.
+    #: If true, the segment content was fully parsed and the size value is final.
     complete: bool
 
     def __init__(self, parser: PushMultipartParser):
-        """ MultipartSegments are created by the PushMultipartParser and
-        represent a single multipart segment, but do not store or buffer any
-        of the content. The parser will emit MultipartSegments with a fully
-        populated headerlist and derived information (name, filename, ...) can
-        be accessed.
-        """
+        """ Private constructor, used by :class:`PushMultipartParser` """
         self._parser = parser
 
         if parser._fieldcount+1 > parser.max_segment_count:
@@ -632,7 +640,7 @@ def header(self, name: str, default=None):
         return default
 
     def __getitem__(self, name):
-        """Return a header value if present, or raise KeyError."""
+        """Return a header value if present, or raise :exc:`KeyError`."""
         return self.header(name, KeyError)
 
 
@@ -653,38 +661,38 @@ def __init__(
         header_limit=8,
         headersize_limit=1024 * 4 + 128,  # 4KB
         part_limit=128,
-        partsize_limit=2**64,  # practically unlimited
+        partsize_limit=inf,  # unlimited
         spool_limit=1024 * 64,  # Keep fields up to 64KB in memory
         memory_limit=1024 * 64 * 128,  # spool_limit * part_limit
-        disk_limit=2**64,  # practically unlimited
+        disk_limit=inf,  # unlimited
         mem_limit=0,
         memfile_limit=0,
     ):
-        """A parser that reads from a multipart/form-data encoded byte stream
+        """A parser that reads from a `multipart/form-data` encoded byte stream
         and yields :class:`MultipartPart` instances.
 
-        The parse itself is an iterator and will read and parse data on
-        demand. results are cached, so once fully parsed, it can be iterated
-        over again.
+        The parser acts as a lazy iterator and will only read and parse as much
+        data as needed to return the next part. Results are cached and the same
+        part can be requested multiple times without extra cost.
 
-        :param stream: A readable byte stream. Must implement ``.read(size)``.
+        :param stream: A readable byte stream or any other object that implements
+            a ``read(size)`` method.
         :param boundary: The multipart boundary as found in the Content-Type header.
-        :param content_length: The maximum number of bytes to read.
+        :param charset: Default charset for headers and text fields.
         :param strict: Enables additional format and sanity checks.
-        :param buffer_size: Size of chunks read from the source stream
-
-        :param header_limit: Maximum number of headers per segment
-        :param headersize_limit: Maximum size of a segment header line
-        :param part_limit: Maximum number of segments to parse
-        :param partsize_limit: Maximum size of a segment body
-        :param spool_limit: Segments up to this size are buffered in memory,
-            larger segments are buffered in temporary files on disk.
-        :param memory_limit: Maximum size of all memory-buffered segments.
-        :param disk_limit: Maximum size of all disk-buffered segments
-
-        :param memfile_limit: Deprecated alias for `spool_limit`.
-        :param mem_limit: Deprecated alias for `memory_limit`.
+        :param buffer_size: Chunk size when reading from the source stream.
+
+        :param header_limit: Maximum number of headers per part.
+        :param headersize_limit: Maximum length of a single header line (name and value).
+        :param part_limit: Maximum number of parts.
+        :param partsize_limit: Maximum content size of a single part.
+        :param spool_limit: Parts up to this size are buffered in memory and count
+            towards `memory_limit`. Larger parts are spooled to temporary files on
+            disk and count towards `disk_limit`.
+        :param memory_limit: Maximum size of all memory-buffered parts. Should
+            be smaller than ``spool_limit * part_limit`` to have an effect.
+        :param disk_limit: Maximum size of all disk-buffered parts.
        """
         self.stream = stream
         self.boundary = boundary
@@ -704,7 +712,8 @@ def __init__(
         self._part_iter = None
 
     def __iter__(self):
-        """Iterate over the parts of the multipart message."""
+        """ Parse the multipart stream and yield :class:`MultipartPart`
+        instances as soon as they are available. """
         if not self._part_iter:
             self._part_iter = self._iterparse()
 
@@ -716,11 +725,13 @@ def __iter__(self):
             yield part
 
     def parts(self):
-        """Returns a list with all parts of the multipart message."""
+        """ Parse the entire multipart stream and return all :class:`MultipartPart`
+        instances as a list. """
         return list(self)
 
     def get(self, name, default=None):
-        """Return the first part with that name or a default value."""
+        """ Return the first part with a given name, or the default value if no
+        matching part exists. """
         for part in self:
             if name == part.name:
                 return part
@@ -728,7 +739,7 @@ def get(self, name, default=None):
         return default
 
     def get_all(self, name):
-        """Return a list of parts with that name."""
+        """ Return all parts with the given name. """
         return [p for p in self if p.name == name]
 
     def _iterparse(self):
@@ -783,6 +794,12 @@ def _iterparse(self):
 
 
 class MultipartPart(object):
+    """ A :class:`MultipartPart` represents a fully parsed multipart part
+    and provides convenient access to part headers and other details (e.g.
+    :attr:`name` and :attr:`filename`) as well as its memory- or disk-buffered
+    binary or text content.
+    """
+
     def __init__(
         self,
         buffer_size=2**16,
@@ -790,14 +807,22 @@ def __init__(
         charset="utf8",
         segment: "MultipartSegment" = None,
     ):
+
+        """ Private constructor, used by :class:`MultipartParser` """
+
         self._segment = segment
-        #: A file-like object holding the fields content
+        #: A file-like buffer holding the part's binary content, or ``False`` if
+        #: this part was :meth:`closed <MultipartPart.close>`.
         self.file = BytesIO()
+        #: Part size in bytes.
         self.size = 0
+        #: Part name.
         self.name = segment.name
+        #: Part filename (if defined).
         self.filename = segment.filename
-        #: Charset as defined in the segment header, or the parser default charset
+        #: Charset as defined in the part header, or the parser default charset.
         self.charset = segment.charset or charset
+        #: All part headers as a list of (name, value) pairs.
         self.headerlist = segment.headerlist
 
         self.memfile_limit = memfile_limit
@@ -805,14 +830,20 @@
 
     @_cached_property
     def headers(self) -> Headers:
+        """ A convenient dict-like view holding all part headers. """
         return Headers(self._segment.headerlist)
 
     @_cached_property
     def disposition(self) -> str:
+        """ The value of the `Content-Disposition` part header. """
         return self._segment.header("Content-Disposition")
 
     @_cached_property
     def content_type(self) -> str:
+        """ Cleaned up content type provided for this part, or a sensible
+        default (`application/octet-stream` for files and `text/plain` for
+        text fields).
+        """
         return self._segment.content_type or (
             "application/octet-stream" if self.filename else "text/plain")
 
@@ -833,14 +864,16 @@ def _mark_complete(self):
             self.file.seek(0)
 
     def is_buffered(self):
-        """Return true if the data is fully buffered in memory."""
+        """ Return true if :attr:`file` is memory-buffered, or false if the part
+        was larger than the `spool_limit` and content was spooled to
+        temporary files on disk. """
         return isinstance(self.file, BytesIO)
 
     @property
     def value(self):
-        """Return the entire payload as decoded text.
+        """Return the entire payload as a decoded text string.
 
-        Warning, this may consume a lot of memory, check size first.
+        Warning, this may consume a lot of memory, check :attr:`size` first.
         """
         return self.raw.decode(self.charset)
 
@@ -849,7 +882,7 @@ def value(self):
     def raw(self):
         """Return the entire payload as a raw byte string.
 
-        Warning, this may consume a lot of memory, check size first.
+        Warning, this may consume a lot of memory, check :attr:`size` first.
        """
         pos = self.file.tell()
         self.file.seek(0)
@@ -859,7 +892,10 @@ def raw(self):
         return val
 
     def save_as(self, path):
-        """Save a copy of this part to `path` and return its size."""
+        """ Save a copy of this part to `path` and return the number of bytes
+        written.
+        """
+
         with open(path, "wb") as fp:
             pos = self.file.tell()
             try:
@@ -870,6 +906,7 @@ def save_as(self, path):
         return size
 
     def close(self):
+        """ Close :attr:`file` and set it to ``False`` to free up resources. """
         if self.file:
             self.file.close()
             self.file = False
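For context, every parser touched by this patch consumes the same RFC 7578 wire format. The following is an illustrative, self-contained sketch (not part of the patch itself; the boundary and field names are invented for this example) of how a client encodes a `multipart/form-data` body using only the standard library:

```python
# Sketch of the RFC 7578 multipart/form-data wire format that these
# parsers consume. Boundary and field names are purely illustrative.

def encode_multipart(fields, boundary):
    """Encode (name, filename, content_type, payload) tuples as a
    multipart/form-data body (bytes), using CRLF line endings."""
    out = bytearray()
    for name, filename, content_type, payload in fields:
        out += b"--" + boundary + b"\r\n"  # delimiter before each part
        disp = 'Content-Disposition: form-data; name="%s"' % name
        if filename:
            disp += '; filename="%s"' % filename
        out += disp.encode("utf8") + b"\r\n"
        if content_type:
            out += b"Content-Type: " + content_type.encode("utf8") + b"\r\n"
        out += b"\r\n" + payload + b"\r\n"  # blank line, then the part body
    out += b"--" + boundary + b"--\r\n"  # final delimiter closes the stream
    return bytes(out)

body = encode_multipart(
    [
        ("title", None, None, b"hello world"),
        ("upload", "test.txt", "text/plain", b"file content"),
    ],
    b"boundary123",
)
```

A body like this, fed to `PushMultipartParser(b"boundary123")` chunk by chunk, would produce exactly the event sequence the usage docs describe: a `MultipartSegment`, content `bytearray` chunks, then `None` per part.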