Commit d44e7a2

docs: Lots of work on docs and docstrings
Non-doc changes: Improve typing and use math.inf instead of a large hard-coded number as default parameters in several places.
1 parent 2a269d2 commit d44e7a2
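
The non-doc change in the commit message (``math.inf`` instead of a large hard-coded number as a default) can be illustrated with a minimal sketch. The function names and parameters below are hypothetical and are not taken from the ``multipart`` source; they only show why ``math.inf`` works as an "unlimited" default for size limits.

```python
import math


def check_size(size: int, limit: float = math.inf) -> None:
    """Raise ValueError if ``size`` exceeds ``limit`` (hypothetical helper).

    With the default ``math.inf`` the comparison is never true for any
    finite size, so no magic "large enough" constant is needed.
    """
    if size > limit:
        raise ValueError(f"size {size} exceeds limit {limit}")


def remaining(used: int, limit: float = math.inf) -> float:
    """Return the remaining budget; stays infinite if no limit is set."""
    return limit - used


if __name__ == "__main__":
    check_size(10**30)            # no limit configured: always passes
    print(remaining(100))         # inf
    print(remaining(100, 1024))   # 924
```

Because ``math.inf`` is a ``float``, a limit parameter typed as ``int`` would need its annotation widened (for example to ``float``), which may be part of what the commit message means by improved typing.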

6 files changed, +377 -211 lines changed

Makefile (+5 -1)

@@ -21,7 +21,11 @@ coverage: venv
 
 .PHONY: docs
 docs: venv
-	cd docs; ../$(VENV)/bin/sphinx-build -M html . ../build/docs
+	$(VENV)/bin/sphinx-build -M html docs build/docs
+
+.PHONY: watchdocs
+watchdocs: venv
+	$(VENV)/bin/sphinx-autobuild -a --watch . -b html docs build/docs/watch/
 
 upload: build
 	$(VENV)/bin/python3 -m twine upload --skip-existing dist/multipart-*

README.rst (+27 -150)

@@ -21,172 +21,49 @@ Python multipart/form-data parser
 .. _SansIO: https://sans-io.readthedocs.io/
 .. _asyncio: https://docs.python.org/3/library/asyncio.html
 
-This module provides a fast incremental non-blocking parser for RFC7578_
-``multipart/form-data``, as well as blocking alternatives for easier use in
-WSGI_ or CGI applications:
-
-* ``PushMultipartParser``: Incremental and non-blocking (SansIO_) parser
-  suitable for ASGI_, asyncio_ and other time or memory constrained environments.
-* ``MultipartParser``: Streaming parser that yields memory- or disk-buffered
-  ``MultipartPart`` instances.
-* ``parse_form_data(environ)`` and ``is_form_request(environ)``: Convenience
-  functions for WSGI_ applications with support for both ``multipart/form-data``
-  and ``application/x-www-form-urlencoded`` form submissions.
-
-
-Installation
-============
-
-``pip install multipart``
+This module provides a fast incremental non-blocking parser for
+``multipart/form-data`` [HTML5_, RFC7578_], as well as blocking alternatives for
+easier use in WSGI_ or CGI applications:
 
+* **PushMultipartParser**: Fast SansIO_ (incremental, non-blocking) parser suitable
+  for ASGI_, asyncio_ and other IO, time or memory constrained environments.
+* **MultipartParser**: Streaming parser that reads from a byte stream and yields
+  memory- or disk-buffered `MultipartPart` instances.
+* **WSGI Helper**: High-level functions and containers for WSGI_ or CGI applications with support
+  for both `multipart` and `urlencoded` form submissions.
 
 Features
 ========
 
 * Pure python single file module with no dependencies.
-* Well tested with inputs from actual browsers and HTTP clients. 100% test coverage.
-* Parses multiple GB/s on modern hardware (see `benchmarks <https://github.com/defnull/multipart_bench>`_).
-* Quickly rejects malicious or broken inputs and emits useful error messages.
-* Enforces configurable memory and disk resource limits to prevent DoS attacks.
-
-**Scope:** This parser implements ``multipart/form-data`` as defined by HTML5_
-and RFC7578_ and aims to support all browsers or HTTP clients in use today.
-Legacy browsers are supported to some degree, but only if those workarounds do
-not impact performance or security. In detail this means:
+* Optimized for both blocking and non-blocking applications.
+* 100% test coverage with test data from actual browsers and HTTP clients.
+* High throughput and low latency (see `benchmarks <https://github.com/defnull/multipart_bench>`_).
+* Predictable memory and disk resource consumption via fine grained limits.
+* Strict mode: Spend less time parsing malicious or broken inputs.
+
+Scope and compatibility
+=======================
+All parsers in this module implement ``multipart/form-data`` as defined by HTML5_
+and RFC7578_, supporting all modern browsers or HTTP clients in use today.
+Legacy browsers (e.g. IE6) are supported to some degree, but only if the
+required workarounds do not impact performance or security. In detail this means:
 
 * Just ``multipart/form-data``, not suitable for email parsing.
 * No ``multipart/mixed`` support (deprecated in RFC7578_).
 * No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC7578_).
 * No ``encoded-word`` or ``name=_charset_`` encoding markers (deprecated in HTML5_).
 * No support for clearly broken clients (e.g. invalid line breaks or headers).
 
-Usage and Examples
-==================
-
-Here are some basic examples for the most common use cases. There are more
-parameters and features available than shown here, so check out the docstrings
-(or your IDE's built-in help) to get a full picture.
-
-
-Helper function for WSGI or CGI
--------------------------------
-
-For WSGI application developers we strongly suggest using the ``parse_form_data``
-helper function. It accepts a WSGI ``environ`` dictionary and parses both types
-of form submission (``multipart/form-data`` and ``application/x-www-form-urlencoded``)
-based on the actual content type of the request. You'll get two ``MultiDict``
-instances in return, one for text fields and the other for file uploads:
-
-.. code-block:: python
-
-    from multipart import parse_form_data, is_form_request
-
-    def wsgi(environ, start_response):
-        if is_form_request(environ):
-            forms, files = parse_form_data(environ)
-
-            title = forms["title"] # type: string
-            upload = files["upload"] # type: MultipartPart
-            upload.save_as(...)
-
-Note that form fields that are too large to fit into memory will end up as
-``MultipartPart`` instances in the ``files`` dict instead. This is to protect
-your app from running out of memory or crashing. ``MultipartPart`` instances are
-buffered to temporary files on disk if they exceed a certain size. The default
-limits should be fine for most use cases, but can be configured if you need to.
-See ``MultipartParser`` for details.
-
-Flask, Bottle & Co
-^^^^^^^^^^^^^^^^^^
-
-Most WSGI web frameworks already have multipart functionality built in, but
-you may still get better throughput for large files (or better limits control)
-by switching parsers:
-
-.. code-block:: python
-
-    forms, files = multipart.parse_form_data(flask.request.environ)
-
-Legacy CGI
-^^^^^^^^^^
-
-If you are in the unfortunate position to have to rely on CGI, but can't use
-``cgi.FieldStorage`` anymore, it's possible to build a minimal WSGI environment
-from a CGI environment and use that with ``parse_form_data``. This is not a real
-WSGI environment, but it contains enough information for ``parse_form_data``
-to do its job. Do not forget to add proper error handling.
-
-.. code-block:: python
-
-    import sys, os, multipart
-
-    environ = dict(os.environ.items())
-    environ['wsgi.input'] = sys.stdin.buffer
-    forms, files = multipart.parse_form_data(environ)
-
-
-Stream parser: ``MultipartParser``
-----------------------------------
-
-The ``parse_form_data`` helper may be convenient, but it expects a WSGI
-environment and parses the entire request in one go before it returns any
-results. Using ``MultipartParser`` directly gives you more control and also
-allows you to process ``MultipartPart`` instances as soon as they arrive:
-
-.. code-block:: python
-
-    from multipart import parse_options_header, MultipartParser
-
-    def wsgi(environ, start_response):
-        content_type, params = parse_options_header(environ["CONTENT_TYPE"])
-
-        if content_type == "multipart/form-data":
-            stream = environ["wsgi.input"]
-            boundary = params["boundary"]
-            charset = params.get("charset", "utf8")
-
-            parser = MultipartParser(stream, boundary, charset)
-            for part in parser:
-                if part.filename:
-                    print(f"{part.name}: File upload ({part.size} bytes)")
-                    part.save_as(...)
-                elif part.size < 1024:
-                    print(f"{part.name}: Text field ({part.value!r})")
-                else:
-                    print(f"{part.name}: Text field, but too big to print :/")
-
-
-Non-blocking parser: ``PushMultipartParser``
---------------------------------------------
-
-The ``MultipartParser`` handles IO and file buffering for you, but relies on
-blocking APIs. If you need absolute control over the parsing process and want to
-avoid blocking IO at all cost, then have a look at ``PushMultipartParser``, the
-low-level non-blocking incremental ``multipart/form-data`` parser that powers
-all the other parsers in this library:
-
-.. code-block:: python
-
-    from multipart import PushMultipartParser, MultipartSegment
-
-    async def process_multipart(reader: asyncio.StreamReader, boundary: str):
-        with PushMultipartParser(boundary) as parser:
-            while not parser.closed:
+Installation
+============
 
-                chunk = await reader.read(1024*64)
-                for result in parser.parse(chunk):
+``pip install multipart``
 
-                    if isinstance(result, MultipartSegment):
-                        print(f"== Start of segment: {result.name}")
-                        if result.filename:
-                            print(f"== Client-side filename: {result.filename}")
-                        for header, value in result.headerlist:
-                            print(f"{header}: {value}")
-                    elif result: # Result is a non-empty bytearray
-                        print(f"[received {len(result)} bytes of data]")
-                    else: # Result is None
-                        print(f"== End of segment")
+Documentation
+=============
 
+Examples and API documentation can be found at: https://multipart.readthedocs.io/
 
 License
 =======

docs/api.rst (+9)

@@ -4,6 +4,8 @@ API Reference
 
 .. py:currentmodule:: multipart
 
+.. automodule:: multipart
+
 SansIO Parser
 =============
 
@@ -12,12 +14,16 @@ SansIO Parser
 
 .. autoclass:: MultipartSegment
     :members:
+    :special-members: __getitem__
 
 Stream Parser
 =============
 
+
 .. autoclass:: MultipartParser
     :members:
+    :special-members: __iter__, __getitem__
+
 
 .. autoclass:: MultipartPart
     :members:
@@ -28,6 +34,9 @@ WSGI Helper
 .. autofunction:: is_form_request
 .. autofunction:: parse_form_data
 
+.. autoclass:: MultiDict
+    :members:
+
 Header utils
 ============
 
docs/index.rst (+70 -3)

@@ -1,10 +1,77 @@
 .. py:currentmodule:: multipart
-.. include:: ../README.rst
+
+=================================
+Python multipart/form-data parser
+=================================
+
+.. image:: https://github.com/defnull/multipart/actions/workflows/test.yaml/badge.svg
+    :target: https://github.com/defnull/multipart/actions/workflows/test.yaml
+    :alt: Tests Status
+
+.. image:: https://img.shields.io/pypi/v/multipart.svg
+    :target: https://pypi.python.org/pypi/multipart/
+    :alt: Latest Version
+
+.. image:: https://img.shields.io/pypi/l/multipart.svg
+    :target: https://pypi.python.org/pypi/multipart/
+    :alt: License
+
+.. _HTML5: https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data
+.. _RFC7578: https://www.rfc-editor.org/rfc/rfc7578
+.. _WSGI: https://peps.python.org/pep-3333
+.. _ASGI: https://asgi.readthedocs.io/en/latest/
+.. _SansIO: https://sans-io.readthedocs.io/
+.. _asyncio: https://docs.python.org/3/library/asyncio.html
+
+This module provides a fast incremental non-blocking parser for
+``multipart/form-data`` [HTML5_, RFC7578_], as well as blocking alternatives for
+easier use in WSGI_ or CGI applications:
+
+* :ref:`push-example`: Fast SansIO_ (incremental, non-blocking) parser suitable
+  for ASGI_, asyncio_ and other IO, time or memory constrained environments.
+* :ref:`stream-example`: Blocking parser that reads from a stream and yields
+  memory- or disk-buffered :class:`MultipartPart` instances.
+* :ref:`wsgi-example`: High-level functions and containers for WSGI_ or CGI
+  applications with support for both `multipart` and `urlencoded` form submissions.
+
+Features and Scope
+==================
+
+* Pure python single file module with no dependencies.
+* Optimized for both blocking and non-blocking applications.
+* 100% test coverage with test data from actual browsers and HTTP clients.
+* High throughput and low latency (see `benchmarks <https://github.com/defnull/multipart_bench>`_).
+* Predictable memory and disk resource consumption via fine grained limits.
+* Strict mode: Spend less time parsing malicious or broken inputs.
+
+**Scope:** All parsers in this module implement ``multipart/form-data`` as defined by HTML5_
+and RFC7578_, supporting all modern browsers or HTTP clients in use today.
+Legacy browsers (e.g. IE6) are supported to some degree, but only if the
+required workarounds do not impact performance or security. In detail this means:
+
+* Just ``multipart/form-data``, not suitable for email parsing.
+* No ``multipart/mixed`` support (deprecated in RFC7578_).
+* No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC7578_).
+* No ``encoded-word`` or ``name=_charset_`` encoding markers (deprecated in HTML5_).
+* No support for clearly broken clients (e.g. invalid line breaks or headers).
+
+Installation
+============
+
+``pip install multipart``
+
+Table of Contents
+=================
 
 .. toctree::
     :maxdepth: 2
-    :hidden:
 
     Home <self>
+    usage
     api
-    changelog
+    changelog
+
+License
+=======
+
+.. include:: ../LICENSE
