@@ -21,172 +21,49 @@ Python multipart/form-data parser
.. _SansIO: https://sans-io.readthedocs.io/
.. _asyncio: https://docs.python.org/3/library/asyncio.html

- This module provides a fast incremental non-blocking parser for RFC7578_
- ``multipart/form-data``, as well as blocking alternatives for easier use in
- WSGI_ or CGI applications:
-
- * ``PushMultipartParser``: Incremental and non-blocking (SansIO_) parser
-   suitable for ASGI_, asyncio_ and other time or memory constrained environments.
- * ``MultipartParser``: Streaming parser that yields memory- or disk-buffered
-   ``MultipartPart`` instances.
- * ``parse_form_data(environ)`` and ``is_form_request(environ)``: Convenience
-   functions for WSGI_ applications with support for both ``multipart/form-data``
-   and ``application/x-www-form-urlencoded`` form submissions.
-
-
- Installation
- ============
-
- ``pip install multipart``
+ This module provides a fast incremental non-blocking parser for
+ ``multipart/form-data`` [HTML5_, RFC7578_], as well as blocking alternatives for
+ easier use in WSGI_ or CGI applications:

+ * **PushMultipartParser**: Fast SansIO_ (incremental, non-blocking) parser suitable
+   for ASGI_, asyncio_ and other IO, time or memory constrained environments.
+ * **MultipartParser**: Streaming parser that reads from a byte stream and yields
+   memory- or disk-buffered `MultipartPart` instances.
+ * **WSGI Helper**: High-level functions and containers for WSGI_ or CGI applications with support
+   for both `multipart` and `urlencoded` form submissions.

Features
========

* Pure python single file module with no dependencies.
- * Well tested with inputs from actual browsers and HTTP clients. 100% test coverage.
- * Parses multiple GB/s on modern hardware (see `benchmarks <https://github.com/defnull/multipart_bench>`_).
- * Quickly rejects malicious or broken inputs and emits useful error messages.
- * Enforces configurable memory and disk resource limits to prevent DoS attacks.
-
- **Scope:** This parser implements ``multipart/form-data`` as defined by HTML5_
- and RFC7578_ and aims to support all browsers or HTTP clients in use today.
- Legacy browsers are supported to some degree, but only if those workarounds do
- not impact performance or security. In detail this means:
+ * Optimized for both blocking and non-blocking applications.
+ * 100% test coverage with test data from actual browsers and HTTP clients.
+ * High throughput and low latency (see `benchmarks <https://github.com/defnull/multipart_bench>`_).
+ * Predictable memory and disk resource consumption via fine-grained limits.
+ * Strict mode: Spend less time parsing malicious or broken inputs.
+
+ Scope and compatibility
+ =======================
+ All parsers in this module implement ``multipart/form-data`` as defined by HTML5_
+ and RFC7578_, supporting all modern browsers and HTTP clients in use today.
+ Legacy browsers (e.g. IE6) are supported to some degree, but only if the
+ required workarounds do not impact performance or security. In detail this means:

* Just ``multipart/form-data``, not suitable for email parsing.
* No ``multipart/mixed`` support (deprecated in RFC7578_).
* No ``base64`` or ``quoted-printable`` transfer encoding (deprecated in RFC7578_).
* No ``encoded-word`` or ``name=_charset_`` encoding markers (deprecated in HTML5_).
* No support for clearly broken clients (e.g. invalid line breaks or headers).

- Usage and Examples
- ==================
-
- Here are some basic examples for the most common use cases. There are more
- parameters and features available than shown here, so check out the docstrings
- (or your IDEs built-in help) to get a full picture.
-
-
- Helper function for WSGI or CGI
- -------------------------------
-
- For WSGI application developers we strongly suggest using the ``parse_form_data``
- helper function. It accepts a WSGI ``environ`` dictionary and parses both types
- of form submission (``multipart/form-data`` and ``application/x-www-form-urlencoded``)
- based on the actual content type of the request. You'll get two ``MultiDict``
- instances in return, one for text fields and the other for file uploads:
-
- .. code-block:: python
-
-     from multipart import parse_form_data, is_form_request
-
-     def wsgi(environ, start_response):
-         if is_form_request(environ):
-             forms, files = parse_form_data(environ)
-
-             title = forms["title"]    # type: string
-             upload = files["upload"]  # type: MultipartPart
-             upload.save_as(...)
-
- Note that form fields that are too large to fit into memory will end up as
- ``MultipartPart`` instances in the ``files`` dict instead. This is to protect
- your app from running out of memory or crashing. ``MultipartPart`` instances are
- buffered to temporary files on disk if they exceed a certain size. The default
- limits should be fine for most use cases, but can be configured if you need to.
- See ``MultipartParser`` for details.
-
- Flask, Bottle & Co
- ^^^^^^^^^^^^^^^^^^
-
- Most WSGI web frameworks already have multipart functionality built in, but
- you may still get better throughput for large files (or better limits control)
- by switching parsers:
-
- .. code-block:: python
-
-     forms, files = multipart.parse_form_data(flask.request.environ)
-
- Legacy CGI
- ^^^^^^^^^^
-
- If you are in the unfortunate position to have to rely on CGI, but can't use
- ``cgi.FieldStorage`` anymore, it's possible to build a minimal WSGI environment
- from a CGI environment and use that with ``parse_form_data``. This is not a real
- WSGI environment, but it contains enough information for ``parse_form_data``
- to do its job. Do not forget to add proper error handling.
-
- .. code-block:: python
-
-     import sys, os, multipart
-
-     environ = dict(os.environ.items())
-     environ['wsgi.input'] = sys.stdin.buffer
-     forms, files = multipart.parse_form_data(environ)
-
-
- Stream parser: ``MultipartParser``
- ----------------------------------
-
- The ``parse_form_data`` helper may be convenient, but it expects a WSGI
- environment and parses the entire request in one go before it returns any
- results. Using ``MultipartParser`` directly gives you more control and also
- allows you to process ``MultipartPart`` instances as soon as they arrive:
-
- .. code-block:: python
-
-     from multipart import parse_options_header, MultipartParser
-
-     def wsgi(environ, start_response):
-         content_type, params = parse_options_header(environ["CONTENT_TYPE"])
-
-         if content_type == "multipart/form-data":
-             stream = environ["wsgi.input"]
-             boundary = params["boundary"]
-             charset = params.get("charset", "utf8")
-
-             parser = MultipartParser(stream, boundary, charset)
-             for part in parser:
-                 if part.filename:
-                     print(f"{part.name}: File upload ({part.size} bytes)")
-                     part.save_as(...)
-                 elif part.size < 1024:
-                     print(f"{part.name}: Text field ({part.value!r})")
-                 else:
-                     print(f"{part.name}: Test field, but too big to print :/")
-
-
- Non-blocking parser: ``PushMultipartParser``
- --------------------------------------------
-
- The ``MultipartParser`` handles IO and file buffering for you, but relies on
- blocking APIs. If you need absolute control over the parsing process and want to
- avoid blocking IO at all cost, then have a look at ``PushMultipartParser``, the
- low-level non-blocking incremental ``multipart/form-data`` parser that powers
- all the other parsers in this library:
-
- .. code-block:: python
-
-     from multipart import PushMultipartParser, MultipartSegment
-
-     async def process_multipart(reader: asyncio.StreamReader, boundary: str):
-         with PushMultipartParser(boundary) as parser:
-             while not parser.closed:
+ Installation
+ ============

-                 chunk = await reader.read(1024 * 64)
-                 for result in parser.parse(chunk):
+ ``pip install multipart``

-                     if isinstance(result, MultipartSegment):
-                         print(f"== Start of segment: {result.name}")
-                         if result.filename:
-                             print(f"== Client-side filename: {result.filename}")
-                         for header, value in result.headerlist:
-                             print(f"{header}: {value}")
-                     elif result:  # Result is a non-empty bytearray
-                         print(f"[received {len(result)} bytes of data]")
-                     else:  # Result is None
-                         print(f"== End of segment")
+ Documentation
+ =============

+ Examples and API documentation can be found at: https://multipart.readthedocs.io/

License
=======