Unnecessarily large memory footprint #10550
Labels: bug, reproducer: missing
Describe the bug
I have been trying to debug a memory pressure bug, which has led me to aiohttp. I have loosely traced the issue to the fact that stream.py uses the pattern of appending byte chunks to a list and then returning b"".join(chunks). When I replace this pattern with a buffer = io.BytesIO() that I append to, and return buffer.getvalue(), the unnecessary memory pressure goes away.
To reproduce the issue, I am downloading a large file (~500MB in this example) from S3 using UPath (which uses aiohttp internally). Without my proposed change, my high-watermark memory footprint is about twice the size of the large file, and when I get rid of the reference to the data it remains at about the size of the file. With my proposed change of using io.BytesIO() and avoiding b"".join(chunks), the high-watermark memory footprint is about the size of the file, and when I drop the reference to it, the memory footprint goes back down to almost nothing.
For determining the memory footprint, I am using psutil and looking at the rss.
I am happy to dig deeper if needed, or if you think that replacing the joining of bytes with an io.BytesIO buffer is not desirable. Given that I have a simple repro and a simple fix, I thought I would report the issue to open up the discussion. I'm happy to open a PR with my proposed solution.
To Reproduce
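The two accumulation patterns described above can be compared in isolation with a small sketch. This is not the reporter's script and it involves neither aiohttp, UPath, nor S3: the function names, the chunk size, and the 16 MiB total are hypothetical stand-ins, and tracemalloc from the standard library is used instead of psutil, so it measures Python-level allocations rather than rss. It assumes CPython.

```python
import io
import tracemalloc
from typing import Callable, Iterable, Iterator

CHUNK_SIZE = 64 * 1024          # hypothetical chunk size
TOTAL_SIZE = 16 * 1024 * 1024   # hypothetical payload size (16 MiB)


def fake_chunks(total: int = TOTAL_SIZE, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield freshly allocated chunks, standing in for network reads."""
    for _ in range(total // size):
        yield b"x" * size


def read_with_join(chunks: Iterable[bytes]) -> bytes:
    """Collect every chunk in a list, then join them at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    # All chunks and the joined result are alive at the same time here.
    return b"".join(parts)


def read_with_bytesio(chunks: Iterable[bytes]) -> bytes:
    """Write each chunk into one growing buffer; chunks are freed as we go."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    return buffer.getvalue()


def peak_bytes(read: Callable[[Iterable[bytes]], bytes]) -> int:
    """Return the peak traced allocation while running one read pattern."""
    tracemalloc.start()
    try:
        data = read(fake_chunks())
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    assert len(data) == TOTAL_SIZE
    return peak


if __name__ == "__main__":
    for name, read in (("join", read_with_join), ("BytesIO", read_with_bytesio)):
        print(f"{name}: peak {peak_bytes(read) / 2**20:.1f} MiB")
```

Because tracemalloc only sees allocations made through Python's allocators, this sketch can illustrate the difference in peak usage but not whether memory is returned to the operating system once the reference is dropped; that part of the report depends on the allocator and needs an rss measurement such as the psutil one described above.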
Below is the output with aiohttp version 3.11.13.
Expected behavior
When I modify aiohttp to use my proposed solution with an io.BytesIO() buffer, I see the memory footprint grow to the size of the file while I hold a reference to it, and then go back down to almost nothing when I release the reference to the data, which is what I would expect.
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
propcache Version
$ python -m pip show propcache
yarl Version
$ python -m pip show yarl
OS
Related component
Client
Additional context
No response
Code of Conduct