
Unnecessarily large memory footprint #10550

Open
atzannes opened this issue Mar 14, 2025 · 1 comment
Labels
bug · reproducer: missing (this PR or issue lacks code that reproduces the problem described, or clearly understandable STR)

Comments

@atzannes

Describe the bug

I have been trying to debug a memory pressure bug, which has led me to aiohttp. I have loosely traced the issue to the fact that streams.py accumulates byte chunks in a list and then returns b"".join(chunks). When I replace this pattern with an io.BytesIO() buffer that I write the chunks to, returning buffer.getvalue(), the unnecessary memory pressure goes away.
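To make the two patterns concrete, here is a minimal sketch of the list + join accumulation versus the io.BytesIO buffer I am proposing. The function names and the sample payload are illustrative, not aiohttp's actual API; this only shows the buffering strategies being compared:

```python
import io


def collect_join(chunks):
    # Pattern I traced to streams.py: accumulate chunks in a list,
    # then concatenate. During the join, the chunk list and the
    # joined result are alive at the same time.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


def collect_bytesio(chunks):
    # Proposed pattern: write each chunk into one growing buffer
    # and return a copy of its contents, so the intermediate list
    # of chunk objects is never retained.
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()


# Both return byte-identical results for any sequence of chunks.
payload = [b"x" * 1024 for _ in range(8)]
assert collect_join(payload) == collect_bytesio(payload)
```

Both functions produce identical bytes; the difference I am observing is only in the process's resident memory behavior.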

To reproduce the issue, I am downloading a large file (~500 MB in this example) from S3 using UPath (which uses aiohttp internally). Without my proposed change, the high-watermark memory footprint is about twice the size of the file, and when I drop the reference to the data it remains at about the size of the file. With my proposed change of using io.BytesIO() instead of b"".join(chunks), the high-watermark memory footprint is about the size of the file, and when I drop the reference the footprint goes back down to almost nothing.

To determine the memory footprint, I am using psutil and looking at the RSS (resident set size).

I am happy to dig deeper if needed, or if you think that replacing the bytes join with an io.BytesIO buffer is not desirable. Given that I have a simple repro and a simple fix, I thought I would report the issue to open up the discussion. I'm happy to open a PR with my proposed solution.

To Reproduce

import psutil
from upath import UPath

def mem(when: str = ""):
    print(f"Resident Memory{when}: {psutil.Process().memory_info().rss / 1024**3:.2f} GB")

def upath_download(path) -> bytes:
    up = UPath(path)
    with up.open('rb') as fp:
        return fp.read()


path = 's3://my-bucket/my-500mb-file.bin'  # ~500MB

mem(" Initially")
data = upath_download(path)
print("data size", len(data))
mem(" After download")
del data
mem(" After delete")

Below is the output with aiohttp version 3.11.13:

Resident Memory Initially: 0.02 GB
data size 518458228
Resident Memory After download: 1.12 GB
Resident Memory After delete: 0.64 GB

Expected behavior

When I modify aiohttp to use my proposed io.BytesIO() buffer, the memory footprint grows to about the size of the file while I hold a reference to the data, then drops back to almost nothing when I release the reference, which is what I would expect.

Resident Memory Initially: 0.02 GB
data size 518458228
Resident Memory After download: 0.53 GB
Resident Memory After delete: 0.05 GB

Logs/tracebacks

Already listed above

Python Version

$ python --version
Python 3.10.8

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.11.13
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache-2.0
Location: /home/atzannes/code/github/chunky/.venv/lib/python3.10/site-packages
Requires: aiohappyeyeballs, aiosignal, async-timeout, attrs, frozenlist, multidict, propcache, yarl
Required-by: aiobotocore, aiohttp-cors, s3fs

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/atzannes/code/github/chunky/.venv/lib/python3.10/site-packages
Requires: typing-extensions
Required-by: aiobotocore, aiohttp, yarl

propcache Version

$ python -m pip show propcache

yarl Version

$ python -m pip show yarl

OS

Unnecessarily large memory footprint

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@atzannes atzannes added the bug label Mar 14, 2025
@bdraco bdraco added the reproducer: missing This PR or issue lacks code, which reproduce the problem described or clearly understandable STR label Mar 14, 2025
@bdraco
Member

bdraco commented Mar 14, 2025

It's a bit unclear from your report how you are using aiohttp, as your reproduction code doesn't include aiohttp.

Can you provide a working reproducer that uses aiohttp?
