
Unnecessarily large memory footprint #10550

Open
atzannes opened this issue Mar 14, 2025 · 1 comment
Labels
bug · reproducer: missing (this PR or issue lacks code that reproduces the problem described, or clearly understandable STR)

Comments

@atzannes

Describe the bug

I have been trying to debug a memory pressure bug, which has led me to aiohttp. I have loosely traced the issue to the fact that streams.py accumulates byte chunks in a list and then returns b"".join(chunks). When I replace this pattern with an io.BytesIO() buffer that I write the chunks to, returning buffer.getvalue(), the unnecessary memory pressure goes away.
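To make the two patterns concrete, here is a minimal sketch of the list + join accumulation versus the io.BytesIO buffer I am proposing. The function names and the sample payload are illustrative, not aiohttp's actual API; this only shows the buffering strategies being compared:

```python
import io


def collect_join(chunks):
    # Pattern I traced to streams.py: accumulate chunks in a list,
    # then concatenate. During the join, the chunk list and the
    # joined result are alive at the same time.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


def collect_bytesio(chunks):
    # Proposed pattern: write each chunk into one growing buffer
    # and return a copy of its contents, so the intermediate list
    # of chunk objects is never retained.
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()


# Both return byte-identical results for any sequence of chunks.
payload = [b"x" * 1024 for _ in range(8)]
assert collect_join(payload) == collect_bytesio(payload)
```

Both functions produce identical bytes; the difference I am observing is only in the process's resident memory behavior.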

To reproduce the issue, I am downloading a large file (~500 MB in this example) from S3 using UPath (which uses aiohttp internally). Without my proposed change, the high-watermark memory footprint is about twice the size of the file, and when I drop the reference to the data it remains at about the size of the file. With my proposed change of using io.BytesIO() instead of b"".join(chunks), the high-watermark memory footprint is about the size of the file, and when I drop the reference the footprint goes back down to almost nothing.

To determine the memory footprint, I am using psutil and looking at the RSS (resident set size).

I am happy to dig deeper if needed, or if you think that replacing the bytes join with an io.BytesIO buffer is not desirable. Given that I have a simple repro and a simple fix, I thought I would report the issue to open up the discussion. I'm happy to open a PR with my proposed solution.

To Reproduce

import psutil
from upath import UPath

def mem(when: str = ""):
    print(f"Resident Memory{when}: {psutil.Process().memory_info().rss / 1024**3:.2f} GB")

def upath_download(path) -> bytes:
    up = UPath(path)
    with up.open('rb') as fp:
        return fp.read()


path = 's3://my-bucket/my-500mb-file.bin'  # ~500MB

mem(" Initially")
data = upath_download(path)
print("data size", len(data))
mem(" After download")
del data
mem(" After delete")

Below is the output with aiohttp version 3.11.13:

Resident Memory Initially: 0.02 GB
data size 518458228
Resident Memory After download: 1.12 GB
Resident Memory After delete: 0.64 GB

Expected behavior

When I modify aiohttp to use my proposed io.BytesIO() buffer, the memory footprint grows to about the size of the file while I hold a reference to the data, then drops back to almost nothing when I release the reference, which is what I would expect.

Resident Memory Initially: 0.02 GB
data size 518458228
Resident Memory After download: 0.53 GB
Resident Memory After delete: 0.05 GB

Logs/tracebacks

Already listed above

Python Version

$ python --version
Python 3.10.8

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.11.13
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache-2.0
Location: /home/atzannes/code/github/chunky/.venv/lib/python3.10/site-packages
Requires: aiohappyeyeballs, aiosignal, async-timeout, attrs, frozenlist, multidict, propcache, yarl
Required-by: aiobotocore, aiohttp-cors, s3fs

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/atzannes/code/github/chunky/.venv/lib/python3.10/site-packages
Requires: typing-extensions
Required-by: aiobotocore, aiohttp, yarl

propcache Version

$ python -m pip show propcache

yarl Version

$ python -m pip show yarl

OS

Unnecessarily large memory footprint

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@atzannes atzannes added the bug label Mar 14, 2025
@bdraco bdraco added the reproducer: missing This PR or issue lacks code, which reproduce the problem described or clearly understandable STR label Mar 14, 2025
@bdraco
Member

bdraco commented Mar 14, 2025

It's a bit unclear from your report how you are using aiohttp, as your reproduction code doesn't include aiohttp.

Can you provide a working reproducer that uses aiohttp?
