Request stream contents are kept in memory when using fetch #4058
Comments
What you see in top is the RSS, i.e. the amount of memory the process is actually using. However, the V8 heap grows in chunks, and when sending a lot of data it tends not to collect if it has a lot of memory available. In other words, the number you see in top is not the meaningful one here. What you need to check is the consumption of the heap. @KhafraDev can you confirm that using Request directly is not reading the whole thing/cloning it, right? |
fetch does read the entire body, but it doesn't buffer it. Each chunk transmitted has to check if the request has been aborted. |
Ok, thank you @mcollina! I've modified the script in the original post to log its memory usage, so that it is clearer where the memory use is coming from. What I see is that the heap usage stays at around 7-8 MB. The increase in memory I observe is coming from the use of array buffers/external memory, which increases steadily and never goes down, until it is a tiny bit larger than the size of the file I'm streaming. Is this expected behaviour for NodeJS? I am not so familiar with how NodeJS' garbage collection works. Are array buffers not collected in the same way as the heap? I was able to find lots of resources on how to manage the heap size, but not so much on array buffers, so I'd be grateful for some, err, pointers. |
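A minimal sketch of reading those numbers in Node with process.memoryUsage() (not part of the original comment; the logging format is an assumption):

```js
// Snapshot of Node memory stats: heapUsed/heapTotal cover the V8 heap,
// while external and arrayBuffers count memory held outside of it
// (e.g. Buffers and ArrayBuffers).
const { rss, heapUsed, heapTotal, external, arrayBuffers } = process.memoryUsage();
console.log({ rss, heapUsed, heapTotal, external, arrayBuffers }); // values in bytes
```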
Can you include your reproduction? |
The reproduction is in the original post, as noted. Or is something missing? 🙂 |
Sorry, I've found it. I'm a little bit puzzled by this bug, because I don't know what is holding on to that data, but I'm convinced a bug is there: that data should be cleaned up. |
My understanding is that this behavior is spec compliant. Specifically, the request is cloned to follow redirects, and the only way to do that for a request with a body is to keep the whole data in memory while waiting for that redirect. Therefore, setting the parameters required to disable redirect following avoids the problem. I would recommend using redirect: 'error'. The following works as expected:

```js
new Request("http://localhost:8000", {
  method: "PUT",
  headers: {
    "Content-Type": "application/json",
  },
  body: fileStream,
  duplex: 'half',
  window: null,
  redirect: 'error'
});
```
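The same options can also be passed straight to fetch (a sketch, not taken verbatim from the comment, assuming the same fileStream as in the reproduction script):

```js
// With redirect: 'error', a redirect response rejects the fetch instead of
// forcing the request body to be retained for a retried request.
const response = await fetch("http://localhost:8000", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: fileStream,
  duplex: "half",
  redirect: "error",
});
console.log(response.status);
```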
@KhafraDev to "prevent" this footgun, do you think we could "resume" the original body once we have received the headers of the response and are sure that it's not a redirect? Do you think this would be feasible or spec compliant? |
It seems following redirects is supported for requests with a body... PUT, POST, etc. Is this spec compliant? |
I think it's both feasible and spec compliant. The spec doesn't mention when the body should be available nor should it be noticeable to users.
Yes, it seems to be supported in some capacity. I'd need to test some things to know for certain. (https://fetch.spec.whatwg.org/#http-redirect-fetch):
|
Thank you for the research and the workaround, @mcollina! I can confirm that the memory usage is as expected when I set redirect: 'error'. |
What does the spec say about following redirects with a body?

Regarding the question:
I don't have a deep understanding of the Fetch spec, but I wonder if maybe such requests are supported… but only if you are able to create a new stream from the body, which is the case for non-stream body types.

Specifically, the Fetch spec, in the part where the HTTP request is cloned, has a note saying that:
The request's body's source is only set for the types I mentioned above, in the procedure for extracting a body from a BodyInit.

I was initially puzzled by the statement that only one body is needed when the request's body's source is null. But it seems to hold true when I look at the parts where a new request may be made:
However, it does seem like the original stream would be re-used in the following case:
But I may be missing something, I'm just taking a cursory glance at the spec. The above lists are probably not exhaustive?

Aside: Maybe the spec could be improved?

It seems weird to me that the spec would make you tee the stream in the body clone procedure, and simultaneously note that you don't need to do that if the body's source is null. If the body's source is set to anything other than null, then surely you wouldn't need to tee the stream – you could simply extract a new body from the source whenever you need to make a new request? This is already done in HTTP-network-or-cache-fetch step 14.2.2 and HTTP-redirect-fetch step 14. Funnily enough, it is not done in HTTP-network-or-cache-fetch step 16.2, so it ends up using the tee'd stream, but only if the body's source is non-null 🤔

It would be much safer, too, since it would enforce checking that the body's source is non-null wherever you'd like to make a new request under the hood, instead of this connection between the body's source's non-nullity and the need for teeing the stream being a weak and informal one.

It also doesn't make sense from a memory usage point of view to tee the stream when the body's source is non-null. It would mean that a user uploading a file as a part of a form would have to store the entire file in memory.

I guess I should read up on the fetch spec's issue tracker to see if anything similar has come up, and open a new issue there if not. Unless anyone with a better understanding of the Fetch spec wants to do that?

Potential paths forward

I think it would be interesting to know whether undici's fetch could extract a new body from the source instead of teeing the stream, as described above. The above solution would not solve the memory issue with big files being submitted as a part of a FormData. For bodies without a source, the solution would have to be more involved, since the body's stream can only be consumed once. Perhaps a network error could be thrown if it is read twice across all clones? It would match the advisory note's assertion about the body only being consumed once if the source is null.

I hope this is of some help 🙂 I don't have much of any stake in this, so if I'm wrong in a way that is hard to put into words then don't worry about it, I'm perfectly happy with leaving this to the experts 😅 |
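To make the body's-source distinction concrete, here is a small sketch (not from the thread; it assumes the Node 18+ globals Request, Blob, and ReadableStream):

```js
// A Blob body has a non-null source, so cloning the request can re-extract
// the body from the Blob rather than buffering a stream.
const blobRequest = new Request("http://localhost:8000", {
  method: "PUT",
  body: new Blob(["hello"]),
});
blobRequest.clone();

// A ReadableStream body has a null source, so the spec's clone-a-body step
// tees the stream, and chunks read on one branch are queued for the other.
const streamRequest = new Request("http://localhost:8000", {
  method: "PUT",
  body: new ReadableStream({
    start(controller) {
      controller.enqueue(new TextEncoder().encode("hello"));
      controller.close();
    },
  }),
  duplex: "half",
});
streamRequest.clone();
```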
Bug Description
When streaming a file upload, the Node process' memory usage increases until it consumes an amount equal to the size of the file we are trying to upload.
Reproducible By
Save the following code to a file, such as streamUpload.mjs:
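The script itself did not survive in this copy of the issue; the block below is a best-guess sketch reconstructed from the request options quoted in the thread (PUT to http://localhost:8000, Content-Type: application/json, body: fileStream, duplex: 'half') and the memory logging described in the comments. The argument handling and logging format are assumptions.

```js
// Sketch of the reproduction: stream a large file to a local mock server
// and log memory usage while the upload is in flight.
import { createReadStream } from "node:fs";
import { Readable } from "node:stream";

const filePath = process.argv[2]; // path to a large (> 100 MB) file

const logMemory = () => {
  const { heapUsed, external, arrayBuffers } = process.memoryUsage();
  console.log(
    `heapUsed=${(heapUsed / 1e6).toFixed(1)} MB, ` +
      `external=${(external / 1e6).toFixed(1)} MB, ` +
      `arrayBuffers=${(arrayBuffers / 1e6).toFixed(1)} MB`,
  );
};
const timer = setInterval(logMemory, 1000);

// Wrap the Node file stream as a web ReadableStream so fetch accepts it.
const fileStream = Readable.toWeb(createReadStream(filePath));

const response = await fetch("http://localhost:8000", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: fileStream,
  duplex: "half",
});

console.log("response status:", response.status);
clearInterval(timer);
```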
Set up netcat as a mock HTTP server in a separate terminal session:
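The exact command was not preserved; something along these lines (flags differ between netcat variants) is assumed:

```sh
nc -l 8000
```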
Then try running the Node script using a large file (should be > 100MB):
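For example (the original command isn't preserved here; the heap-limiting flag stands in for the "flags for constraining memory consumption" mentioned below):

```sh
node --max-old-space-size=128 streamUpload.mjs /path/to/large-file
```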
You should see that the heap usage stays about the same, while the array buffer usage increases steadily as the upload progresses (as does the external memory use, of course).
If you wish to let the fetch call resolve, you need to write the following into the nc program when the upload is complete, then hit Enter and CTRL-D.
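The response typed into nc was not captured here; a minimal status line followed by an empty line, such as the following, should be enough (an assumption, not from the original post):

```
HTTP/1.1 200 OK

```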
Logs from running streamUpload.mjs
This is a transcript of what the above script prints out when run as described.
The file I'm referring to is 329738114 bytes large. Note that I've tried adding some flags for constraining memory consumption.
Result of running the above script
Expected Behavior
The memory usage should increase only by the size of the buffer used between the file stream and the TCP writable stream, and should not be affected by the size of the file: it should stay roughly the same throughout, no matter how much has been streamed.
Environment
I'm using Windows Subsystem for Linux, with Ubuntu 22.04.5 LTS, though I've seen this behaviour on regular Linux installs as well.
Node is installed with nvm, version 22.13.1.
Additional context
I first observed this when following the instructions for how to stream a file upload (from #2202) using FormData. But I was able to reproduce it using plain streams when trying some workarounds.

As a consequence, we cannot use fetch to upload large files in environments where the files are too large to keep in memory.

EDIT: Changed to .mjs file suffix, and added logging of memory usage within the script itself. Added logs from running.