
Request stream contents are kept in memory when using fetch #4058

Open
tobinus opened this issue Feb 12, 2025 · 11 comments
Labels
bug Something isn't working

Comments

@tobinus

tobinus commented Feb 12, 2025

Bug Description

When streaming a file upload, the Node process' memory usage increases until it consumes an amount equal to the size of the file we are trying to upload.

Reproducible By

Save the following code to a file, such as streamUpload.mjs:

import fs from "node:fs";

if (process.argv.length !== 3) {
    console.error(`Usage: node ${process.argv[1]} FILE`);
    process.exit(2);
}

console.log('My PID:', process.pid);

const fileStream = fs.createReadStream(process.argv[2]);

const request = new Request("http://localhost:8000", {
    method: "PUT",
    headers: {
        "Content-Type": "application/json",
    },
    body: fileStream,
    duplex: 'half',
});

setInterval(() => {
    const { heapUsed, heapTotal, external, arrayBuffers} = process.memoryUsage();
    console.log(`Heap used/total: ${heapUsed} / ${heapTotal}; Array buffers/external: ${arrayBuffers} / ${external}`);
}, 1000);

fetch(request)
.then(async response => {
    console.log(response);
    console.log(await response.text());

    await new Promise(resolve => process.nextTick(resolve));
    console.log('Waiting 15 seconds until exiting')
    await new Promise(resolve => setTimeout(resolve, 15000));

    process.exit(0);
}).catch(err => {
    console.error('Encountered error: ', err);
    process.exit(1);
});

Set up netcat as a mock HTTP server in a separate terminal session:

nc -Clq 0 8000

Then try running the Node script using a large file (should be > 100MB):

node streamUpload.mjs path/to/largeFile

You should see that the heap usage stays about the same, while the array buffer usage increases steadily as the upload progresses (as does the external memory use, of course).

If you wish to let the fetch call resolve, you need to write the following into the nc program when the upload is complete:

HTTP/1.1 200 OK

Hello from netcat!

Then hit Enter and CTRL-D.

Logs from running

This is a transcript of what the above script prints out when run as described.
The file I'm referring to is 329,738,114 bytes (about 330 MB).

Note that I've tried adding some flags for constraining memory consumption.

Result of running the above script
$ node --optimize_for_size --max_old_space_size=10 --gc_interval=100 streamUpload.mjs path/to/file
My PID: 4143
Heap used/total: 7539152 / 10129408; Array buffers/external: 17957015 / 21277630
Heap used/total: 7555888 / 10129408; Array buffers/external: 24051863 / 27372518
Heap used/total: 7137680 / 10129408; Array buffers/external: 26935447 / 30256102
Heap used/total: 7127184 / 10129408; Array buffers/external: 32899223 / 39365606
Heap used/total: 7182544 / 10129408; Array buffers/external: 39452823 / 42773478
Heap used/total: 7562576 / 10129408; Array buffers/external: 45744279 / 49064934
Heap used/total: 7447840 / 10129408; Array buffers/external: 51445911 / 54766566
Heap used/total: 7278840 / 10129408; Array buffers/external: 56754327 / 60074982
Heap used/total: 7637736 / 9080832; Array buffers/external: 62849175 / 66169830
Heap used/total: 7388424 / 10129408; Array buffers/external: 67698839 / 71019494
Heap used/total: 7647112 / 10129408; Array buffers/external: 73531543 / 76852198
Heap used/total: 7425872 / 10391552; Array buffers/external: 78708887 / 82029542
Heap used/total: 8010584 / 10391552; Array buffers/external: 87752855 / 91073510
Heap used/total: 7469160 / 10391552; Array buffers/external: 89915543 / 93236198
Heap used/total: 8038920 / 10391552; Array buffers/external: 98959511 / 102280166
Heap used/total: 8305880 / 10391552; Array buffers/external: 104726679 / 108047334
Heap used/total: 8054872 / 10653696; Array buffers/external: 109838487 / 113159142
Heap used/total: 7805592 / 10653696; Array buffers/external: 114950295 / 118270950
Heap used/total: 8037552 / 10653696; Array buffers/external: 120782999 / 124103654
Heap used/total: 7910616 / 9342976; Array buffers/external: 126550167 / 129870822
Heap used/total: 8152160 / 10391552; Array buffers/external: 132317335 / 135637990
Heap used/total: 7975840 / 10653696; Array buffers/external: 137756823 / 141077478
Heap used/total: 7750144 / 10653696; Array buffers/external: 142934167 / 146254822
Heap used/total: 7971624 / 10653696; Array buffers/external: 148635799 / 151956454
Heap used/total: 8531728 / 10653696; Array buffers/external: 157679767 / 161000422
Heap used/total: 8276056 / 10653696; Array buffers/external: 162791575 / 166112230
Heap used/total: 8479336 / 10653696; Array buffers/external: 168493207 / 171813862
Heap used/total: 8257296 / 10915840; Array buffers/external: 173605015 / 176925670
Heap used/total: 8471160 / 10915840; Array buffers/external: 179503255 / 182823910
Heap used/total: 8190648 / 10915840; Array buffers/external: 184352919 / 187673574
Heap used/total: 8398160 / 10915840; Array buffers/external: 190054551 / 193375206
Heap used/total: 7420120 / 10391552; Array buffers/external: 196280471 / 199601126
Heap used/total: 7281152 / 10653696; Array buffers/external: 201654423 / 204975078
Heap used/total: 7501824 / 10653696; Array buffers/external: 207421591 / 210742246
Heap used/total: 7248248 / 10653696; Array buffers/external: 212402327 / 215722982
Heap used/total: 7450432 / 10653696; Array buffers/external: 217907351 / 221228006
Heap used/total: 8007776 / 10653696; Array buffers/external: 226820247 / 230140902
Heap used/total: 7390304 / 10915840; Array buffers/external: 228458647 / 231779302
Heap used/total: 7903000 / 10915840; Array buffers/external: 237240471 / 240561126
Heap used/total: 8080640 / 10915840; Array buffers/external: 242811031 / 246131686
Heap used/total: 7790800 / 10915840; Array buffers/external: 247660695 / 250981350
Heap used/total: 8025560 / 10915840; Array buffers/external: 253296791 / 256617446
Heap used/total: 7234104 / 10129408; Array buffers/external: 258080919 / 261399687
Heap used/total: 7421152 / 10129408; Array buffers/external: 263651479 / 266970247
Heap used/total: 7125952 / 10129408; Array buffers/external: 268501143 / 271819911
Heap used/total: 7321008 / 10129408; Array buffers/external: 274137239 / 277456007
Heap used/total: 7798696 / 10129408; Array buffers/external: 282919063 / 286237831
Heap used/total: 7155728 / 10129408; Array buffers/external: 284229783 / 287548551
Heap used/total: 6978424 / 10129408; Array buffers/external: 290193559 / 296262828
Heap used/total: 7266184 / 10129408; Array buffers/external: 296091799 / 299408556
Heap used/total: 7819944 / 10129408; Array buffers/external: 305201303 / 308518060
Heap used/total: 7729584 / 10391552; Array buffers/external: 309461143 / 312777900
Heap used/total: 7442880 / 10653696; Array buffers/external: 314179735 / 317496492
Heap used/total: 7627160 / 10653696; Array buffers/external: 319815831 / 323132588
Heap used/total: 7320616 / 10653696; Array buffers/external: 324599959 / 327916716
Heap used/total: 7427328 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7431488 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7434560 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7438016 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7441088 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7444160 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7447232 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7450304 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7453376 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7456448 / 10915840; Array buffers/external: 329803792 / 335912538
Heap used/total: 7470616 / 10915840; Array buffers/external: 329803809 / 333133545
Response {
  status: 200,
  statusText: 'OK',
  headers: Headers {},
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: true,
  redirected: false,
  type: 'basic',
  url: 'http://localhost:8000/'
}
Heap used/total: 7628296 / 10915840; Array buffers/external: 329803811 / 333133547
Heap used/total: 7631432 / 10915840; Array buffers/external: 329803811 / 333133547
Heap used/total: 7634504 / 10915840; Array buffers/external: 329803811 / 333133547
Heap used/total: 7637576 / 10915840; Array buffers/external: 329803811 / 333133547
Hello from Netcat!

Waiting 15 seconds until exiting
Heap used/total: 7845016 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7847576 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7850072 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7852568 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7855064 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7857560 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7860056 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7862552 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7865048 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7867544 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7870040 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7872536 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7875032 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7877528 / 10915840; Array buffers/external: 329803851 / 333133587
Heap used/total: 7880024 / 10915840; Array buffers/external: 329803851 / 333133587

Expected Behavior

The memory usage should increase by the size of the buffer used between the file stream and the TCP writable stream, and not be affected by the size of the file. The memory usage should stay the same throughout, no matter how much has been streamed.

Environment

I'm using Windows Subsystem for Linux, with Ubuntu 22.04.5 LTS, though I've seen this behaviour on regular Linux installs as well.
Node is installed with nvm, version 22.13.1.

Additional context

I first observed this when following the instructions for how to stream a file upload (from #2202) using FormData. But I was able to reproduce it using plain streams when trying some workarounds.

As a consequence, we cannot use fetch to upload large files in environments where the files are too large to keep in memory.

EDIT: Changed to .mjs file suffix, and added logging of memory usage within the script itself. Added logs from running.

@tobinus tobinus added the bug Something isn't working label Feb 12, 2025
@mcollina
Member

What you see in top is the RSS, i.e. the amount of memory the process is actually using. However, the V8 heap grows in chunks, and when sending a lot of data it tends not to collect if it has a lot of memory available.

In other words, the memory you see in top is not meaningful on its own. What you need to check is the heap consumption.


@KhafraDev can you confirm that using Request directly is not reading the whole thing/cloning it, right?

@KhafraDev
Member

can you confirm that using Request directly is not reading the whole thing

fetch does read the entire body, but it doesn't buffer it. Each chunk transmitted has to check if the request has been aborted.

https://github.com/nodejs/undici/blob/c7f3d77011234fe243c317ada1398044032342cc/lib/web/fetch/index.js#L1843-1855

https://fetch.spec.whatwg.org/#body-incrementally-read
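To make that concrete, here is a rough sketch (not undici's actual code; `transmitted` is a stand-in for writing to the socket) of reading a stream-backed Request chunk by chunk with an abort check between chunks, using only the global Request available in Node 18+:

```javascript
// Sketch of the incremental-read loop described above: the body is read
// chunk by chunk, checking for abort between chunks, rather than being
// buffered up front.
const body = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('chunk-1'));
    controller.enqueue(new TextEncoder().encode('chunk-2'));
    controller.close();
  },
});

const request = new Request('http://localhost:8000', {
  method: 'PUT',
  body,
  duplex: 'half',
});

const signal = new AbortController().signal;
const reader = request.body.getReader();
const transmitted = [];

while (true) {
  if (signal.aborted) break;           // abort check between chunks
  const { done, value } = await reader.read();
  if (done) break;
  transmitted.push(value);             // stand-in for writing to the socket
}

console.log(transmitted.length);
```

Note that nothing here retains earlier chunks; the buffering in this issue happens elsewhere.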

@tobinus
Author

tobinus commented Feb 13, 2025

What you see in top is the RSS, i.e. the amount of memory the process is actually using. However, the V8 heap grows in chunks, and when sending a lot of data it tends not to collect if it has a lot of memory available.

In other words, the memory you see in top is not meaningful on its own. What you need to check is the heap consumption.

Ok, thank you @mcollina! I've modified the script in the original post to log its memory usage, so that it is clearer where the memory use is coming from.

What I see is that the heap usage stays at around 7-8 MB. The increase in memory I observe is coming from the use of array buffers/external memory, which increases steadily and never goes down, until it is a tiny bit larger than the size of the file I'm streaming. Is this expected behaviour for NodeJS?

I am not so familiar with how NodeJS' garbage collection works. Are array buffers not collected in the same way as the heap? I was able to find lots of resources on how to manage the heap size, but not so much on array buffers, so I'd be grateful for some, err, pointers.

@mcollina
Member

Can you include your reproduction?

@tobinus
Author

tobinus commented Feb 14, 2025

Can you include your reproduction?

The reproduction is in the original post, as noted. Or is something missing? 🙂

@mcollina
Member

Sorry, I've found it.

I'm a little bit puzzled by this bug, because I don't know what is holding on to that data, but I'm convinced a bug is there: that data should be cleaned up.

@mcollina
Member

My understanding is that this behavior is spec compliant.

Specifically, the request is cloned to follow redirects, and the only way to do that for a request with a body is to keep the whole data in memory while waiting for a possible redirect.

Therefore, setting the following parameters on Request fixes it:

new Request("http://localhost:8000", {
  method: "PUT",
  headers: {
    "Content-Type": "application/json",
  },
  body: fileStream,
  duplex: 'half',
  window: null,
  redirect: 'error'
});
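As a self-contained way to exercise this workaround (an illustration, not part of the original report): the following uses a throwaway node:http server and a small generated stream in place of netcat and a real file, assuming Node 18+ with global fetch.

```javascript
// Self-contained check of the workaround above (no netcat needed).
// The server and the tiny generated stream are stand-ins for the real
// upload target and fs.createReadStream(file).
import http from 'node:http';
import { Readable } from 'node:stream';

const received = [];
const server = http.createServer((req, res) => {
  req.on('data', (chunk) => received.push(chunk));
  req.on('end', () => res.end('ok'));
});
await new Promise((resolve) => server.listen(0, resolve));
const { port } = server.address();

// A small generated stream stands in for fs.createReadStream(file).
const body = Readable.toWeb(
  Readable.from([Buffer.from('hello, '), Buffer.from('world')])
);

const response = await fetch(`http://localhost:${port}/`, {
  method: 'PUT',
  body,
  duplex: 'half',
  window: null,        // per the workaround: no associated window
  redirect: 'error',   // per the workaround: never buffer for redirects
});

const text = await response.text();
console.log(text);

server.close();
server.closeAllConnections?.();
```

With a real multi-hundred-megabyte stream, the array buffer usage should now stay flat instead of growing with the upload.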

I would recommend using undici.request() for big file transfers, which would be significantly faster (likely by an order of magnitude, but I didn't test).
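For illustration only (undici may not be installed here, so this sketch uses the built-in node:http client instead): a raw streaming upload has roughly the shape of an undici.request({ method, body }) call, with the body piped straight to the socket and no Request object retaining anything for redirects.

```javascript
// Streaming a body with node:http's client: the source is piped straight
// to the socket, chunk by chunk, with nothing buffered for redirects.
import http from 'node:http';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

const chunks = [];
const server = http.createServer((req, res) => {
  req.on('data', (c) => chunks.push(c));
  req.on('end', () => res.end('done'));
});
await new Promise((resolve) => server.listen(0, resolve));

const req = http.request({
  port: server.address().port,
  method: 'PUT',
  headers: { 'content-type': 'application/octet-stream' },
});

const [responseText] = await Promise.all([
  new Promise((resolve, reject) => {
    req.on('response', (res) => {
      let text = '';
      res.setEncoding('utf8');
      res.on('data', (c) => { text += c; });
      res.on('end', () => resolve(text));
    });
    req.on('error', reject);
  }),
  // A tiny generated stream stands in for a large file stream; pipeline
  // ends the request when the source ends.
  pipeline(Readable.from(['a', 'b', 'c']), req),
]);

console.log(responseText);
server.close();
```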

The following works as expected (note that global.gc requires running Node with --expose-gc):

import fs from "node:fs";
import { fetch, Request } from "./index.js";

if (process.argv.length !== 3) {
  console.error(`Usage: node ${process.argv[1]} FILE`);
  process.exit(2);
}

console.log('My PID:', process.pid);

const fileStream = fs.createReadStream(process.argv[2]);

const request = new Request("http://localhost:8000", {
  method: "PUT",
  headers: {
    "Content-Type": "application/json",
  },
  body: fileStream,
  duplex: 'half',
  window: null,
  redirect: 'error'
});

setInterval(() => {
  const { heapUsed, heapTotal, external, arrayBuffers} = process.memoryUsage();
  console.log(`Heap used/total: ${heapUsed} / ${heapTotal}; Array buffers/external: ${arrayBuffers} / ${external}`);
  global.gc({ type: 'major' });
}, 1000);

const response = await fetch(request)
console.log(response);
console.log(await response.text());

global.gc({ type: 'major' });

await new Promise(resolve => process.nextTick(resolve));
console.log('Waiting 15 seconds until exiting')
await new Promise(resolve => setTimeout(resolve, 15000));

process.exit(0);

@KhafraDev to "prevent" this footgun, do you think we could "resume" the original body once we receive the response headers and are sure that it's not a redirect? Do you think this would be feasible or spec compliant?

@mcollina
Member

it seems following redirects is supported for requests with a body... PUT, POST, etc. Is this spec compliant?

@KhafraDev
Member

KhafraDev commented Feb 16, 2025

Do you think this would be feasible or spec compliant?

I think it's both feasible and spec compliant. The spec doesn't mention when the body should be available nor should it be noticeable to users.

it seems following redirects is supported for requests with a body... PUT, POST, etc. Is this spec compliant?

Yes, it seems to be supported in some capacity. I'd need to test some things to know for certain. (https://fetch.spec.whatwg.org/#http-redirect-fetch):

If one of the following is true

  • internalResponse’s status is 301 or 302 and request’s method is POST
  • internalResponse’s status is 303 and request’s method is not GET or HEAD

then: Set request’s method to GET and request’s body to null.

@tobinus
Author

tobinus commented Feb 17, 2025

Thank you for the research and the workaround, @mcollina! I can confirm that the memory usage is as expected when I set window: null and redirect: "error" 😊

@tobinus
Author

tobinus commented Feb 18, 2025

What does the spec say about following redirects with a body?

Regarding the question:

it seems following redirects is supported for requests with a body... PUT, POST, etc. Is this spec compliant?

I don't have a deep understanding of the Fetch spec, but I wonder if maybe such requests are supported… but only if you are able to create a new stream from the body, which is the case for Blob, byte sequence, BufferSource, FormData, URLSearchParams, or strings. If only a ReadableStream is given to fetch, then it is supposed to throw NetworkError if a redirect occurs (except 303, where the body is not needed since the redirect is made into a GET request).

Specifically, the Fetch spec for the part where the HTTP request is cloned, has a note saying that:

Implementations are encouraged to avoid teeing request’s body’s stream when request’s body’s source is null as only a single body is needed in that case. E.g., when request’s body’s source is null, redirects and authentication will end up failing the fetch.

The request's body's source is only set for the types I mentioned above, in the procedure for extracting a body from BodyInit. It is initialised in step 7, maybe assigned a value in step 10 and included in the request's body in step 13. Note that it is never assigned a value for ReadableStream, so it stays null.
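Incidentally, the teeing the spec note describes can be observed directly via Request.prototype.clone() with Node 18+'s global Request; a minimal sketch (the URL is a placeholder, no request is actually sent):

```javascript
// Observing the tee: cloning a Request whose body is a ReadableStream
// (so body's source is null) tees the underlying stream, and both copies
// replay the same bytes -- which is exactly what keeps a whole upload in
// memory while a possible redirect is pending.
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('payload'));
    controller.close();
  },
});

const original = new Request('http://localhost:8000', {
  method: 'PUT',
  body: stream,
  duplex: 'half',
});

const copy = original.clone();   // spec step: tee request's body's stream

const a = await original.text();
const b = await copy.text();
console.log(a, b);
```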

I was initially puzzled by the statement that only one body is needed when the request's body's source is null. But it seems to hold true when I find the parts where a new request may be made:

  • In HTTP-network-or-cache-fetch, when authentication is required by the server in step 14, a network error is returned in step 14.2.1 instead of preparing a new request with credentials given by the user.
  • In HTTP-network-or-cache-fetch, when the response's status is 421 in step 16, the new request is only made if the body's source is non-null. If it is null, the 421 response is simply returned in step 18.
  • In HTTP-redirect-fetch, a network error is to be returned in step 11 instead of following the redirect if the redirect is anything but 303. If it is 303, the body is set to null in step 12.1 anyway.

However, it does seem like the original stream would be re-used in the following case:

But I may be missing something, I'm just taking a cursory glance at the spec. The above lists are probably not exhaustive?

Aside: Maybe the spec could be improved?

It seems weird to me that the spec would make you tee the stream in the body clone procedure, and simultaneously note that you don't need to do that if the body's source is null. If the body's source is set to anything other than null, then surely you wouldn't need to tee the stream – you could simply extract a new body from the source whenever you need to make a new request? This is already done in HTTP-network-or-cache-fetch step 14.2.2 and HTTP-redirect-fetch step 14. Funnily enough, it is not done in HTTP-network-or-cache-fetch step 16.2, so it ends up using the tee'd stream, but only if the body's source is non-null 🤔

It would be much safer, too, since it would enforce checking that the body's source is non-null wherever a new request is made under the hood, instead of the connection between the body's source's non-nullity and the need for teeing the stream being a weak and informal one.

It also doesn't make sense from a memory usage point of view to tee the stream when the body's source is non-null. It would mean that a user uploading a file as a part of a form would have to store the entire file in memory.

I guess I should read up on the fetch spec's issue tracker to see if anything similar has come up, and open a new issue there if not. Unless anyone with a better understanding of the Fetch spec wants to do that?

Potential paths forward

I think it would be interesting to know whether undici's fetch implementation returns a network error in the cases mentioned above. If it does, the advisory note about avoiding teeing when the body's source is null could be adhered to without causing any trouble for existing users of fetch.

The above solution would not solve the memory issue with big files being submitted as a part of a FormData, or when a Blob is used as body directly. Maybe the request's body's stream could be set up to be lazy? For bodies with a source, starting to use the stream would run the procedure for extracting a body with a stream from BodyInit. Cloning such a request would create a new body with a lazy stream, ready to extract a new stream from BodyInit if the stream is ever used.

For bodies without a source, the solution would have to be more involved, since the body's stream can only be consumed once. Perhaps a network error could be thrown if it is read twice across all clones? It would match the advisory note's assertion about the body only being consumed once if the source is null.
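A minimal sketch of what such a lazy body could look like for bodies with a source — purely hypothetical, all names invented, not undici code:

```javascript
// Hypothetical sketch of the "lazy body" idea (not undici's internals).
// Each clone holds a factory; a real stream is only created when the
// body is actually read, so unconsumed clones cost no memory.
function makeLazyBody(createStream) {
  let consumed = false;
  return {
    clone() {
      // Cloning copies the factory, not a tee'd stream.
      return makeLazyBody(createStream);
    },
    stream() {
      if (consumed) {
        // Each lazy body instance may only be read once.
        throw new TypeError('body already consumed');
      }
      consumed = true;
      return createStream();
    },
  };
}

let created = 0;
const body = makeLazyBody(() => {
  created += 1;
  return new ReadableStream({
    start(controller) {
      controller.enqueue('data');
      controller.close();
    },
  });
});

const clone = body.clone();          // no stream created yet
console.log(created);                // still 0: creation is deferred

const reader = body.stream().getReader();
const { value } = await reader.read();
console.log(created, value);
```

For bodies whose source is a file-backed Blob, the factory could reopen the file, so cloning would never require buffering.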

I hope this is of some help 🙂 I don't have much of any stake in this, so if I'm wrong in a way that is hard to put into words then don't worry about it, I'm perfectly happy with leaving this to the experts 😅
