The "Optimized Tarball Extraction" confuses me a bit. It begins by illustrating how other package managers have to repeatedly copy the received, compressed data into larger and larger buffers (not mentioning anything about the buffer where the decompressed data goes), and then says that:
> Bun takes a different approach by buffering the entire tarball before decompressing.
But it seems to sidestep _how_ it does this any differently from the "bad" snippet the section opened with (presumably it checks the Content-Length header when fetching the tarball, and can assume the size it reports is correct). All it says about this is:
> Once Bun has the complete tarball in memory it can read the last 4 bytes of the gzip format.
Then it explains how it can pre-allocate a buffer for the decompressed data, but we never saw how this buffer allocation happens in the "bad" example!
> These bytes are special since they store the uncompressed size of the file! Instead of having to guess how large the uncompressed file will be, Bun can pre-allocate memory to eliminate buffer resizing entirely
Presumably the saving is in the slow package managers having to expand _both_ of the buffers involved, while Bun preallocates at least one of them?
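If it helps, here's my reading of the trick as a sketch in Node-flavoured TypeScript (the function name and the use of zlib are mine; Bun's actual implementation is in Zig, this is just the idea as I understand it):

```ts
import { createGunzip } from "node:zlib";

// Hypothetical sketch: buffer the whole .tgz first, then use the gzip
// trailer to size the output buffer before decompressing.
function gunzipPreallocated(tarballGz: Buffer): Promise<Buffer> {
  // The last 4 bytes of a gzip stream (ISIZE) hold the uncompressed
  // size modulo 2^32, little-endian.
  const isize = tarballGz.readUInt32LE(tarballGz.length - 4);
  const out = Buffer.allocUnsafe(isize); // allocated once, never resized
  let offset = 0;

  return new Promise((resolve, reject) => {
    const gunzip = createGunzip();
    gunzip.on("data", (chunk: Buffer) => {
      chunk.copy(out, offset); // copy into the pre-sized buffer
      offset += chunk.length;
    });
    gunzip.on("end", () => resolve(out.subarray(0, offset)));
    gunzip.on("error", reject);
    gunzip.end(tarballGz); // we already hold the complete tarball
  });
}
```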
I think my actual issue is that the "most package managers do something like this" example code snippet at the start of [1] doesn't seem to quite make sense - or doesn't match what I guess would actually happen in the decompress-in-a-loop scenario?
As in, it appears to illustrate building up a buffer holding the compressed data as it's received (the "// ... decompress from buffer ..." comment at the end suggests what arrives in `chunk` is compressed), but I'd guess the real problem with the decompress-as-the-data-arrives approach is having to re-allocate the buffer for the *decompressed* data?
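To make that guess concrete, here's a hypothetical sketch (my names, not any package manager's actual code) of what I imagine the streaming path looks like, with the repeated re-allocation happening on the decompressed side:

```ts
import { createGunzip } from "node:zlib";
import { get } from "node:https";

// Guess at the decompress-as-it-arrives approach: the *output* buffer,
// not the compressed one, is what keeps growing.
function fetchAndGunzip(url: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const gunzip = createGunzip();
    let out = Buffer.alloc(0);
    gunzip.on("data", (chunk: Buffer) => {
      // Each concat allocates a bigger buffer and copies everything
      // decompressed so far -- the repeated-resize cost.
      out = Buffer.concat([out, chunk]);
    });
    gunzip.on("end", () => resolve(out));
    gunzip.on("error", reject);
    get(url, (res) => res.pipe(gunzip)).on("error", reject);
  });
}
```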