
zopfli is the wrong thing to use here.

If you want an example of where these things go badly:

the standard compression level for rpms on redhat distros is zstd level 19.

This has all the downsides of other algorithms - it's super slow - often 75-100x slower than the default level of zstd[1]. It achieves a few percent more compression for that speed. Compared to even level 10, it's like 0-1% higher compression, but 20x slower.

This is bad enough - the kicker is that at this level, it's slower than xz for ~all cases, and xz is 10% smaller. The only reason to use zstd is because you want fairly good compression, but fast.

So here, they've chosen to use it in a way that compresses really slowly, but gives you none of the benefit of compressing really slowly.

Now, unlike the npm case, there was no good reason to choose level 19 - there were no backwards compatibility constraints driving it, etc. I went through the PR history on this change; it was not particularly illuminating (it does not seem like much thought was given to the level choice).

I mention all this because it has a real effect on the experience of building rpms - this is why it takes eons to make kernel debuginfo rpms on fedora, or any large RPM. Almost all of the time is spent compressing it with zstd at level 19. On my computer this takes many minutes. If you switch it to even xz, it will do it about 15-20x faster single threaded (if you thread both of them, xz will win by even more, because of how slow this setting is for zstd; at reasonable settings, zstd obviously achieves gigabytes/second in parallel mode).

Using zopfli would be like choosing level 19 zstd for npm. While backwards compatibility is certainly painful to deal with here, zopfli is not likely better than doing nothing. You will make certain cases just insanely slow. You will save someone's bandwidth, but in exchange you will burn insane amounts of developer CPU.

zopfli is worse than level 19 zstd - it can often be 100x-200x slower than gzip -9.
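
If you want to see the gap yourself, here's a rough timing sketch (the tarball name is made up; the zopfli command line tool keeps the input and writes a .gz next to it by default):

  # rough sketch: gzip -9 vs zopfli on the same tarball (file name is made up)
  time gzip -9 -k package.tar    # seconds, produces package.tar.gz
  time zopfli package.tar        # typically orders of magnitude slower, also produces a .gz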

Doesn't npm support insanely relaxed/etc scripting hooks anyway?

If so, if backwards compatibility is your main constraint, you would be "better off" double compressing (IE embed xz or whatever + a bootstrap decompressor and using the hooks to decompress it on old versions of npm). Or just shipping .tar.gz's that, when run, fetch the .xz and decompress it on older npm.
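
To make the bootstrap idea concrete - everything below is hypothetical (the script name, the URL), the only real piece is that npm lifecycle hooks like "postinstall" exist and can run a script at install time:

  # bootstrap.sh - hypothetical, wired up via a "postinstall" entry in package.json
  # on older clients: fetch the xz payload and unpack it in place
  curl -fsSL "$PAYLOAD_URL" -o payload.tar.xz   # PAYLOAD_URL is made up for illustration
  tar -xJf payload.tar.xz                       # -J = filter through xz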

Or you know, fish in the right pond - you would almost certainly achieve much higher reductions by enforcing cleaner shipping packages (IE not including random garbage, etc) than by compressing the garbage more.

[1] on my computer, single threaded level 19 does 7meg/second, the default does 500meg/second. Level 10 does about 130meg/second.
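
If you want to reproduce numbers like these, zstd has a built-in benchmark mode - a sketch, point it at any large file of your own:

  # benchmark levels 3 (the default) through 19 on one file, single threaded
  zstd -b3 -e19 some-large-file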



> the standard compression level for rpms on redhat distros is zstd level 19

> The only reason to use zstd is because you want fairly good compression, but fast

I would think having fast decompression is desirable, too, especially for rpms on redhat distros, which get decompressed a lot more often than they get compressed, and where the CPUs doing decompression may be a lot slower than the CPUs doing the compression.

And zstd beats xz in decompression times.


Here's the Fedora page relating to changing RPM from xz level 2 to zstd level 19.

https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_c...

Just two tables for comparison: the first shows only decompression times for the Firefox RPM, the second shows compression time, compressed size, and decompression time for a large RPM.

You'd think there'd be more data.


Let me start by reiterating - zstd is a great option. I think zstd level 5-10 would have been an awesome choice. I love zstd - it is a great algorithm that really hits the sweet spot for most users between really fast and good compression, and very very fast decompression. I use it all the time.

In this case, yes, zstd has faster decompression, but xz decompression speed is quite fast, even before you start using threads. I have no idea why their data found it so slow.

Here's an example of this: https://web.archive.org/web/20231218003530/https://catchchal...

Even on large compressed archives, xz decompression times do not go into minutes - even on a mid-range 16 year old intel CPU like the one used in this test. Assuming the redhat data on this single rpm is correct, I would bet it's more related to some weird buffering or chunking issue in rpm's compressor library usage than to actual xz decompression times. But nobody seems to have bothered to look at why the times seemed ridiculous, at all - they just sort of accepted them as is.
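
It's also easy to sanity check decompression times yourself rather than taking the table at face value - a sketch, file names made up, with output sent to /dev/null so disk speed doesn't muddy the numbers:

  # time raw decompression only, no disk writes (file names are made up)
  time xz -dc big.tar.xz > /dev/null
  time zstd -dc big.tar.zst > /dev/null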

They also based this particular change on the idea that they would get a similar compression ratio to xz - they don't, as I showed.

Anyway, my point really wasn't "use xz", but that choosing zstd level 19 is probably the wrong choice no matter what.

Their own table, which gives you data on 1 whole rpm, shows that zstd level 15 gave them compression comparable to xz (on that RPM - it's wrong in general), at compression speed similar to xz (also wrong in general - it's much slower than that).

It also showed that level 19 was 3x slower than that for no benefit.

Result: Let's use level 19.

Further, the claim "Users that build their packages will experience slightly longer build times" is total nonsense - their own table shows this. If you had an RPM that was 1.6gb, but took 5 minutes to build (not uncommon, even for that size, since it's usually assets of some sort), you are now taking 30 minutes, and spending 24 of it on compression.

Before it took ... 3 minutes to do compression.

Calling this "slightly longer build times" is hilarious at best.

I'll make it concrete: their claim is based on building Firefox and compressing the result, and amusingly, even there it's still wrong. Firefox RPM build times on my machine are about 10-15 minutes. Before it took 3 minutes to compress the RPM. Now it takes 24.

This is not "slightly longer build times". Before it took 30% of the build time to compress the RPM.

Now it takes 24 minutes, or 2.5x the entire build time.

That is many things, but it is not a "slightly longer build time".

I'll just twist the knife a little more:

RPM supports using threading for the compressors, which is quite nice. It even supports basing it on the number of cpus you have set to use for builds. They give examples of how to do it, including for level 19:

  /usr/lib/rpm/macros:
  #                "w19T8.zstdio"  zstd level 19 using 8 threads
  #               "w7T0.zstdio"   zstd level 7 using %{getncpus} threads

The table with this single rpm even tested it with threads!

Despite this - they did not turn on threads in the result...

  /usr/lib/rpm/redhat/macros:%_binary_payload w19.zstdio

So they are doing all this single threaded for no particular reason - as far as I can tell, this is a bug in this well thought out change.
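
In the meantime, anyone building rpms locally can override the macro themselves, using exactly the syntax from their own examples above - a sketch, dropped in ~/.rpmmacros or wherever you keep local macro overrides:

  # ~/.rpmmacros - zstd level 19 with %{getncpus} threads, per the documented "T0" syntax
  %_binary_payload w19T0.zstdio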

All this to say - I support NPM in being careful about this sort of change, because I've seen what happens when people aren't.


> So they are doing all this single threaded for no particular reason - as far as i can tell, this is a bug in this well thought out change

Could be because they want reproducible builds.


First, there is no data or evidence to suggest this is the case, so I'm not sure why you are trying to make up excuses for them?

Second, zstd is fully deterministic in multithreaded cases. It does not matter what threading you select, it will output byte for byte identical results.

See a direct answer to this question here: https://github.com/facebook/zstd/issues/2079

I believe all of their compressors are similarly deterministic regardless of the number of threads, but I admit I have not checked every one of them under all conditions.
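
It's also trivial to check - compress the same input at the same level with different thread counts and compare the bytes (a sketch; the input file name is made up):

  # same input, same level, different thread counts - output should be byte identical
  zstd -19 -T2 -c big.tar > a.zst
  zstd -19 -T8 -c big.tar > b.zst
  cmp a.zst b.zst && echo identical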

If they had questions, they could have, you know, asked, and would have gotten the same answer.

But that just goes back to what I said - it does not appear this change was particularly well thought out.



