Years back I came to the conclusion that conda using bzip2 for compression was a big mistake.
Back then, if you wanted to use a particular neural network, it was built for a certain version of TensorFlow, which in turn expected a certain version of the CUDA libs.
If you had to work with multiple models, the "normal" way to do things was to use the developer-unfriendly [1][2] installers from NVIDIA, which install a single version of the libs at a time.
It turned out you could have many versions of CUDA installed as long as you kept them in different directories and set the library path accordingly, so it made sense to pack them up as conda packages and install them alongside everything else.
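(For the curious, a minimal sketch of the trick - the paths and the train.py script are placeholders, and real packaging involves more bookkeeping than this, but it's the gist:)

    # Keep several CUDA versions side by side and point the dynamic loader
    # at whichever one this particular job needs. Paths here are made up.
    import os
    import subprocess

    CUDA_HOME = "/opt/cuda-10.0"            # sibling of /opt/cuda-9.0, etc.
    lib_dir = os.path.join(CUDA_HOME, "lib64")

    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing

    # The child process resolves libcudart and friends against the chosen version.
    subprocess.run(["python", "train.py"], env=env, check=True)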
But oh boy, was it slow to unpack those bzip2 packages! Since conda had good caching, if you built environments at all often you could end up paying more in decompression time than was ever spent on compression.
If you were building a new system today you'd probably use zstd, since it beats gzip on both speed and compression ratio.
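If you'd rather have numbers for your own payloads than take that on faith, a quick-and-dirty comparison is easy (zstd here comes from the third-party zstandard package; the input file is a placeholder):

    # Unscientific gzip-vs-zstd comparison on whatever payload you care about.
    import gzip, time
    import zstandard   # third-party package

    data = open("some_payload.bin", "rb").read()   # placeholder input

    t0 = time.perf_counter()
    g = gzip.compress(data)
    t1 = time.perf_counter()
    gzip.decompress(g)
    t2 = time.perf_counter()
    print(f"gzip: ratio {len(g)/len(data):.3f}, comp {t1-t0:.2f}s, decomp {t2-t1:.2f}s")

    cctx, dctx = zstandard.ZstdCompressor(), zstandard.ZstdDecompressor()
    t0 = time.perf_counter()
    z = cctx.compress(data)
    t1 = time.perf_counter()
    dctx.decompress(z)
    t2 = time.perf_counter()
    print(f"zstd: ratio {len(z)/len(data):.3f}, comp {t1-t0:.2f}s, decomp {t2-t1:.2f}s")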
[1] click... click... click...
[2] like they're really going to do something useful with my email address
> But oh boy, was it slow to unpack those bzip2 packages! Since conda had good caching, if you built environments at all often you could end up paying more in decompression time than was ever spent on compression.
For Paper, I'm planning to cache both the wheel archives (so they're available on demand without recompressing) and their unpacked contents (installing into new environments will use hard links to the unpacked cache where possible).
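The mechanics of the hard-link part are pleasantly boring - roughly this shape, though the layout and the cross-filesystem fallback here are invented for illustration and it's not Paper's actual code:

    # Populate a new environment from an unpacked package cache, hard-linking
    # where possible and copying where not.
    import os
    import shutil

    def link_tree(cache_dir, env_dir):
        for root, dirs, files in os.walk(cache_dir):
            rel = os.path.relpath(root, cache_dir)
            target_root = os.path.join(env_dir, rel)
            os.makedirs(target_root, exist_ok=True)
            for name in files:
                src = os.path.join(root, name)
                dst = os.path.join(target_root, name)
                try:
                    os.link(src, dst)        # free when cache and env share a filesystem
                except OSError:
                    shutil.copy2(src, dst)   # fall back to copying across filesystems

    # e.g. link_tree("/path/to/unpacked/some_package", "env/lib/site-packages")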
> If you were building a new system today you'd probably use zstd, since it beats gzip on both speed and compression ratio.
FWIW, in my testing LZMA is a big win (and I'm sure zstd would be as well, but LZMA already has standard-library support). There are serious roadblocks to adopting a change like that in the Python ecosystem, though. Ideas like this end up several layers deep in meta-discussion - see for example https://discuss.python.org/t/pep-777-how-to-re-invent-the-wh... . In general, progress on Python packaging gets stuck in a double bind: change too little and you won't get any buy-in that it's worthwhile, but change too much and everyone freaks out about backwards compatibility.
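The frustrating part is that the plumbing already exists - a wheel is just a zip, and the stdlib zipfile module can write LZMA entries today. Toy illustration below (filenames made up, and a real wheel needs proper metadata); whether the installers out in the wild would accept a non-deflate wheel is precisely the compatibility question that stalls these threads.

    # The stdlib can already produce LZMA-compressed zip entries; that's what
    # makes the idea tempting. Filenames are placeholders, not a real wheel.
    import zipfile

    with zipfile.ZipFile("demo-1.0-py3-none-any.whl", "w",
                         compression=zipfile.ZIP_LZMA) as zf:
        zf.write("demo/__init__.py")   # some file to pack

    # Reading it back works fine with the same module.
    with zipfile.ZipFile("demo-1.0-py3-none-any.whl") as zf:
        print(zf.namelist())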
I designed a system that was a lot like uv, but written in Python, and when I looked at the politics I decided not to go forward with it. (My system also had the problem that it had to be isolated from other Pythons so its own environment wouldn't get trashed; given how easily software developers can trash their environments, I wasn't sure that was a problem that could be 100% solved. uv solved it by not being written in Python. Genius!)
Yes, well - if I still had reason to care about the politics I'd be in much the same position, I'm sure. As it is, I'm just going to make the thing, write about it, and see who likes it.