Distros, please stop screwing over Python packaging. It is incredible that Debian/Ubuntu pick Python apart and put modules into different packages. They even create a fake venv command that tells you to please install python-venv.
What they should just do is offer a bunch of packages like python3.7, python3.8 that install the official python package wholesale into /usr/python or someplace and then symlink one of them to `python`.
If I would get to redesign package management (both for Linux distros and for languages), I would have one package manager that installs everything into a global package cache, and then pick the correct version of libraries at run time (for Python: at import time). Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable. Instead, make it easy to get bleeding edge versions into the repositories. They can be installed side by side and only picked up by the things that actually use them.
The problem arises when non-Python packages depend on Python modules.
> If I would get to redesign package management (both for Linux distros and for languages), I would have one package manager that installs everything into a global package cache, and then pick the correct version of libraries at run time (for Python: at import time). Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable. Instead, make it easy to get bleeding edge versions into the repositories. They can be installed side by side and only picked up by the things that actually use them.
You may want to check out Guix and Nix - their approach is pretty close to what you're describing.
A common solution to this, if you still want to run traditional distros, is to just run "bare infra" (whatever that means) on the host OS and everything else in containers or Nix.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time.
I think this requirement made sense when disk space was scarce.
I think this requirement makes sense if you trust that your distro is always better at choosing the 'best' version of a dependency that some software should use than the software author.
Nowadays, I think neither is generally true. Disk is plentiful, distro packages are almost always far more out of date than the software's original source, and allowing authors to ship software with specific pinned dependency versions reduces bugs caused by dependency changes and makes providing support for software (and especially reproducing end-users' issues) significantly easier.
Isolating dependencies by application, with linking to avoid outright duplication of identical versions (a la pnpm's approach for JS: https://pnpm.io/) is the way to go I think. Honestly, it feels like the way it's already gone, and it's just that the distros are fighting it tooth & nail whilst that approach takes over regardless.
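To make that concrete, here's a rough Python sketch of the idea: a single global store holding one copy per (package, version), hard-linked into each application's private directory so identical versions are never duplicated on disk. The store path and helper are invented for illustration; this is not pnpm's actual mechanism.

```python
import os
from pathlib import Path

# Hypothetical global store: one directory per (package, version).
STORE = Path.home() / ".pkg-store"

def link_into_app(app_dir: Path, name: str, version: str) -> None:
    """Hard-link one stored package version into an app's private lib dir.

    Ten apps pinning the same version still cost one copy of the files,
    because hard links share the underlying data blocks.
    """
    src = STORE / f"{name}-{version}"
    dst = app_dir / "lib" / f"{name}-{version}"
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():
                os.link(path, target)  # hard link, not a copy
```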
Ah JS, how many days has it been since the last weekly "compromised npm package infecting everything" problem? If you are upholding that as the gold standard, you have to be the world's laziest black hat.
> Disk is plentiful,
I recently had to install a Chrome snap because it is the new IE6 and everyone is all over Chrome-exclusive APIs as if they were the new ActiveX. Over a gigabyte of dependencies for one application, and the trend is toward browser-based desktop applications. I would like to have space left for my data after installing the programs I need for work.
Distros assume responsibility for fixing major bugs and security vulnerabilities in the packages they ship. Old versions often contain bugs and vulnerabilities that new versions don't. Distros have two choices here: either ship the new version and remove the old version, or backport the fix to the old version.
Continuing to ship the old version without the fix is not an option -- even if you also ship the new version -- because some programs will inevitably use the old version and then the distro will be on the hook for any resulting hacks. Backporting every fix to every version that ever shipped is also not a realistic project.
Here in the startup world we often forget that there's a whole other market where many people would gladly accept 3-year-old versions in exchange for a guarantee of security fixes for 5-10 years. Someone needs to cater to this market, and the (non-rolling) distros perform that thankless task because individual developers won't.
> Distros assume responsibility for fixing major bugs and security vulnerabilities in the packages they ship.
I think they should just ship Python programs, not libraries. They could check whether the libraries a given Python program uses are safe in the versions it actually uses.
And then not care whether each Python program has its own copy of the libraries, or whether a particular version of a particular library is shared between Python programs by the Python environment.
Distributions might just give up responsibility for sharing Python packages between Python programs without giving up responsibility for the security of those programs.
Why not? It's cheap resource-wise, whereas dependency hell is potentially debilitating. For some reason many proponents of the package management status quo are blind to this. Having multiple versions of a dependency is only bad insofar as it's "messy". It isn't objectively bad. But having a system that breaks applications because two or more can't agree on a package version is objectively bad. It's arguing aesthetics versus getting the job done. A poor position.
Windows, for all its faults, doesn't have this problem. It will happily accommodate multiple versions of, say, .NET as needed.
Disk is cheap. RAM is cheap. Man-hours are not. Distros are maintained by people, who are often volunteers. You are asking them to do extra work (i.e. porting the same patch to multiple versions of the same library) so that someone else can have it easy. But why should they? Why not the other way around?
It's not just aesthetics. If a new vulnerability is found in, say, libjpeg, then every Windows app that uses libjpeg needs to be updated separately. Tough luck if your favorite image manipulation tool takes a while to catch up. On the Linux side of the fence, the distro patches libjpeg and every app is automatically fixed. This is a huge win in terms of protecting the user. Why should we give up that benefit just because some developer wants to use his own special snowflake version of a library?
Not managing that would require less work, not more. Their position is making more work for themselves. The point was to prevent the dependency hell that comes from matching the wrong package versions, something that can occasionally happen on Windows. The problem is that the current form of management causes the far worse form of dependency hell: applications requiring conflicting versions.
I have maybe had Windows get confused about dependency versions twice ever, and both times it was a driver .inf for a virtual device. I will grant that fixing the problem required a fair amount of work by Windows standards, but frankly not all that much by the standards of some of the more hands-on distros.
I have had Linux tell me I can't install an application because it wanted a different version of Lib-whatever than what something else wants many many times.
"Why should we give up that benefit just because some developer wants to use his own special snowflake version of a library?"
Odd that you claim major distros are built by a small group of volunteers, but the maintainers of much smaller and less well-supported applications need to suck it up and use whatever version the distro maintainers decide on.
Most major distros are not volunteer-run and haven't been for ages. Ubuntu, RHEL, SUSE, Pop!_OS, the list goes on. These are commercial products with full-time paid developers. In the case of Ubuntu they are providing a major chunk of the work back upstream to Debian, and in the case of RHEL they are the upstream. Most minor distros are downstream beneficiaries of the big players.
Contrary to that, it's still common for many FOSS apps and utilities to be one-man jobs. Maybe the guy doesn't have the resources to keep up with the breakneck pace of some update cycles. What if they decided to go with an LTS build intentionally? What if it's a simple package that doesn't have security issues yet gets updated for other reasons? What if the version they are using has core functionality that was EOL'd in a newer release, so they can't move on without major rework that they can't manage?
There are a million reasons why a project may want to stick with an older version. Also, allowing for the ability to update all packages does not require draconian control over which packages can be installed. This runs against the whole notion of user control. If the user wants multiple concurrent versions on their system, who are you to say they can't? FOSS means freedom.
You should patch the two dozen Python programs that use a vulnerable version of a library as whole programs. Treat each Python program as if it were a single executable file, where all you know is which versions of which libraries it has inside it. And if any of those is known to be vulnerable, treat the whole program as a security threat.
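A rough sketch of what checking a program as a unit could look like, assuming each packaged program ships its dependencies in its own bundled site-packages directory; the advisory table and path below are made-up placeholders, not real data.

```python
from importlib.metadata import distributions

# Placeholder advisory data: package name -> versions known to be vulnerable.
VULNERABLE = {
    "urllib3": {"1.26.4"},
    "pyyaml": {"5.3"},
}

def audit_program(bundled_site_packages: str) -> list[str]:
    """List vulnerable name==version pairs bundled inside one program."""
    findings = []
    for dist in distributions(path=[bundled_site_packages]):
        name = dist.metadata["Name"].lower()
        if dist.version in VULNERABLE.get(name, set()):
            findings.append(f"{name}=={dist.version}")
    return findings

# A distro tool would flag the whole program and rebuild/re-release it,
# rather than swapping a shared library underneath it.
# (The path below is a hypothetical example.)
if bad := audit_program("/usr/lib/someapp/site-packages"):
    print("ships vulnerable dependencies:", ", ".join(bad))
```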
What makes you think so? SSDs aren't exactly stellar in the cost-per-TB department, as will be the case with each new higher-performance storage technology. Plenty of people cannot afford the prices of new Western tech either, what about them?
> SSDs aren't exactly stellar in the cost-per-TB department
First of all, 1TB for binaries and libraries may as well be infinite. Secondly, you can get a 1TB SSD for under $100, which is pretty damned inexpensive when you consider it took until 2009 to get HDDs that affordable.
It's plentiful relative to the size of compiled or source code. E.g. the biggest .so file on this system right now is a <150MB libxul.so. That's only used by one piece of software anyway, and the drop-off is pretty steep after that. A 64GB drive (tiny these days) can fit more than four hundred of that unusually large file.
Not if they pull in all of their dependencies: PyQt would have a complete copy of all the Qt binaries and a complete Chromium install, because of course Qt includes a browser-based HTML viewer. Python packages are gigabytes.
What distro is pulling PyQt as a dependency of Python? There is a difference between "dependency" and "every package which has the word python in the description".
PyQt only contains the bindings. You share the same Qt environment across your system (hence qmake needs to be in your path). The python package itself is not that big (~10 MB).
The comment on top of this chain is about letting every package specify its own versioned dependencies. So how would that globally shared Qt work out when the Python bindings need 5.1 and some other software specifies 5.2?
That guarantee only applies to Qt itself, I would expect that the newer Qt binary was also compiled against all the other newest versions of its own dependencies. Good luck finding a backwards compatibility promise for all of them.
> Unless you are trying to save a buck it seems 1TB is the standard today.
I suppose part of the problem is that while getting a 1 TB SSD instead of a 512 GB or even 256 GB one may not be overly expensive (for a middle-class person in a wealthy country anyway), due to the way OEM laptop product lines are often stratified, you may need to either buy your 1 TB SSD separately or get an altogether higher-specced model than you perhaps otherwise would. The latter especially isn't cheap.
There might be some customization options, but sometimes little customization is available. That's probably one of the ways people end up with relatively small-capacity SSDs.
It's kind of similar to RAM: a higher capacity isn't that much more expensive in theory, but in practice it may be.
This is irrelevant for custom-built desktops, but lots of people are running only laptops nowadays. I'd like to see better customizability for the builds, as well as upgradability and replaceability, but the options are often limited.
> Unless you are trying to save a buck it seems 1TB is the standard today.
Buying a 1TB external SSD would more than double the cost of a Raspberry Pi 4. That, and my ancient BeagleBoard does fine running from 32 GB.
> My primary desktop has 4.
Those are rookie numbers for a primary system. Of course, my office system next to it is a lot lower-specced, with the test system next to it even lower.
Embedded systems really shouldn't be brought into play here, but even then a 256 GB microSD card for the Pi is $25 and itself far overkill. My entire primary desktop OS, firmware, DEs, and very extensive package set fits in 15 GB. Multiplying my primary system by 10 and sticking it on a Pi taking $25 of storage with plenty to spare is still not an argument against binary sizes, especially since there are niche distro spins used for that niche space anyway.
The eMMC on a Beaglebone Black is 4GB. Sure you can boot off an SD card but that's less robust (though, I guess you can use the SD card for all your virtualenvs...).
> I think this requirement made sense when disk space was scarce.
No, the main reason is security. I need a distro to guarantee me that the libraries I use are going to stay the same for the next 3 to 5 years, while also receiving small targeted patches.
> I think this requirement makes sense if you trust that your distro is always better at choosing the 'best' version of a dependency that some software should use than the software author.
No, it just has to be better at choosing versions than picking them randomly using pip.
Furthermore, when thousands of developers use the same combination of libraries from a distro, the stack receives a ton of testing.
The same cannot be said when each developer picks a random set of versions.
One case where disk space is somewhat scarce is on shared academic computing clusters (which often provide many versions of things via some module system, but your $HOME can have a quota that's just 30GB).
Homebrew does this very well with its "cellar" system. Every version of every package gets installed to its own root tree, e.g. `/usr/local/Cellar/python/3.9.7/`. The currently-active version is then symlinked into `/usr/local/opt/python` and from there into `/usr/local`.
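The scheme is simple enough to sketch: every version gets its own keg directory, and "activation" is just repointing a symlink. A toy illustration in Python follows (paths and function invented for the example; this is not Homebrew's actual code):

```python
from pathlib import Path

CELLAR = Path("/usr/local/Cellar")  # Cellar/<name>/<version>/ holds every installed version
OPT = Path("/usr/local/opt")        # opt/<name> is a symlink to the active version

def activate(name: str, version: str) -> None:
    """Point opt/<name> at Cellar/<name>/<version>; other versions stay on disk."""
    keg = CELLAR / name / version
    link = OPT / name
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(keg)

# activate("python", "3.9.7")
# -> /usr/local/opt/python -> /usr/local/Cellar/python/3.9.7
```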
I'll remember that "Homebrew does this very well" the next time I have to fix a bunch of shit because it has updated the currently-active version, or removed this or that bugfix release, as part of a general upgrade. After the third time this happened, I started using pyenv - which is another mountain of brokenness, I grant you, but at least I have some degree of control over what happens and when.
pyenv is pretty good for working around this problem. I've recently switched to asdf-vm which I like even more, since it handles versions for multiple languages and tools.
I have been meaning to try out ASDF-VM. Currently my shell initialization script has at least 4 of these "version managers". While I don't really mind them (and they are mostly well-behaved), it might be nice to have something a bit more centralized.
In addition to python version issues, I was also running into JVM and gradle version compatibility issues, which I was handling with jenv and some aliases that would swap the JAVA_HOME environment variable as needed. asdf-vm handled all that in a very clean way, and I like how you can set a .tool-versions file for a project and share it with other asdf-vm users.
hands down the best way to manage python and its packages.
Agreed, especially on Windows.
It just works.
This is pushing it. It's not hard to break conda or put yourself in situation where the updater/dependency checker gets stuck and doesn't know what to do, especially once you start adding conda-forge packages. But it does do a better job than anything else I've tried (although poetry + pyenv on Linux is getting much better)
FWIW, we are soon going to be releasing a much faster dependency resolver. We are also thinking hard about how best to address the "growing ecosystem" problem, in a future-proof way.
IIRC GoboLinux was the first distro to do things this way. Sadly, it didn't catch on and the Linux world doubled down on labor intensive volunteer package maintenance.
Great shout out. I still ought to try using it one of these days! It seems like a good option for people who want a better file system hierarchy without the extra complexity of Nix/Guix.
Homebrew does this kinda poorly compared to Nix and Guix. They are a different breed.
For starters, there is no /usr/local symlinking process. It's also possible to have multiple versions of e.g. python installed and active. Homebrew is like a poor-man's Nix.
I hate that Homebrew uses /usr/local. At least on M1 they had to move it to /opt but I always install it to my home directory in ~/.brew. I can override the paths and not have to worry about file/directory permissions.
This is the real legacy problem: Python comes from a world where only one version of a package seemed like the right way to do things.
I do not have a good idea, but other ecosystems evolved much more sanely in the realm of packaging. While not ideal, Go has done a fairly good job - and the "module" operations are instant - which they should be.
OK, but Go compiles statically. While you can do the same with pyinstaller, I don't think that's really comparable, since we're talking about deployment here.
Static binaries are a different story. Go has dependencies like any other modern language; they had a bad story in the past and have a better story today.
Sketch for python: Create a ~/.cache/python/packages directory. Manage all dependencies there. Make the python interpreter "package aware" so that required dependencies are read off a file from the current project (e.g. "py.mod") and adjust "system path" accordingly and transparently. Or something along those lines.
No extra tool, a single location, an easy to explain workflow (add a py.mod file, add deps there with versions, etc).
I'm just thinking out loud, but it does not need to be hard.
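For example, something like this minimal sketch, where the py.mod format, the cache layout, and the helper name are all invented for the illustration:

```python
import sys
from pathlib import Path

# Hypothetical per-user cache: ~/.cache/python/packages/<name>/<version>/
CACHE = Path.home() / ".cache" / "python" / "packages"

def activate_project(project_dir: str = ".") -> None:
    """Read py.mod (lines of name==version) and prepend pinned versions to sys.path."""
    modfile = Path(project_dir) / "py.mod"
    if not modfile.exists():
        return
    for line in modfile.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pkg_dir = CACHE / name.strip() / version.strip()
        if pkg_dir.is_dir():
            # A "package aware" interpreter would do this step transparently.
            sys.path.insert(0, str(pkg_dir))

# After activate_project(), `import requests` resolves to exactly the version
# py.mod pins, and two projects can pin different versions side by side.
```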
Despite the downvotes, the argument stands: Linux distributions are having a hard time handling the number of tiny libraries and the conflicts in versioning, and many maintainers have voiced their concerns over the past years.
The point echoed in this discussion multiple times is that distributions should not handle the tiny Python libraries and attempt to solve the dependency version issues, but treat an application with all its dependencies included as a single package. If a dependency needs to be bumped a version for e.g. security purposes, then the app obviously wasn't tested with the new version (which didn't exist at the time) and needs to be retested, repackaged and re-released for the update. This would cut down on the number of packages to be maintained, as the vast majority of Python libraries would be excluded from the direct packaging process.
> If a dependency needs to be bumped a version for e.g. security purposes, then the app obviously wasn't tested with the new version (which didn't exist at the time) and needs to be retested
The burden of updating multiple copies of the same library across many packages grows exponentially and is simply untenable for distributions.
If you can find an army of volunteers to do that, distributions would love their contributions.
This hasn't happened in the last 20 years. I'd love to be proven wrong.
Since simply updating the dependency can easily break the resulting package, this need to re-test is not something that can be avoided by making some other choice of packaging, e.g. the current one - it's not adding a new burden, it's acknowledging that the burden already exists (indeed, IMHO that's much of what the original article complains about). If there are no resources to carry that burden, then the only option seems to be to wait for an updated release from upstream, whenever that arrives.
I wrote "The burden of updating". Testing still needs to be done but there's a lot of automation to minimize the workload.
> If there are no resources to carry that burden, then the only option seems to be to wait for an updated release from the upstream, whenever that arrives.
No, most upstreams do not backport security fixes. And switching to a newer release is not an option if you want to provide stability to users.
That sounds similar to what I do in macOS. I hate installing homebrew to /usr/local so I started installing it to ~/.brew and I hate using the python from homebrew so I always use pyenv.
This behavior is why things like Snaps and Flatpaks have become so popular. Package managers operate under a draconian and outdated mindset that gets in the way more than it helps at this stage.
You can both allow different versions of the same packages to coexist while also managing updates and installation/removal of software. It doesn't have to be this way. Software should be able to ship with its dependencies included and work and not rely on the whims of the OS getting it right.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time.
I’m not exactly sure how it works, but I think I’ve heard that newer releases of Enterprise Linux (EL8+) support multiple streams of the same package (module streams) or something similar.
Interesting idea: we should be able to hook in before the `sys.modules` cache, or keep one such cache for each consumer, and then we should be able to produce this.
However, I thought the point was helping distro package management, which, to my knowledge, is not really built to support multiple installed versions of a package at the same time: `dnf upgrade`, for example, will upgrade the single instance of each package to its newest release.
Actually you can already override __import__ and implement this, but then you still need an installer, and distro support for multiple instances of the same package.
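For instance, a meta path finder can already do the version-picking part today; the pin table and store layout below are invented just to show the mechanics, and the installer and distro side remain the open problem:

```python
import sys
from importlib.abc import MetaPathFinder
from importlib.machinery import PathFinder

# Invented pin table and store layout, just to show the hook mechanics.
PINS = {"requests": "2.28.0"}
STORE = "/opt/pycache"  # e.g. /opt/pycache/requests/2.28.0/requests/__init__.py

class VersionedFinder(MetaPathFinder):
    """Resolve pinned top-level packages from a versioned store before sys.path."""

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # submodules resolve via the parent package's __path__
        version = PINS.get(fullname)
        if version is None:
            return None  # not pinned: fall through to the normal finders
        search = [f"{STORE}/{fullname}/{version}"]
        return PathFinder.find_spec(fullname, search, target)

sys.meta_path.insert(0, VersionedFinder())  # `import requests` now honours the pin
```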
Actually, explaining how to get Python working on Windows is far, far easier than on either linux (modulo various distros) or the Mac. That's because there is one obvious distribution of Python to use, the official one, and new versions of it are always consistent and play well together.
Yes you can use Anaconda if you want, and people who do that are probably data scientists or something and know what they want to do and why. It's well documented and has its own robust ecosystem.
I say this as someone who's been on Macs at home since 2007 and works professionally on Linux, but I started with Python on Windows back in 2002.
Unfortunately Windows has plenty of problems too. First, the system PATH will trip you up if you have more than one Python installed, so the official installer does not add to it by default, hence the python command doesn't work after an install. Instead, you get the py launcher, but it's not provided if you installed Python from the app store or with Anaconda.
". . . and then he installed cygwin, and decided to manage and run python through the bash environment. . . " (fun fact: the git client's bash shell is actually cygwin. Also, MobaXTerm has cygwin bundled-in as well).
Sorry, but to be frank, I think you're a bit ignorant here. Let me explain, starting from the bottom:
> I would have one package manager that installs everything into a global package cache, […]
There is exactly one package manager. If you're on Debian or Ubuntu, it's dpkg. If you're on RedHat, it's rpm. If you're on Arch, it's pacman. Yes, some of the BSDs have two (base packages + ports tree), but they're the odd ones out.
pip, cargo, go*, etc. are not the same thing. I know they're called that but they don't perform the same function: none of them can create a working system installation. Let's call them module managers to have a distinct label.
> and then pick the correct version of libraries at run time (for Python: at import time).
That's easy for Python, and incredibly hard for a whole lot of other things. A module manager can do that. A package manager needs to work for a variety of code and ecosystems. Could it try to do it where possible? Maybe. But then the behavior is not uniform and made harder for users to understand. Could it still be worth it? Sure. But not obviously so.
I would also say that this is just giving up on trying to keep a reasonable ecosystem. It's not impossible to reduce the dependency hell that some things have devolved into. It just needs interest in doing so, and discipline while effecting it. I'd really prefer not giving up on this.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable.
This is to some degree why distros are breaking apart Python. Some bits are easy to install in parallel, some aren't. There can only be one "python". Worse, there can only be one "libpython3.9.so.1.0".
> Distros, please stop screwing over Python packaging. It is incredible that Debian/Ubuntu pick Python apart and put modules into different packages. They even create a fake venv command that tells you to please install python-venv.
They're trying to achieve the very goals you're describing. Trying to give you a working python without having to download and "install" some weird thing somewhere else. And at the same time trying to keep the module managers working when they're replacing some module but not all of them.
On a subjective level, it's obvious you have a strong distaste for this ("they even create a fake") — but could you please make objective arguments how and why this breaks things? If you're getting an incomplete Python installation, that seems like a packaging bug the distro needs to fix. Is that it? Or are there other issues?
> If I would get to redesign package management (both for Linux distros and for languages),
And, I'm sorry to say this, but your post does not convey to me the existence of any essential C codebase packaging knowledge on your end. I don't know about other ecosystems, but I have done packaging work on C codebases (with Python bindings no less), and you don't seem to be aware of very basic issues like header/library mismatches and runtime portability.
If you are interested in this topic, please familiarize yourself with the world of existing package managers, the problems they run into, and how they solve them. There's a lot to learn there, and they're quite distinct from each other on some fronts too. Some problems are still unsolved even.