Python libraries shipped by distributions are so old that this mechanism is mostly useless for python development.
This also applies to many other programming languages that have their own packaging systems.
While Python packaging is indeed messy, the needs of traditional Linux distributions are by far the least important. Python packaging needs to better serve the needs of Python developers, not those of sysadmins.
Having a "linux distro" interpose itself between you and your libraries is a fundamentally broken system not worth fixing. All decisions related to dependency choices fundamentally belongs with upstream.
Perhaps the distributions would be more inclined to include up-to-date versions if the standard in the Python community were not to break everything all the time.
There are distributions that keep up-to-date, though, e.g., archlinux.
Serving the needs of python developers vs. sysadmins is a false dichotomy. Python developers develop on a system that they need to admin. One great thing about linux is that everything on a system can be kept up-to-date using just one software tool (a package manager). You are not going to convince me in a million years that this is a bad idea.
Now, for things that are needed in development you sometimes need different versions, in particular if you happen to have an OS that ships very old versions. For that purpose there should be some sort of tool and not a gazillion tools that are all incompatible and behave in slightly (or not so slightly) different ways. 'python -m venv' is different from 'virtualenv'? WTF?
Also, if your distribution is up-to-date (e.g., archlinux) and you are pinning to older versions, there is a problem as well. As the article puts it, 'pin their dependencies to 10 versions and 6 vulnerabilities ago'. And if you actually want to maintain your software in the future, you have to move to the new version at some point anyway.
>Serving the needs of python developers vs. sysadmins is a false dichotomy. Python developers develop on a system that they need to admin.
I couldn't disagree more. Even if the same person is admining a system they develop on, they almost certainly aren't going to admin the systems their users deploy on. The admin role and developer role should be completely separate with different goals and requirements.
The system Python should only be used for system scripts. The end. Nowadays ideally it shouldn't even be visible to non-admin user accounts, and users certainly shouldn't be installing packages into the python being used by system scripts, not even using virtual envs.
The python your system uses and how it's configured should be decided by the distro. If they want to break it up into weird packages and funky paths, whatever. That's their problem because they and maybe (_maybe_) system admins should be the only people using it.
As a developer you should be making your own decision about which version of Python you are using, what modules get installed into it, what venvs you have and how you manage them. This should all be considered with one eye firmly fixed on the target deployment environment and how your Python will be configured there. The right answer, of course, being that you should be packaging the required python along with the application.
This is exactly the approach RHEL has taken with version 8. Typing ‘python’ results in ‘command not found’. All the system tools that use python are set to use a custom path dedicated for those tools.
You may not be able to hide it completely, but basically, if users can write Python scripts and run them using the system Python, that's a side effect. Its purpose is to run system scripts, and any other use must not interfere or compete with that in any way.
To be honest, that wasn't always the case. In the early days Python was provided on these systems for users just as much as administrators, and back then there weren't any system scripts shipped as part of the OS that used it. However, now there are, and the packaging and configuration of the system Python has in some cases even been mangled somewhat to suit the needs of the distro. It's time to make a clean split between system Python and user/developer Python.
Don't put the system Python in $PATH. Any command that needs the system Python can be launched through a shim/wrapper script that sets the PATH. (Or, even better, execute that Python directly by its full path so the system Python never has to appear in PATH at all.)
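A minimal sketch of that idea, assuming a hypothetical dedicated interpreter path (RHEL 8 does something similar with /usr/libexec/platform-python): the system tool's shebang names the interpreter directly, so nothing depends on a "python" in $PATH.

    #!/usr/libexec/system-python3
    # Hypothetical system tool: the shebang points straight at the distro's
    # dedicated interpreter, so this script runs the same way regardless of
    # which "python" (if any) is on the user's PATH.
    import sys

    def main() -> int:
        print(f"running under {sys.executable}")
        return 0

    if __name__ == "__main__":
        sys.exit(main())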
> Perhaps the distributions would be more inclined to include up-to-date version if the standard in the python community was not to break everything all the time.
Yep... Perl is a godsend... take code from 20 years ago, run it on a modern system, and everything works.
Python? Three different software versions need three different versions of the same library, and new library versions are not backwards compatible with their older versions... Even the Python 2.x -> 3.x transition was a pain in the ass, and even within minor versions you sometimes get breaking changes.
> Perl is a godsend... take a code from 20 years ago, run it on a modern system, and everything works.
That's more or less like "take a VB6 binary from 20 years ago, run it on some modern Windows, and everything works" - that's just because the ecosystem is effectively dead, so supporting it on new releases just means carrying over some stuff that worked 20 years ago.
But it works... it's universally supported on pretty much every os, even by default on most *nix based ones, and new versions and new CPAN modules are still written.
In 10 years, with Python 4, or maybe even 5, everything will still be broken, you'll still be claiming Perl is dead, and I'll still be using the same stuff I use now, which worked years ago, works now, and will work then.
And people will still be shipping code faster and more efficiently than they would do in Perl. Because in the end, the advantages of using Python, in terms of readability and productivity, easily offset a bit of packaging pain - whereas the disadvantages of using Perl don't offset whatever marginal gain you get by using old infrastructure.
> And people will still be shipping code faster and more efficiently than they would do in Perl.
I feel that quite a bit of software written these days is designed to have a short shelf life. I wonder how many of the "Show HN" posts will work, or even be useful, in five years? How much will be maintainable?
I've been thinking a lot about Matthew Crawford's writings on mechanical things[0] and how they might apply to my own craft as a software developer. The work, as described generally on HN, is still all about moving fast and breaking things. It doesn't matter if the Python code I crank out today works in five years (or is maintainable in five years) -- I just need to get my product to market. Some of us work in other domains where sustainability and repair-ability are important: if my software is going to be in the field, in users' hands, for ten years, I need to consider how reasonable it might be to fix bugs. If I have to fight just to get the software to run, I've already lost that battle. In these cases, the dependability of something like Perl is really great. The complaint against Python here isn't about the language itself (I think many people would agree it is quite nice to use), but rather the larger ecosystem, which makes it very hard to maintain software over the long haul.
[0] See Shop Class as Soulcraft, The World Beyond Your Head, and Why We Drive.
> I feel that quite a bit of software written these days is designed to have a short shelf life. I wonder how many of the "Show HN" posts will work, or even be useful, in five years? How much will be maintainable?
Yep, all the python2 code from 5 years ago doesn't work anymore... It's sad how getting python2 to work on newish distros is becoming a great pain in the ass.
HN has been around for at least 5 years, right? Probably some interesting stats on how many 'Show HN's are live, dead or decomposed. Live: updated in the last year. Dead: not updated in the last year, but still available. Decomposed: not findable/available.
But that code won't work in five years, because all the distros will remove python3 in favour of python4. (Also, the code from 5 years ago written for Python 2.x doesn't work now, because distros removed python2.)
Do you really want to rewrite all your stuff every 5 years?
As an end user, I care more about code quality and stability than I do speed of shipping code. I don't want to use some buggy code that was pushed through production too quickly just because the devs are lazy!
Yep... if a thing works, don't fix it.... Modern devs "fix it" until it's broken. Just look at google and their communication platforms... every few years a new one that kills off the previous one, with practically no added value.
There is a lot of software, from 7zip, total commander, putty, vlc, windirstat, etc., that does one (or a few) things; people have been using them for decades, and pretty much all the features have been there for that long, without an artificial need to "ship something new, fast" every few weeks.
Or instead use another modern language? I avoid any python that I can’t apt-get/brew and haven’t considered writing anything with it for a very long time. There is always go/ruby/c# which seem to generally avoid the problems discussed here
This is the sad reality of modern development... new project? Why take some stable tech, when you can take a 2month old framework and an alpha version of a library, and do two rewrites, before the project fails due to breaking changes and abandoned software. It seems as if people actively avoid anything stable.
use Lingua::Romana::Perligata;
adnota Illud Cribrum Eratothenis
maximum tum val inquementum tum biguttam tum stadium egresso scribe.
da meo maximo vestibulo perlegementum.
maximum comementum tum novumversum egresso scribe.
meis listis conscribementa II tum maximum da.
dum damentum nexto listis decapitamentum fac
sic
lista sic hoc tum nextum recidementum cis vannementa listis da.
dictum sic deinde cis tum biguttam tum stadium tum cum nextum
comementum tum novumversum scribe egresso.
cis
That's a really weak gotcha: Perl 4 was deprecated 28 years ago and Perl 6 is a completely different language that's been called Raku since 2019. There are no dependencies to resolve between them. It's like worrying about dependency resolution between Javascript and Java. They're not even the same platform.
True. But Raku (https://raku.org #rakulang) does have an Inline::Perl5 module, which allows you to use 99.9% of modules on CPAN (basically, only the ones using source-filters, and ones that are really, really deep into the Perl internals). So there *can* be dependencies between Perl and Raku.
I'm sorry, doesn't pip already work there? Otherwise, there's pyinstaller, which is great, but it requires an entrypoint, so it won't do for a pure lib that doesn't expose any script at all.
Instead of assigning blame we should interpret the tension as a sign of an important need having no good solution. It's quite challenging to overlay packaging/versioning systems... but theoretically it's an interesting and general question. Maybe people should model and solve the whole issue (if there's a solution at all, which I hope).
The fundamental problem is the difference between Curated and Uncurated packaging systems:
- Linux distros have curated packaging, often outdated but stable.
- PyPI, npm, and others are uncurated: anyone can publish, and packages could be unstable or insecure.
It's down to the developer to decide which route they want to take, and at the moment most want to move quickly with the latest tools. To do that you have to go the uncurated route. It only becomes an issue for a developer if their software is published by a Linux package management system, but 99.9% of developers will never have that.
> The fundamental problem is the difference between Curated and Uncurated packaging systems
Exactly this! It is expected from linux distros that they have curated packaging. I think that is good and I really expect it to stay that way.
Whether you want curated or uncurated packages depends on the use case.
As the user of a program, I definitely want curated packages.
As the developer of a program, I want to specify myself which version of a dependency I want to develop against and I don't want to be hindered by the linux distro in doing that. I do think that developers in that context are not always supported that well in linux distros. And on the other hand, I do think that tools supported by the programming language can assist in that scenario. (for example installing multiple versions of the compiler and runtime in the users home directory and being able to easily switch between those versions on a project basis...)
> Python packaging needs to better serve the needs of python developers, not those of sysadmins.
Please substitute s/sysadmins/users/, and realise your developers are users too.
I've been doing Python for almost 15 years; and I'm getting really fed up with some things. Packaging is a mess. Distribution is a mess (for servers/IoT - Docker saves the day; for desktop - I feel like giving up). Managing the installations is a mess; upgrades can be impossible - I'm hard-stuck on Python3.6 on one project!
I find myself rewriting many smaller tools in Go or Rust, just because I can upgrade the toolchain at any time, and/or ship a static binary. But Rust has a very high barrier to entry, and Go tends to be simplistic.
I'd fully jump ship today, but Python has just too much momentum behind it.
I agree to some extent, but I believe they were mostly trying to convey that much of the work is done by volunteers who are largely only motivated/have the resources to test a small subset of potential deployment scenarios. Those contributions are still valuable and contributors themselves don't owe you anything. The best way to fix this issue would be to get involved (or switch languages as you suggested, that's fine too).
Language specific package managers are the antithesis of package management. There is no "management" in "install multiple versions of stuff in this 10 levels deep dependency tree". Upstreams can pick their dependencies, but they cannot control what packages depend on them and make it into your app, and neither can you. That's a job for your distro.
There are different cadences and roles in play here.
Cadences:
Distribution - e.g. Ubuntu 21.10, 22.04; RHEL 8.4, 8.5, ..., 9
Library - e.g. simplejson 3.17.4, 3.17.5, ..., 3.18.0
Language - e.g. Python 3.8.0, 3.9.0, 3.10.0 (This makes Python particularly annoying because for a C project it shouldn't matter if you used gcc 8 or gcc 9 when producing binaries, but with Python it very much matters which version of Python you run with)
Application - e.g. Gimp 2.8.22, ..., 2.9.0
Roles:
As a sysadmin you want to use the distro package manager to install tools for administering your system so that you can connect to wifi, monitor resources, etc.
As an application developer you want to talk to the language specific package manager so that you can use the latest versions of libraries so that bugs are fixed, and you're not shackled to people using RHEL 6 or 7.
As a Debian/Fedora packager you want to talk to the distro because that is required if you want to submit an application upstream to support users who want everything in the distro package manager.
As an application packager you may want to talk to the distro because of the above, but you could also target flatpak (or snap) so that you can use all the latest libraries without worrying about packaging for slow moving distributions.
As an end user you want to use the distro package manager because that's your embedded mental model and workflow. But you should definitely consider using flatpak (or snap if that floats your boat) so that you can use the release stream wherein the libraries are unpinned from the underlying distribution's cadence.
As an end user you should never need to deal with the language specific package manager to install applications or libraries. If you need to install using the language package manager (npm, gems, rocks, cargo, pip) then the application has not really been 'distributed' or 'packaged' imo. If you're doing this, you're off-piste trying out new stuff.
There are more details and a great conversation that can flow from this, but language specific package managers are not the antithesis of package management; they supplement it -- just not for the 'end user' role. Well, yes for the 'end user' role, just not directly.
Wouldn't it be better if the Python programmers did a bit of planning? Or are things so boring that every Python minor version must be incompatible with other Python minor versions?
Features get added. So you can't use a match statement in 3.9 but you can in 3.10. If you write a library you need to be conservative in which features you use, or applications can't use your library.
Why not just upgrade applications ASAP? Sometimes things are removed over three or so versions. So you can't move from e.g. 3.9 to 3.10 without making sure you're importing types from the correct location:
> Aliases to Abstract Base Classes in the collections module, like collections.Mapping alias to collections.abc.Mapping, are kept for one last release for backward compatibility. They will be removed from Python 3.10.
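As a hedged illustration of the kind of change that forces, library code usually switches to the collections.abc location and, only if it still has to support very old interpreters, falls back:

    # Forward-compatible import: the plain collections.Mapping alias is gone
    # in Python 3.10+, so prefer the collections.abc location and only fall
    # back on interpreters old enough to lack it.
    try:
        from collections.abc import Mapping
    except ImportError:
        from collections import Mapping

    def is_mapping(obj):
        """Tiny usage example: accept any dict-like object."""
        return isinstance(obj, Mapping)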
These things are planned. It's just much much harder to handle this because you need the interpreter at runtime while in compiled languages you just use `-std=C99` and you can compile the code and then link.
What happens if your project needs an update, but it will hose the OS? Do you expect someone working in Python on Windows who wants to distribute a datetime package to package it for RPM and APT?
I used to think OS package managers should be the end-all be-all, but the use cases for OS package managers are very different from language runtimes. While different Linux distros were fighting between themselves, they completely ignored use cases for projects and language runtimes. Sadly, the result is a mess for everyone.
I think people need to figure out what approach works best for them. I'm of the opinion that any core technology to your business needs to be decoupled from the OS. It makes OS updates too messy and you're tethered to whatever the OS supports. My field uses a lot of Python and every company figured out quickly they need to run their own binaries in addition to packages.
At one company, they packaged up custom RPMs. It wouldn't be a problem to package up and distribute Python libraries. Others had their own package system (no OS or runtime fit their needs). It seems like most people use something like virtualenv.
Regrettably, this means there's no easy answer for people new to Python and the right choice will probably change as you grow. But I really think the answer is Python should come up with something that works for Python and let OSes do their thing.
This was 10+ years ago, but I remember something like the installed Python package had a caching bug that was affecting production. The update wasn't compatible with some OS scripts so the machine would no longer boot using the updated package. Problems like that came up fairly often, but I remember that was the most egregious.
Another scenario is that the OS shipped with Python 2.5 (supported until May 2011), but we had third-party tools that required Python 2.7 (shipped July 2010). Switching OSes (where things like monitor or hardware drivers weren't yet supported) was a ridiculous pain to test and certify. Decoupling OS and Python+package versioning was a huge relief for everyone, but won't make sense for everyone.
> Absolutely not. I expect a serious user of that library on an apt based system to package it and submit the package to their distro.
I try my best to personally do this and push for a work culture that does this, but even if this was done I can't fathom waiting on an OS update for existing code to percolate down. The risk tolerance, scope of concern, and agility between an OS and whatever project pays the bills are very different.
> All decisions related to dependency choices fundamentally belongs with upstream.
No. As a user I want dependency management (and all of software distribution, to be honest) to be handled by the party that's best able to keep things working while at the same time keeping them secure. Linux distributions have a much, much better track record at that than most upstreams.
I really doubt that the Python libraries packaged by Debian are any more secure or stable than the latest release of those libraries. At best they just limit breaking updates to once every few years, when they update them.
It’s essentially like version locking packages except some random Debian maintainer decides when it’s time to update.
> I really doubt that the python libraries packaged by Debian are any more secure or stable than the latest release of those libraries.
They are more stable because I can keep using the same version for two years, and I'm not being pushed to the latest version that has (intentional or unintentional) breaking changes every two months. Yes, there might be a bug or two in there that have since been fixed, but I very much prefer the failure I know over unexpected failures.
They are secure because Debian (and distros like it) backport security fixes to their packages. You can argue about whether they do a good enough job keeping up with vulnerabilities, but at least I know that once I install the update from Debian, my machine is secure, and I don't have to wait for the upstream authors of all software on my machine to release updates that upgrade their dependency.
> It’s essentially like version locking packages except some random Debian maintainer decides when it’s time to update.
Yes, but version locking isn't my problem. The crucial difference is that distros pick a version and support those for years, while upstreams usually force you to use the latest version all the time to get security support. With distros _I_ get to decide when I upgrade, and the reduced frequency is a nice bonus. Having a single entity for all software on the system is also valuable, as there's just one tool to learn and one place to check for updates.
That is not "provably more secure or stable". I think it's pretty safe to assume that maintainers backporting security fixes is more secure than just not updating at all, but even that isn't proven. It being more secure than updating is much more questionable, and is probably going to vary greatly between packages.
Not only Debian, but Red Hat, Suse, Canonical and others provide both free and paid security updates and many large companies are happy to pay quite a pretty penny for that.
Yes, exactly this. Only I view it from the other way round: to try to do development with distro-supplied language packages is a category error.
Distro-supplied interpreters and their associated libraries are there for the applications supplied and supported by the distribution. Unless you are developing something to be part of the distro, they are not for you.
Do not let the distro get between you and your libraries. Supply your own.
>Unless you are developing something to be part of the distro, they are not for you.
. . . forgetting the original purpose and mindset behind Linux in the first place. From a 1990s point of view, it was always intended to be a hobbyist OS, and in most cases one had to compile all the binaries and the kernel oneself.
There was no such thing, really, as a "user" or "admin" - everyone was considered, and expected to be a developer.
Yes, and distributions arose to solve the problem that plagued early Linux: how to have a set of applications and libraries with consistent and mutually compatible versions of everything. If you're compiling everything yourself, building it so it all works once is an achievement. Keeping on top of as many moving targets as there are binaries in the system so that everything keeps working over time is not practical once the number of moving parts gets high enough.
Distros solve a real problem, but the trade-off is that some parts of the system must exist to serve itself.
Exactly agreed on that.
I don't understand what value distributions are providing by repackaging Python libs; they're always way too old to be usable, and they're global, while I work on many projects with their own incompatible requirements.
Maybe I am dumb, but I exclusively use virtualenv and pip..
> I don't understand what value distribution is providing by repackaging python libs...
I want to easily and safely use some app my distribution ships. I want to receive security updates automatically for all such apps. I don't care what language it's written in or what its dependencies are.
These app packages provided by the distribution have dependencies that are also packaged by the distribution so that dependency resolution works.
Since the point of a distribution is that it can run apps, the value is that a distribution works at all.
I agree with this for pure Python libraries. However, as soon as you get into things that bind to C libraries (Numpy, GDAL, etc.) it quickly becomes much easier to use the package from your distro.
Everyone complained, no one addressed the fact that for years the whole of python packaging was handled by like 2.5 people.
But as a rant this, like most of the ‘but just fix it’ rants, fails to acknowledge the hugely diverging needs of different users. I could not live without conda, since it’s the only sane way to get a working recent geospatial stack. Others need to run embedded environments, or portable ones, some need long term stability while others need bleeding edge packages that haven’t been released yet. Solving for a single case is straightforward; solving for all of them, not so much.
> Everyone complained, no one addressed the fact that for years the whole of python packaging was handled by like 2.5 people.
If I can make an observation - it has nothing to do with the number of maintainers. The problem is deep, cultural and occupies a difficult space where it might be a bug or a feature.
The root cause here is that the Python project, and surrounding community, have little real respect for backwards compatibility. The complexity of Python setups is driven by the need to run multiple - potentially even mutually incompatible in the case of 2.x v. 3.x - versions of Python.
All languages have packaging problems, but Python is unique in my experience in the sheer number of Python installs that I need to manage simultaneously. I still have C code that works from around the time that I learned C. I'd need another Python environment installed to say the same thing about Python.
Python 2 was supported for 20 years. Python 2.7 (largely compatible with other Python 2 versions, and including backported changes from Python 3.1) was supported for 12 years.
The Python project went absolutely above and beyond to support users who wanted to drag out making not particularly complex changes to their codebase for over a decade.
During this same time they improved Python 3 in response to feedback and, among other effects, made it less different from, and easier to port from, Python 2.
Many common packages supported 2.7 up to its end-of-life as well.
One of the reasons that Python is often a source of compatibility errors is that both distros and large standalone applications embraced Python in the early 2000s, became dependent on a particular version, and then refused to work with newer versions.
Python is not responsible for all the engineering decisions everyone writing in the language has ever made.
Yes. That is what a culture of not supporting backwards compatibility looks like. If they valued backwards compatibility they wouldn't have to support multiple versions of Python over long stretches of time. They would be supporting 1 version that was backwards compatible with code written over 20 years ago.
It really does have to do with the number of maintainers. A huge part of the work is not just building a package manager, but coordinating many stakeholders to all use it. That would also mean building a package manager that covers all of those use cases from day one - what's the point of trying to make people use something that doesn't meet their needs. These things are a huge amount of work. Unsurprisingly, other widely used languages have the same situation - dozens of build and packaging systems for C/C++, Java, Javascript...
And yet, other languages ship package managers [1] that are widely loved within their ecosystems and cover everything from microcontrollers to server applications.
I think what it makes it more difficult in the case of Python is that it has decades of legacy to deal with. No consistent semantic versioning, packages that expect that they can modify their package path in-place (this is a nightmare for immutable systems, like NixOS or OSTree-based system like Silverblue), a wide variety of build systems that sometimes hook into make, etc. Solving this is a hard problem.
This is why an authority like PSF has to step in and say: this is how it is going to be done from here onwards.
Rust has an “advantage” here in that it’s not generally shipped with your distro’s package manager. I think my biggest problem with this article is that the distros put Python there in the first place, and all of them apply patches to make Python work how they want it. And when it doesn’t work, it’s Python’s fault…? I mean yeah, Python can definitely do something to make distributing Python easier, but it can only do so much without distros’ direct involvement.
I’ll add that most of Linux distro packaging contributors are generally very nice people, understand the problem at hand, and are very open to collaboration. But sometimes you see this kind of “it’s all your fault” complaints and it’s doing exactly the opposite of helping the cause.
Cargo is a good example of a survivorship bias of sorts. It's so deeply integrated into rust that if you can't stomach it you just leave the ecosystem and everyone that remains likes it ;)
Is it fair to compare an interpreted language and its package manager to Rust and Cargo? Python packages ship their source (in most cases), depend on a locally installed interpreter (with semantics possibly changing by version).
Yes, Python packages often make poor assumptions about what setup.py can do (i.e., _anything_), and so you end up choosing between "tested, supported by the author, and old" or "untested, unsupported, but up to date".
Rust almost does the exact same thing, so I'd say it's fair. Dependencies (crates) are grabbed in source format and compiled locally as part of the build process, and installing Rust programs through Cargo also compiles them (and their dependencies) locally.
Some crates have the same issue where build scripts rely on outside tooling being installed, but it's definitely not common (unless you're relying on compiling C/C++ code for FFI, for example, in which case it's somewhat frequent).
I think there needs to be a distinction between fetching crates as source at _build time_ and what happens with Python. Python's "build time" will still require the source to be present on the target machine it's deployed to -- unfortunately these tools are complex because "packaging" is only half of the issue, it's also the distribution and _deployment_ which makes things messy.
Consider that, even if you want to use packages only from your application's virtualenv, the default (footgun warning!) is that Python will still use the "system" packages -- this means you may have installed Ansible or some other tool that relies on Python and many packages from the distro package manager. But your app could pick up one of those dependencies! At best, this will work fine. But in the worse cases, perhaps it subtly behaves differently or simply does not work at all.
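A small sketch of how to check for that on a given machine (output and paths will obviously differ):

    import sys, sysconfig

    # In a venv, sys.prefix points at the environment, while sys.base_prefix
    # still points at the interpreter the venv was created from.
    print("in a virtualenv:", sys.prefix != sys.base_prefix)

    # sys.path lists every directory imports can be satisfied from; if
    # distro-owned site-packages directories appear here, distro-installed
    # packages can leak into the application.
    for entry in sys.path:
        print("import search path:", entry)

    # Where pip would install pure-Python packages right now.
    print("purelib:", sysconfig.get_paths()["purelib"])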
My understanding is that Rust will, by default, statically link all of these dependencies. This, in Python, would be like a "pex" or "par" (or one of the many other options :^)), which does make the distribution aspect much simpler. (At the cost of build-time complexity, slowdowns, and occasional incompatibility.)
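For the single-file flavor of that idea, the standard library's zipapp module gives a rough sketch (pure-Python dependencies only; pex/shiv go further). Directory and entry-point names here are made up:

    import zipapp

    # Bundle a directory tree (your code plus pure-Python deps copied into it)
    # into a single executable .pyz archive.
    zipapp.create_archive(
        "build/myapp",                     # hypothetical source directory
        target="dist/myapp.pyz",
        interpreter="/usr/bin/env python3",
        main="myapp.cli:main",             # hypothetical entry point
    )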
Python is just a more fractured community, for reasons that have never really been that clear to me.
But it doesn't harm the point I was getting at, which is that being a dynamic language doesn't give Python an excuse compared to Rust in itself when there are highly successful adjacent examples. Obviously there are other factors that come into play, or it would be ancient history by now.
Worth noting that the PSF has no authority to tell people how to do their packaging.
Rust's Cargo is 20 years newer than Python and benefits from those decades of experience. It's a very different proposition to start a new system from fresh than to try to migrate a huge and diverse community towards it.
This is actually where a BDFL should step in. "We're doing X for Python 5, we're using it to manage the stdlib as well as community packages, everyone get ready because I have decided."
They added the ensurepip module so users could add it after the fact.
Distros ripped it out too.
Arguably the advantage newer languages like Rust and Go have in this regard is they don't even consider the distro use case - you're going to get static linking and you'd better like it. Whereas Python is from an older era and tries to fit in with the local customs and so gets hammered for its inconsistency
Kinda different: Rust was like 3 years old when it started getting popular. Python was 13+ years old when it started getting very popular.
You cannot compare a legacy solution to a modern one -.-
Perl is 33 years old, and perl 5.x is 27 years old.
I can literally take a 20+yo book and all the examples still work. CPAN still works. I literally have 20+yo scripts still copied from server to server, from laptop to laptop, without any changes.
Maven was launched 9 years after Java got popular. Everybody was using Ant, everybody decided Ant sucked and moved to the new solution. While Gradle is the new kid on the block, it keeps the infrastructure of dependency management and is thus compatible with Maven to that extent. It can be done.
No, but ivy was commonly used. That said, the bigger point is that the standard way to do things can be changed. For some reason, despite (or because of?) PEP, the python community seems unable to coalesce around standard ways of packaging.
Perhaps I have lived too long out in the provinces, but Maven was my first experience of dependency management in Java. After Ant it was a no-brainer, because Ant didn't do dependencies. I didn't come across references to Ivy for some years, and personally I have never seen it used in the wild.
That said, I didn't actually like early Maven that much, it was pretty inflexible and often required the AntRun plugin to do something novel. However, it is still top dog everywhere I work and very much the standard way to do things.
Which is exactly what my second paragraph said :). This is a common struggle for many older programming language ecosystems; e.g., I think the same is true to some extent for C and C++.
As one of the sibling commenters mentions, there are good examples showing that it is possible to standardize packaging better. Maven replaced IDE-driven builds and Ant in the Java ecosystem and added proper package management. Additionally, it required that projects start conforming to standardized layouts, by taking convention over configuration and being largely declarative. I think the Maven success story lost some of its shine with Gradle, but that's another story.
I'm not saying it solves every problem for everyone, but have you looked at Nix? I've found it to be great for portability (at least on other Linux distros), and you can easily install anything from stable nixpkgs packages to Git repos. I'm currently using it to manage OS packages and development dependencies for some of the projects I work with (a couple at Work™).
This is my decision flow and I rarely have an issue:
Are you the end user of the Python code?
Yes -> Is it available in your distro?
Yes->Use package manager
No->Use pip install in user mode
No -> Create virtualenv with the Python version you want (including pypy!) and do your pip thing there
Some extreme use cases may benefit from anaconda, but personally I've never needed to use it. My only pain point is dealing with legacy code that relies on PYTHONPATH. Nothing good ever starts setting PYTHONPATH.
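A minimal sketch of that last branch using only the standard library ('requests' is just an example dependency):

    import subprocess, venv
    from pathlib import Path

    env_dir = Path(".venv")

    # Create the environment (run this with the interpreter you want in it,
    # e.g. python3.10 or pypy3) and make sure pip is available inside it.
    venv.EnvBuilder(with_pip=True).create(env_dir)

    # Install into the environment by calling *its* pip, so nothing touches
    # the interpreter running this script or the system site-packages.
    pip = env_dir / "bin" / "pip"          # Scripts\pip.exe on Windows
    subprocess.run([str(pip), "install", "requests"], check=True)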
Pipx is good for the case of "I want to run a standalone Python application that is available through Pip, but not my system's package repo." This is a more common case than you might think.
It's a sensible alternative to `pip install --user`, and having self-contained deps for a tool is a bit like `npm install --global` or even `volta install`.
It doesn't address the greater issue tho: that it's getting harder and harder for distributions to package things right and provide packages for their users (evidenced by the fact that you need a second package manager just for Python stuff).
How is Pip any different than Cpanm, NPM, Gem, Luarocks, Nimble, Go's thing, Cargo, whatever JVM people use, whatever Haskell people use, etc. in that regard?
Distros have a hard job, but at the same time programming language tooling devs have more "customers" than just distro maintainers.
This is a great recipe for disaster. Whatever you install in user mode will shadow anything installed system-wide, so when you try to run some system-wide project, it may now fail. I'm also not a fan of how it drops scripts into `~/.local/bin`, since that's where I keep my own scripts, and it is version controlled.
The installation will also be frozen and never get updated -- unless you remember to do it manually.
Finally, and worst of all, this leaves you in a dead end if your packages have conflicting dependencies, which is too often the case in Python-land.
I used to just use pip to install to the system. Months/years later I would try to untangle the mess of packages I was just playing with, what the OS wanted/needed, I got those conflicting dependencies you mention, etc. I usually ended up reinstalling the OS. At the time I may not have been as knowledgeable about where the OS package manager keeps packages vs pip--but the whole thing wasn't very user-friendly either.
For years I've been installing into user knowing I can just blow it away. I've dabbled with virtualenv, but it's such a pain to set up and activate. If I have a few projects with similar libraries it's more of a pain to set them all up and switch around. If I end up using a script for something important, I just spend the extra time at that point to "package" it.
This is one of the reasons people use Anaconda/miniconda for non-data science work: conda environments are self-contained Python installs, so if you conda/pip install packages into those environments, they will not break each other. This design requirement arose from the specific needs of numerical computing (which always drags in a ton of system-level C/C++/FORTRAN dependencies), but is a generically useful design construct.
Anaconda is a distro, and conda is a package manager, that works across OS platforms and hardware architectures, and installs cleanly into userland without requiring admin privileges. The only way we achieve this difficult goal is by creating a distro and build system that creates "portable" packages that can be relocated/relinked at install-time.
Ultimately, Python's challenges in this department come from the fact that it has such great integration with low-level C/C++ libraries. This gives it super powers as duct tape/glue language, but it also drags it down into the packaging tech debt of C/C++. Hmm... maybe I should write that blog post: "Python Packaging Isn't The Problem; C/C++ Is." :-)
I was slow to get to grips with venv. It sounds like you are on the same path. This note tries to be constructive advice -
* Some distro software uses python. Let the package manager take care of dependencies for that.
* For everything else, use a dedicated virtualenv for each codebase you are working with.
> I used to just use pip to install to the system
Never do this, for the reasons you cite.
> I've dabbled with virtualenv, but it's such a pain to set up and activate
Setup for virtualenv: "python3 -B -m venv venv". Have a shell alias 'alias v=". venv/bin/activate"' that allows you to activate it if you need to install libraries or access a shell. "pip install blah" for library install. That should be all you need.
> If I have a few projects with similar libraries
> it's more of a pain to set them all up and switch around
Have a think about why you feel this way, and whether you could mitigate the problems.
Here is what I do. Once my libraries are installed for the current project, I rarely activate venv in the current shell. Rather, for each python project, I have a bash script "app" in the root of the project, and a dedicated "venv" directory.
The app script does the following: (1) sources the local venv; (2) does pip freeze > requirements.txt to capture any dependency changes; (3) launches the project. Often I will have multiple launchers in that script, with all of them commented out except for the active one. Be in a habit of always launching from that app script.
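Roughly, as a sketch (the real thing is a bash script; 'myproject' and the venv layout are placeholders):

    #!/usr/bin/env python3
    # app.py -- hypothetical launcher: always run the project from its venv.
    import os, subprocess, sys
    from pathlib import Path

    ROOT = Path(__file__).resolve().parent
    VENV = ROOT / "venv"

    # (1) "source the local venv": if we're not already running inside it,
    # re-exec this script under the venv's interpreter.
    if Path(sys.prefix).resolve() != VENV.resolve():
        py = VENV / "bin" / "python"
        os.execv(str(py), [str(py), *sys.argv])

    # (2) capture any dependency changes.
    with open(ROOT / "requirements.txt", "w") as fh:
        subprocess.run([sys.executable, "-m", "pip", "freeze"], stdout=fh, check=True)

    # (3) launch the project; keep alternative launchers commented out.
    subprocess.run([sys.executable, "-m", "myproject"], check=True)  # hypothetical
    # subprocess.run([sys.executable, "manage.py", "runserver"], check=True)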
To reiterate the approach above, whenever you sit down to write some python code, ensure that you have a dedicated venv for it, and that you are only ever launching code from that local venv.
I have spoken to developers who get upset at the extra hard disk overhead. You don't need to optimise for hard disk usage. Hard disk space is almost free.
I don't bother creating setup.py files, except for the odd occasion that I want to publish code to pip. Good luck.
That sounds like the general approach I take for "projects", even toy projects. My day jobs have never fit the virtualenv use-case, so at home I often have to look up how to use it. It's so rare that when I make an alias I even forget those.
Most new things are one-off scripts: move or rename some files, extract data from something, or pull from a resource. Something that requires libraries or is too big for a shell script. For example, the last one I see in my bin is a web scraper for appointments. It pulls a website, fills out a form, and gets the result a few times -- about 70 lines. What's annoying is sourcing some environment just to run this one tool.
Most people have a directory of scripts (a mix of shell, Perl, or Python) they use if they spend a lot of time at a commandline. It's quite a pain to source the environment just to run a quick script. That's generally the libraries I install into user. I don't care much about the version and troubleshoot things as they come up.
It's completely global, shared by all Python interpreters of all versions.
I set PYTHONPATH, but the code in that directory is solely small debugging utils of mine that I want available in every Python interpreter, and I make sure not to put anything more complex in there.
That'll prevent it leaking to most things, but not to subprocesses of your application.
For example your application might interact with command-line tools written in Python, and unless you delete PYTHONPATH from the environment variables prior to launching any subprocesses, they'll inherit it. This could lead to subtle and confusing breakage.
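A hedged sketch of that precaution (the tool name is made up):

    import os, subprocess

    # Copy the environment and drop PYTHONPATH so child Python processes
    # (command-line tools, hooks, etc.) don't inherit our import-path hacks.
    env = os.environ.copy()
    env.pop("PYTHONPATH", None)

    subprocess.run(["some-python-cli", "--help"], env=env, check=True)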
The only justified situation I can find is when you are working on two (or more) independent components at the same time.
My pain point in particular with PYTHONPATH (or playing with sys.path) is that people tend to use it with the only purpose of making import lines shorter, which brings naming collisions of all sorts when you aren't creative enough.
And agreed, there are two separate use cases: development and using the software.
But, are distros creating too much work for themselves by trying to package every itty-bitty python library (and for that matter, every npm library)? Are distros doing anything more than scanning CVE databases with the library versions, or are they _actually_ auditing the versions they choose? (Not that there's much choice, since python also has a shitty story when it comes to backwards compatibility; if you're going with 3.10, there's possibly only one version of a given library that will work.)
Java has a commonly used "fat jar" approach which rolls up all dependencies into a single file. It's excellent. In the python world, this doesn't exist, because virtualenvs aren't portable. If that can be fixed (perhaps a specific section in requirements.txt that captures anything that needs to compile C for the platform) then a distributable virtualenv would become possible. Distros would then scan the application for vulnerabilities (via requirement.txt's manifest), build the distributable-virtualenv, and ship _that_. Python library maintainers don't have to do anything different (except, of course, use the standard way to declare dependencies).
> Are distros doing anything more than scanning CVE databases with the library versions, or are they _actually_ auditing the versions they choose?
Debian Developer here. Part of packaging work, for Python libraries or anything else, is to verify the reliability of the upstream developers, audit the code, set hardening flags, add sandboxing and so on.
I have spotted and reported vulnerabilities myself, and it's not uncommon.
Please don't take it the wrong way, I have a lot of respect for distros packagers and maintainers. I donate to debian, and I report bugs. I think you are basically heroes of the FOSS world, because without your invisible (and frankly thankless) work, mine wouldn't exist.
But come on, there are 300k entries on PyPI, 200k more for Perl and 160k for Ruby. I'm not even counting the whopping 1.3M on npm because I assume this is considered taboo at this point.
You cannot package even 0.0001% of that, not to mention updates.
And unless distros make it as easy to package and distribute deb/rpm/etc. as it is to use the language's native package manager, distro packaging will never be attractive to most users because:
- they don't have access to most packages
- the provided packages are obsolete
- package authors have no way to easily provide an update to the users
- it's very hard to isolate projects with various versions of the same libs, or incompatible libs
And that's not even mentioning that:
- package authors may not have the right to use apt/dnf on their machine.
- libs may be just a bunch of files in a git repo, which pip/gem/npm support installing from
- this is not compatible with anaconda, which has a huge corporate presence
- this is not compatible with heroku/databricks/pythonanywhere, etc.
- this is hard to make it work with jupyter kernels
Now let's say all those issues are solved. A miracle happens. Sorry, 47 miracles happen.
That would force the users to create a special case for each distro, then for Mac, then for Windows. I have a limited amount of time and energy; I'm not going to learn and code for 10 different packaging systems.
It's not that we want to screw over Linux distros. It's that it's not practical, nor economically viable, to use the provided infra. The world is not suddenly going to slow down, vulnerabilities will not stop creeping up, managers will not stop asking you to use $last_weird_things. This ship has sailed. We won't stick with stuff published only 5 years ago, with delays of months for every update.
Thanks for your work. And I have to say that for internal development, when there is no need for the latest features, it's much easier to develop based on a Debian release as much as possible. A stable distribution provides an easy to track baseline not only for Python libraries, but any other tools that may be needed.
I'm always confused by these sorts of posts because they happen often so there is clearly a problem but for some reason I've never had much of an issue. I've been using and developing with Python for about 15 years. In that time I've worked on Python projects large (OpenStack) and small (gabbi) and taken over maintenance for some old standbys (wsgi-intercept, paste to name two). Dealt with the 2->3 transition. Released a whole bunch of things to PyPI and relied on far more things that I've pip installed from there. It's been fine.
So there's a few types of projects you can write in Python:
1. Server applications that run in a dedicated environment.
2. Tools you write and run just on your machine (or some virtualenv, whatever).
3. Redistributable cli or desktop applications which end users will install and use.
For the first two types, you should never have any issues with Python and its dependency situation. You pin everything, and that's it.
For the third kind tho, it's a complete pain. Different distros ship different Python versions, so you need to support all of them. You also have to consider that dependencies can't be an EXACT version; you have to support a range of them, and a variety of combinations.
And then, one dependency has a version that works on Python 3.6, and another for 3.9. But they had an API change, so which one do you use? It'll break for half your users either way. Or maybe just put some `if version <= 3.6` all over the place, like we did during the py2->py3 transition?
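A hedged sketch of what those guards end up looking like (module and function names are made up):

    import sys

    # Version-gated import: pick whichever API the interpreter/dependency
    # combination provides -- and accept that both branches must keep working.
    if sys.version_info >= (3, 7):
        from somedep import new_api as fetch   # hypothetical post-change name
    else:
        from somedep import old_api as fetch   # hypothetical pre-change name

    def load(url):
        return fetch(url)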
#3 is the exact case where you also want to just pin everything. For an end-user desktop application, you ship a properly tested bundle, instead of trying to support all the different versions; as the end-user (unlike a developer using a lib on their own machine) should not ever have to interpret compatibility issues and should get a package that's been tested to work as a whole.
If a distro ships python 3.6 and the app wants to use 3.7, then the end result must include python 3.7 as well, either by distro being capable of having both versions at the same time or the app needs to ignore the distro-python and ship its own version in the package.
Distros, please stop screwing over Python packaging. It is incredible that Debian/Ubuntu pick Python apart and put modules into different packages. They even create a fake venv command that tells you to please install python-venv.
What they should just do is offer a bunch of packages like python3.7, python3.8 that install the official python package wholesale into /usr/python or someplace and then symlink one of them to `python`.
If I would get to redesign package management (both for Linux distros and for languages), I would have one package manager that installs everything into a global package cache, and then pick the correct version of libraries at run time (for Python: at import time). Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable. Instead, make it easy to get bleeding edge versions into the repositories. They can be installed side by side and only picked up by the things that actually use them.
The problem arises when non-Python packages depend on Python modules.
> If I would get to redesign package management (both for Linux distros and for languages), I would have one package manager that installs everything into a global package cache, and then pick the correct version of libraries at run time (for Python: at import time). Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable. Instead, make it easy to get bleeding edge versions into the repositories. They can be installed side by side and only picked up by the things that actually use them.
You may want to check out Guix and Nix - their approach is pretty close to what you're describing.
A common solution to this, if you still want to run traditional distros, is to just run "bare infra" (whatever that means) on the host OS and everything else in containers or Nix.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time.
I think this requirement made sense when disk space was scarce.
I think this requirement makes sense if you trust that your distro is always better at choosing the 'best' version of a dependency that some software should use than the software author.
Nowadays, I think neither is generally true. Disk is plentiful, distro packages are almost always far more out of date than the software's original source, and allowing authors to ship software with specific pinned dependency versions reduces bugs caused by dependency changes and makes providing support for software (and especially reproducing end-user's issues) significantly easier.
Isolating dependencies by application, with linking to avoid outright duplication of identical versions (a la pnpm's approach for JS: https://pnpm.io/) is the way to go I think. Honestly, it feels like the way it's already gone, and it's just that the distros are fighting it tooth & nail whilst that approach takes over regardless.
Ah JS, how many days has it been since the last weekly "compromised npm package infecting everything" problem? If you are upholding that as the gold standard, you have to be the world's laziest black hat.
> Disk is plentiful,
I recently had to install a Chrome snap because it is the new IE6 and everyone is all over Chrome-exclusive APIs as if they were the new ActiveX. Over a gigabyte of dependencies for one application, and given the trend of browser-based desktop applications? I would like to have space left for my data after installing the programs I need for work.
Distros assume responsibility for fixing major bugs and security vulnerabilities in the packages they ship. Old versions often contain bugs and vulnerabilities that new versions don't. Distros have two choices here: either ship the new version and remove the old version, or backport the fix to the old version.
Continuing to ship the old version without the fix is not an option -- even if you also ship the new version -- because some programs will inevitably use the old version and then the distro will be on the hook for any resulting hacks. Backporting every fix to every version that ever shipped is also not a realistic project.
Here in the startup world we often forget that there's a whole other market where many people would gladly accept 3-year-old versions in exchange for a guarantee of security fixes for 5-10 years. Someone needs to cater to this market, and the (non-rolling) distros perform that thankless task because individual developers won't.
> Distros assume responsibility for fixing major bugs and security vulnerabilities in the packages they ship.
I think they should just ship Python programs, not libraries. They could check whether the libraries a given Python program uses are safe in the versions it uses them in.
And just not care whether each Python program has a separate copy of the libraries, or whether a particular version of a particular library is shared between Python programs by the Python environment.
Distributions might just give up responsibility for sharing Python packages between Python programs without giving up the responsibility for security of those programs.
Why not? It's cheap resource-wise, whereas dependency hell is potentially debilitating. For some reason many proponents of the package management status quo are blind to this. Having multiple versions of a dependency is only bad insofar as it's "messy". It isn't objectively bad. But having a system that breaks applications because two or more can't agree on a package version is objectively bad. It's arguing aesthetics versus getting the job done. A poor position.
Windows, for all its faults, doesn't have this problem. It will happily accommodate multiple versions of say, .Net as needed.
Disk is cheap. RAM is cheap. Man-hours are not. Distros are maintained by people, who are often volunteers. You are asking them to do extra work (i.e. porting the same patch to multiple versions of the same library) so that someone else can have it easy. But why should they? Why not the other way around?
It's not just aesthetics. If a new vulnerability is found in, say, libjpeg, then every Windows app that uses libjpeg needs to be updated separately. Tough luck if your favorite image manipulation tool takes a while to catch up. On the Linux side of the fence, the distro patches libjpeg and every app is automatically fixed. This is a huge win in terms of protecting the user. Why should we give up that benefit just because some developer wants to use his own special snowflake version of a library?
Not managing that would require less work, not more. Their position is making more work for themselves. The point was to prevent dependency hell through matching the wrong package versions, something that can occasionally happen in Windows. The problem is that the current form of management causes the far worse form of dependency hell: applications requiring conflicting versions.
I have maybe had Windows get confused about dependency versions twice ever and both times it was a driver inf for a virtual device. I will grant that fixing the problem required a fair amount of work by Windows standards but frankly not all that much by the standards of some of the more hands on distros.
I have had Linux tell me I can't install an application because it wanted a different version of Lib-whatever than what something else wants many many times.
"Why should we give up that benefit just because some developer wants to use his own special snowflake version of a library?"
Odd that you claim major distros are built by a small group of volunteers but the maintainers of much smaller and less well supported applications need to suck it up and use whatever version the Distro maintainers decide on.
Most major distros are not volunteer-run and haven't been for ages. Ubuntu, RHEL, SUSE, Pop!_OS, the list goes on. These are commercial products with full-time paid developers. In the case of Ubuntu they are providing a major chunk of the work back upstream to Debian, and in the case of RHEL they are the upstream. Most minor distros are downstream beneficiaries of the big players.
Contrary to that, it's still common for many FOSS apps and utilities to be one-man jobs. Maybe the guy doesn't have the resources to keep up with the breakneck pace of some update cycles. What if they decided to go with an LTS build intentionally? What if it's a simple package that doesn't have security issues yet gets updated for other reasons? What if the version they are using has core functionality that was EOL'd in a newer release, so they can't move on without major rework that they can't manage?
There are a million reasons why a project may want to stick with an older version. Also, allowing for the ability to update all packages does not require draconian control over which packages can be installed. This runs against the whole notion of user control. If the user wants multiple concurrent versions on their system, who are you to say they can't? FOSS means freedom.
You would patch, as whole units, the two dozen Python programs that use a vulnerable version of a library. Treat each Python program as if it were a single executable file about which you only know which versions of libraries it has inside. If any of those is known to be vulnerable, treat the whole program as a security threat.
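To make the idea concrete, here is a minimal sketch of what such a per-program audit could look like; the venv path and the advisory table are made up for illustration, this is not any distro's actual tooling:

    import importlib.metadata as md

    ADVISORIES = {                     # hypothetical advisory feed: name -> bad versions
        "urllib3": {"1.25.8"},
        "pyyaml": {"5.3"},
    }

    def audit_app_env(site_packages):
        """Return 'name==version' for vulnerable packages bundled with one program."""
        flagged = []
        for dist in md.distributions(path=[site_packages]):
            name = dist.metadata["Name"].lower()
            if dist.version in ADVISORIES.get(name, set()):
                flagged.append(f"{name}=={dist.version}")
        return flagged

    # If anything is flagged, the whole bundled program is treated as the security risk:
    if audit_app_env("/opt/someapp/venv/lib/python3.9/site-packages"):
        print("someapp ships a vulnerable dependency; rebuild and re-release the whole app")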
What makes you think so? SSDs aren't exactly stellar in the cost-per-TB department, as will be the case with each new higher-performance storage technology. Plenty of people cannot afford the prices of new Western tech either, what about them?
> SSDs aren't exactly stellar in the cost-per-TB department
First of all, 1TB for binaries and libraries may as well be infinite. Secondly, you can get a 1TB SSD for under $100, which is pretty damned inexpensive when you consider it took until 2009 to get HDDs that affordable.
It's plentiful relative to the size of compiled or source code. E.g., the biggest .so file on this system right now is a <150MB libxul.so. That's only used by one piece of software anyway, and the drop-off is pretty steep after that. A 64GB drive (tiny these days) can fit more than four hundred copies of that unusually large file.
Not if they pull in all of their dependencies: PyQt would have a complete copy of all the Qt binaries and a complete Chromium install, because of course Qt includes a browser-based HTML viewer. Python packages are gigabytes.
What distro is pulling PyQt as a dependency of Python? There is a difference between "dependency" and "every package which has the word python in the description".
PyQt only contains the bindings. You share the same Qt environment across your system (hence qmake needs to be in your path). The python package itself is not that big (~10 MB).
The comment on top of this chain is about letting every package specify its own versioned dependencies. So how would that global version work out when python needs 5.1 and some other software specifies 5.2?
That guarantee only applies to Qt itself, I would expect that the newer Qt binary was also compiled against all the other newest versions of its own dependencies. Good luck finding a backwards compatibility promise for all of them.
> Unless you are trying to save a buck it seems 1TB is the standard today.
I suppose part of the problem is that while getting a 1 TB SSD instead of a 512 GB or even 256 GB one may not be overly expensive (for a middle-class person in a wealthy country anyway), due to the way OEM laptop product lines are often stratified, you may need to either buy your 1 TB SSD separately or get an altogether higher-specced model than you perhaps otherwise would. The latter especially isn't cheap.
There might be some customization options but sometimes little customization is available. That's probably one of the ways people end up with relatively small-capacity SSDs.
It's kind of similar with RAM: a higher capacity isn't that much more expensive in theory, but in practice it may be.
This is irrelevant for custom-built desktops, but lots of people are running only laptops nowadays. I'd like to see better customizability for the builds, as well as upgradability and replaceability, but the options are often limited.
> Unless you are trying to save a buck it seems 1TB is the standard today.
Buying a 1TB external SSD would more than double the cost of a Raspberry Pi 4. That, and my ancient BeagleBoard does fine running from 32 GB.
> My primary desktop has 4.
Those are rookie numbers for a primary system. Of course, my office system next to it is a lot lower-specced, with the test system next to it even lower.
Embedded systems really shouldn't be brought into play here, but even then a 256 GB microSD card for the Pi is $25 and itself far overkill. My entire primary desktop OS, firmware, DEs, and very extensive package set fits in 15 GB. Multiplying my primary system by 10 and sticking it on a Pi taking $25 of storage with plenty to spare is still not an argument against binary sizes, especially since there are niche distro spins used for that niche space anyway.
The eMMC on a Beaglebone Black is 4GB. Sure you can boot off an SD card but that's less robust (though, I guess you can use the SD card for all your virtualenvs...).
> I think this requirement made sense when disk space was scarce.
No, the main reason is security. I need a distro to guarantee me that the libraries I use are going to stay the same for the next 3 to 5 years, while also receiving small targeted patches.
> I think this requirement makes sense if you trust that your distro is always better at choosing the 'best' version of a dependency that some software should use than the software author.
No, it just has to be better at choosing versions than picking them randomly using pip.
Furthermore, when thousands of developers use the same combination of libraries from a distro the stack receives a ton of testing.
The same cannot be said when each developer picks a random set of versions.
One case where disk space is somewhat scarce is on shared academic computing clusters (which often provide many versions of things via some module system, but your $HOME can have a quota that's just 30GB).
Homebrew does this very well with its "cellar" system. Every version of every package gets installed to its own root tree, e.g. `/usr/local/Cellar/python/3.9.7/`. The currently-active version is then symlinked into `/usr/local/opt/python` and from there into `/usr/local`.
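In Python terms, switching the active version under that kind of layout is basically just re-pointing one symlink. A tiny sketch, using the paths from the comment above (this is not Homebrew's actual implementation):

    import os

    CELLAR = "/usr/local/Cellar/python"
    OPT_LINK = "/usr/local/opt/python"

    def activate(version):
        """Point the 'currently active' link at one keg, e.g. activate('3.9.7')."""
        target = os.path.join(CELLAR, version)
        tmp = OPT_LINK + ".new"
        os.symlink(target, tmp)        # build the new link first...
        os.replace(tmp, OPT_LINK)      # ...then swap it in atomically

    activate("3.9.7")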
I'll remember that "Homebrew does this very well" the next time I have to fix a bunch of shit because it has updated the currently-active version, or removed this or that bugfix release, as part of a general upgrade. After the third time this happened, I started using pyenv - which is another mountain of brokenness, I grant you, but at least I have some degree of control on what happens when.
pyenv is pretty good for working around this problem. I've recently switched to asdf-vm which I like even more, since it handles versions for multiple languages and tools.
I have been meaning to try out ASDF-VM. Currently my shell initialization script has at least 4 of these "version managers". While I don't really mind them (and they are mostly well-behaved), it might be nice to have something a bit more centralized.
In addition to Python version issues, I was also running into JVM and Gradle version compatibility issues, which I was handling with jenv and some aliases that would swap the JAVA_HOME environment variable as needed. asdf-vm cleaned all that up in a very clean way, and I like the way you can set a .tool-versions file for a project and share it with other asdf-vm users.
hands down the best way to manage python and its packages.
Agreed, especially on Windows.
It just works.
This is pushing it. It's not hard to break conda or put yourself in a situation where the updater/dependency checker gets stuck and doesn't know what to do, especially once you start adding conda-forge packages. But it does do a better job than anything else I've tried (although poetry + pyenv on Linux is getting much better).
FWIW, we are soon going to be releasing a much faster dependency resolver. We are also thinking hard about how best to address the "growing ecosystem" problem, in a future-proof way.
IIRC GoboLinux was the first distro to do things this way. Sadly, it didn't catch on and the Linux world doubled down on labor intensive volunteer package maintenance.
Great shout out. I still ought to try using it one of these days! It seems like a good option for people who want a better file system hierarchy without the extra complexity of Nix/Guix.
Homebrew does this kinda poorly compared to Nix and Guix. They are a different breed.
For starters, there is no /usr/local symlinking process. It's also possible to have multiple versions of e.g. python installed and active. Homebrew is like a poor-man's Nix.
I hate that Homebrew uses /usr/local. At least on M1 they had to move it to /opt but I always install it to my home directory in ~/.brew. I can override the paths and not have to worry about file/directory permissions.
This is the real legacy problem: Python comes from a world where having only one version of a package seemed like the right way to do things.
I do not have a good answer, but other ecosystems evolved much more sanely in the realm of packaging. While not ideal, Go has done a fairly good job, and the "module" operations are instant, which they should be.
OK, but Go compiles statically. While you can do something similar with PyInstaller, I don't think that's really comparable, since we're talking about deployment there.
Static binaries are a different story. Go has dependencies as any other modern language and they had a bad story in the past and have a better story today.
Sketch for Python: Create a ~/.cache/python/packages directory. Manage all dependencies there. Make the Python interpreter "package aware", so that required dependencies are read from a file in the current project (e.g. "py.mod") and the "system path" is adjusted accordingly and transparently. Or something along those lines.
No extra tool, a single location, an easy to explain workflow (add a py.mod file, add deps there with versions, etc).
I'm just thinking out loud, but it does not need to be hard.
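A rough illustration of that sketch, assuming a made-up py.mod format of one "name==version" per line and a cache laid out as ~/.cache/python/packages/<name>/<version>/ (none of this exists today):

    import os
    import sys
    from pathlib import Path

    CACHE = Path.home() / ".cache" / "python" / "packages"

    def activate_py_mod(project_dir="."):
        """Read the project's py.mod and put the pinned versions on sys.path."""
        mod_file = Path(project_dir) / "py.mod"
        if not mod_file.exists():
            return
        for line in mod_file.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, version = line.partition("==")      # e.g. "requests==2.26.0"
            pkg_dir = CACHE / name / version
            if pkg_dir.is_dir():
                sys.path.insert(0, str(pkg_dir))

    activate_py_mod(os.getcwd())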
Despite the downvotes, the argument stands: Linux distributions are having a hard time handling the number of tiny libraries and the conflicts in versioning, and many maintainers have voiced their concerns in recent years.
The point echoed in this discussion multiple times is that distributions should not handle the tiny Python libraries and attempt to solve the dependency version issues, but should instead treat an application with all its dependencies included as a single package. If a dependency needs to be bumped a version, e.g. for security purposes, then the app obviously wasn't tested with the new version (which didn't exist at the time) and needs to be retested, repackaged and re-released for the update. This would cut down on the number of packages to be maintained, as the vast majority of Python libraries would be excluded from the direct packaging process.
> If a dependency needs to be bumped a version for e.g. security purposes, then the app obviously wasn't tested with the new version (which didn't exist at the time) and needs to be retested
The burden of updating multiple copies of the same library across many packages grows exponentially and is simply untenable for distributions.
If you can find an army of volunteers to do that, distributions would love their contributions.
This hasn't happened in the last 20 years. I'd love to be proven wrong.
Since simply updating the dependency can easily break the resulting package, the need to re-test is not something that can be avoided by making some other packaging choice, e.g. the current one. It's not adding a new burden; it's acknowledging a burden that already exists (indeed, IMHO, much of what the original article complains about). If there are no resources to carry that burden, then the only option seems to be to wait for an updated release from upstream, whenever that arrives.
I wrote "The burden of updating". Testing still needs to be done but there's a lot of automation to minimize the workload.
> If there are no resources to carry that burden, then the only option seems to be to wait for an updated release from the upstream, whenever that arrives.
No, most upstreams do not backport security fixes. And switching to a newer release is not an option if you want to provide stability to users.
That sounds similar to what I do in macOS. I hate installing homebrew to /usr/local so I started installing it to ~/.brew and I hate using the python from homebrew so I always use pyenv.
This behavior is why things like Snaps and Flatpaks have become so popular. Package managers operate under a draconian and outdated mindset that gets in the way more than it helps at this stage.
You can both allow different versions of the same packages to coexist while also managing updates and installation/removal of software. It doesn't have to be this way. Software should be able to ship with its dependencies included and work and not rely on the whims of the OS getting it right.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time.
I’m not exactly sure how it works, but I think I’ve heard that newer releases of Enterprise Linux (EL8+) support multiple streams of the same package (module streams) or something similar.
Interesting idea: if we could hook in before the `sys.modules` cache, or keep one such cache for each module, then we should be able to produce this.
However, I thought the point was helping distro package management, which, to my knowledge, is not really built to support multiple installed versions of a package at a time: `dnf upgrade`, for example, will upgrade the single instance of each package to its newer release.
Actually you can already override __import__ and implement this, but then you still need an installer, and distro support for multiple instances of the same package.
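For what it's worth, a toy version of that hook is not much code. The sketch below routes imports of chosen top-level packages to a version-specific directory via a meta-path finder; the PINNED table and directory layout are invented, and a real solution would still need the installer side:

    import importlib.abc
    import importlib.machinery
    import sys

    PINNED = {
        # top-level package -> directory holding exactly the wanted version
        "requests": "/opt/pymods/requests/2.26.0",
    }

    class VersionPinFinder(importlib.abc.MetaPathFinder):
        def find_spec(self, fullname, path=None, target=None):
            pin_dir = PINNED.get(fullname)     # only top-level packages are handled;
            if pin_dir is None:                # submodules then follow the package's
                return None                    # own __path__ automatically
            return importlib.machinery.PathFinder.find_spec(fullname, [pin_dir])

    sys.meta_path.insert(0, VersionPinFinder())
    import requests  # resolved from the pinned directory when it exists there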
Actually, explaining how to get Python working on Windows is far, far easier than on either linux (modulo various distros) or the Mac. That's because there is one obvious distribution of Python to use, the official one, and new versions of it are always consistent and play well together.
Yes, you can use Anaconda if you want, and people who do that are probably data scientists or something and know what they want to do and why. It's well documented and has its own robust ecosystem.
I say this as someone who's been on Macs at home since 2007 and works professionally on Linux, but I started with Python on Windows back in 2002.
Unfortunately, Windows has plenty of problems too. First, the system PATH will trip you up if you have more than one Python installed, so the official installer does not add to it by default; hence the python command doesn't work after an install. Instead, you get the py launcher, but it's not provided if you installed Python from the app store or with Anaconda.
". . . and then he installed cygwin, and decided to manage and run python through the bash environment. . . " (fun fact: the git client's bash shell is actually cygwin. Also; MobaXTerm has cygwin bundled-in as well).
Sorry, but to be frank, I think you're a bit ignorant here. Let me explain, starting from the bottom:
> I would have one package manager that installs everything into a global package cache, […]
There is exactly one package manager. If you're on Debian or Ubuntu, it's dpkg. If you're on RedHat, it's rpm. If you're on Arch, it's pacman. Yes, some of the BSDs have two (base packages + ports tree), but they're the odd ones out.
pip, cargo, go*, etc. are not the same thing. I know they're called that but they don't perform the same function: none of them can create a working system installation. Let's call them module managers to have a distinct label.
> and then pick the correct version of libraries at run time (for Python: at import time).
That's easy for Python, and incredibly hard for a whole lot of other things. A module manager can do that. A package manager needs to work for a variety of code and ecosystems. Could it try to do it where possible? Maybe. But then the behavior is not uniform and made harder for users to understand. Could it still be worth it? Sure. But not obviously so.
I would also say that this is just giving up on trying to keep a reasonable ecosystem. It's not impossible to reduce the dependency hell that some things have devolved into. It just needs interest in doing so, and discipline while effecting it. I'd really prefer not giving up on this.
> Get rid of the requirement that there is only one stable (minor) version of a package in the distribution at one time. This has become unworkable.
This is to some degree why distros are breaking apart Python. Some bits are easy to install in parallel, some aren't. There can only be one "python". Worse, there can only be one "libpython3.9.so.1.0".
> Distros, please stop screwing over Python packaging. It is incredible that Debian/Ubuntu pick Python apart and put modules into different packages. They even create a fake venv command that tells you to please install python-venv.
They're trying to achieve the very goals you're describing. Trying to give you a working python without having to download and "install" some weird thing somewhere else. And at the same time trying to keep the module managers working when they're replacing some module but not all of them.
On a subjective level, it's obvious you have a strong distaste for this ("they even create a fake") — but could you please make objective arguments how and why this breaks things? If you're getting an incomplete Python installation, that seems like a packaging bug the distro needs to fix. Is that it? Or are there other issues?
> If I would get to redesign package management (both for Linux distros and for languages),
And, I'm sorry to say this, but your post does not convey to me the existence of any essential C codebase packaging knowledge on your end. I don't know about other ecosystems, but I have done packaging work on C codebases (with Python bindings no less), and you don't seem to be aware of very basic issues like header/library mismatches and runtime portability.
If you are interested in this topic, please familiarize yourself with the world of existing package managers, the problems they run into, and how they solve them. There's a lot to learn there, and they're quite distinct from each other on some fronts too. Some problems are still unsolved even.
I am not one to install python packages using my distro's package manager, but I totally agree with the sentiment that we need a more standard build/dependency management system in python. I like poetry, and I think most people are heading that way, but it doesn't seem to play super nice with pyenv (which is a critical tool) a lot of the time, and I think that a first party endorsement of the "one true build system" a la golang or rust would be a huge step.
I landed on poetry as well. The issue with Python dependency management, in my book, is that it is incredibly (and needlessly) hard to learn how to do it properly. It took me years to figure out how things should work, and there are still issues now and then that cost me a few hours to figure out and restore things.
Meanwhile when I use Rust all these things are taken care of in cargo. It is part of the language. There is one right way to do things and the way is supported by comfortable tooling, that works so well that you literally don't even consider thinking about anything else.
The way Python does dependencies is totally unpythonic. The fact that it is 2021 and this isn't fixed, or at least the number one priority of things that need fixing, casts a dark shadow on the whole language – a language that I like to use.
Poetry is good. But it isn't as good as cargo, because it also has to deal with all the legacy cruft. To run code developed with poetry on a non-poetry system you have to figure out all the ways of dealing with envs, paths and such.
Issues like these get me a little fired up, because the collective brainpower wasted on something that should have been elegantly solved in one place is gigantic.
To run code developed with Poetry, you shouldn’t even know it’s developed with Poetry. It should be released as a source distribution or a wheel on PyPI or as a conda package if it contains nontrivial binary extensions. These distributions should be then either packaged by the OS or they should be installed with all their dependencies into a separate virtual environment in /opt.
Sure, if I release the code I tend to do just that. However, a lot of the code I have been writing was to be used on servers where installing Poetry was not an option. That means there was no straightforward, well-documented or otherwise easy way to copy the project onto the server and "just have it work". And this was not due to me not taking care of my dev environment.
Now I know how to do this, so this is not a problem. My complaint was mostly, that this was a waste of time.
If installing Poetry wasn’t an option, deploying a frozen set of dependencies should be still relatively easy with plain ”pip freeze” and “pip install -r”.
I’m not a Python dev. I do need to occasionally run things written in Python. I made the mistake of trying to get pip + Conda + pyenv (or whatever) to install a fairly simple tool. I have no idea how the dev got their setup working, but it was totally and utterly unreproducible, even after they sat down at my computer for several hours. In that amount of time, I could have probably rewritten it in PHP (that’s actually exactly what I ended up doing while they attempted to get a Docker container running).
Needless to say, I will only use the distro package manager these days. I know the versions are (probably) compatible, maintainers will usually backport security patches, etc. You get none of that using whatever flavor of the week python package manager.
I think this actually gets to the crux of the matter. The existing python dependency management tools (especially the new shiny ones like poetry) are very much designed by python devs, for python devs. What you're describing is a totally different use case, which is running released software in a sensible way.
The distro package managers are probably the best place for that, but bridging the gap between them and the python ecosystem is an obvious challenge.
It's not really the fault of Python that somebody fucked their environment beyond belief. I've always used pip and almost never had a problem. A "fairly simple" application should not require the tools you mentioned.
That, my friend, is called a filter bubble. And your comment is not productive btw. I for one never "fucked my environment", I only ever used distro packages and I still couldn't make a simple script even start without it spitting cryptic error messages about modules or paths.
I'm not a Python dev; I just needed to run a tool that didn't have any other alternatives.
Meanwhile, I can download scripts written in a range of other languages and just fire 1-2 commands and the thing will work.
Same BTW. And I started having a Docker image for each Python script that I need to run (I'm not a Python dev). Took me a while but I've mostly tamed the beast. And at least I can run the occasional script that has no alternatives.
I had to learn this by FUBAR'ing my system a long time ago, but my setup process for working with a python package from PyPI (i.e. not installable by or updated enough in OS's package manager) nowadays is:
- sudo apt install python3-pip
- pip3 install --user --upgrade pipenv
(In workdir):
- pipenv install --three package
- pipenv run package --option
Works like a charm and doesn't mess with my system.
This is especially frustrating to read because one of the main selling points of Conda is reproducibility. In data science teams, I've found it indispensable for making sure people can all run each others' project code.
So for anyone reading this in the future: don't try to use Pyenv to install Conda. Pyenv tries to set up shims for every binary in the Conda env, which will likely break your PATH.
Pyenv supports installing Conda because Anaconda used to be "just" a Python distribution.
They can otherwise coexist without trouble on the same system.
My understanding of the problem is that Pyenv attempts to detect the contents of "/bin" relative to the top level of every Python installation that it manages.
It does this so that it can set up its shim to handle any executable that gets installed in any Pyenv-managed environment.
This is how Pyenv creates the "foobar is not available in your current environment, but is available in x.y.z" message. It's also a much more reliable solution than trying to explicitly whitelist every possible script that might get installed.
The problem is that this was only designed to work for Python executables and scripts installed by Pip. Conda environments can contain a lot more than that; it's not hard to end up with an entire C compiler toolchain in there (possibly even both GCC and LLVM) or even Coreutils.
If Pyenv detects `bin/gcc` in a Conda env, it will set up a system-wide shim for GCC, which no longer passes the `gcc` command along to the OS, but intercepts it, only to inform you that no such command exists in the current env!
So it's not that Pyenv hoses Conda envs. It's that Pyenv can hose PATH if you have it manage a Conda installation, and if that Conda installation ends up with non-Python stuff in `/bin`.
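To illustrate the mechanism (this is just the shape of the behaviour described above, not pyenv's actual code): every executable found under a managed installation's bin/ gets a shim name, regardless of what it is.

    from pathlib import Path

    PYENV_ROOT = Path.home() / ".pyenv"

    def shim_names(root=PYENV_ROOT):
        """Collect one shim name per executable found in any managed env's bin/."""
        names = set()
        for bin_dir in root.glob("versions/*/bin"):
            for exe in bin_dir.iterdir():
                if exe.is_file():
                    names.add(exe.name)   # a Conda env here can add gcc, ld, coreutils...
        return names

    # Every returned name gets a shim placed on PATH, which is how a toolchain inside
    # versions/miniconda3-*/bin ends up shadowing the system's own gcc.
    print(sorted(shim_names()))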
Obviously I don't know what exactly was broken when you tried to set up that application. But this particular adverse interaction bit me at work a few years ago, and ever since then I have insisted that Pyenv should never manage a Conda installation.
I think that's a reasonable policy anyway, in light of the facts that:
1) Conda isn't really a "Python distribution" anymore.
2) The Pyenv installer just runs the opaque Conda installer script and there's basically no way to control the version that gets installed.
3) They are different tools that serve different purposes and it doesn't make sense to have one manage the other anyway.
4) You probably shouldn't use the Python that's installed in the base Conda environment anyway. You need that to run Conda itself, and you want to keep the list of requirements small to make sure that updates can progress cleanly. It's basically the same as any Linux package manager like APT. Except of course, those tools don't generally support "environments" other than chroot.
Both Conda and pyenv are for managing (their own) Python installations to run different Python versions in parallel, it doesn't really compute to try and use both at the same time.
Yes. I tried to use Gitpod (a cloud development env platform) which ships with pyenv. Poetry would very stubbornly only use the system python and completely ignore whatever the pyenv shim was pointing to.
Golang is a great model to follow. I think the fundamental thing that python (and ruby, when I still used it) has taught me is to run a mile if a language is without a robust dependency management system. The pain is just not worth it, even for a nice language with an otherwise robust ecosystem.
With golang, it's not just the dependency management side, gofmt (and various bits of other tooling) is also an incredible blessing. Yes, prescriptivism can sometimes feel weird to us computer nerds, but sometimes it's just much better to have a canonical way of doing common tasks. It means onboarding people is much easier, and infrastructure/code/whatever is more reusable.
Go is terrible for distributions to package due to the poor library versioning. Also, static linking makes security updates really expensive to build at scale.
> Every one of these package managers is designed for a reckless world in which programmers chuck packages wholesale into ~/.pip, set up virtualenvs and pin their dependencies to 10 versions and 6 vulnerabilities ago, and ship their computers directly into production in Docker containers which aim to do the minimum amount necessary to make their user’s private data as insecure as possible.
Is this any different from any other programming language ecosystem? Is Python really doing worse than Node, Ruby, Perl, Lua, Go, or Haskell in this regard?
Python files go in `/usr/lib/pythonx.y`, Python finds said files, programs run.
Yes, Python build and packaging in general is messy. But I am curious why, specifically from a distro perspective, it's any worse than anything else.
- python is more active than most alternatives, you have new packages created every day.
- python is massively used outside of the web, unlike JS, Ruby or PHP, which are 99% web. You get Python in GIS software, data analytics, automation, pen testing, sysadmin, biology, etc. It's a huge graph.
- python is used by the distros themselves to code features of the OS. E.g., if you remove Python, there is no yum.
- python has a rich compiled extensions ecosystem, produced from c, c++, fortran, and assembly. It's very complicated to ship them.
- it's much more common to have several Python installed than for other dynamic languages. So isolation matters even more.
So the difference is the sheer size of the problem.
> Aside from the variety of languages in the extension ecosystem
That's a big one to omit. But alright, I'll play.
It has far fewer packages, it's almost never used on Windows, it stopped being popular 10 years ago, it doesn't have Anaconda because it's not a "corporate tech", you rarely install several versions of Perl on the same system, you don't start enough projects with Perl to justify one isolated environment per project, nobody moved to Perl 6 so the whole CPAN transition never had to happen, Perl is not used to script DBs/GIS systems/3D engines/IDEs, and Perl is not used by millions of non-coders (geographers, mathematicians, physicists, bankers, etc.) who have no idea how their machine works.
But to be honest, the nail in the coffin is that distros decided not to split perl and cpan into separate packages. In fact, in some distros, perl and cpan are already installed and ready to be used.
So for linux:
- python: you must decide among several Pythons, then install the right packages to use pip and venv, and isolate your install with venv. The procedure is different for Windows. Also, 2.7 is a thing.
- perl: you have one perl that hasn't changed for years, no new packages, it's already installed, cpan is installed as well, and it's not gonna break your system if you use it. You don't care about Windows. Perl 5 forever.
The distro devs live in their own world; they don't have to work on a FastAPI service connected to an SAP DB that must run on Windows with no admin rights for dev but must be deployed on CentOS 7.
The notable language ecosystem which is different is "C on Unix", which grew up with "the system provides the dependencies, it's often easier to avoid a dependency if you can, the program usually needs to be able to cope with whatever version of the dependency the system has, rather than pinning to a specific one". That's the primary ecosystem that most Linux distro package managers developed with as their platonic-ideal-shape-of-an-application, I think.
Once upon a time, I did "apt upgrade python-pip3" or something like this (was it just "python-pip"? Or maybe it was "apt upgrade"? It was a couple years ago). Anyway, what I do remember is that it quite literally killed apt: invoking it with any command would lead to a dump of a stack trace with ImportError coming from pip. Apparently, apt uses system-wide pip internally so if you touch it, everything breaks? Don't know, don't much care: since it was just a VM so I simply rolled to the previous snapshot and forgot about the details.
Edit: Ah, apparently the steps to reproduce are: do "apt install python-pip3"; do "apt install python3.8"; when pip3 complains that it's outdated, update it with the command it itself suggests.
When deploying the developed application on some server, all the exact dependencies get installed there. The main reason for the existence of the server and its configuration is to run the application, so the server adapts to the needs of the application and gets the dependency versions preferred by the app, instead of the application trying to adapt to the server and trying to make do with the libraries already existing there.
I'd say it's a heavy-handed approach to mitigate more fundamental issues with how Python packages are maintained: if everybody wants to pin different versions, then we're going to have to install different versions of everything, which is what npm does, and I consider that heavier.
Again, it's all a question of point of view: what we see as a package manager problem, and what causes us to keep reinventing package managers, might actually be a problem with how we maintain our packages, my point of view being the latter. But I'm digressing.
When it comes to installing on "another machine", you don't know what Python they have, you don't know what libc they have, and so on, that is exactly what containers attempt to mitigate, so that seems exactly like the tool to use for this problem.
I think it's a fundamental problem with managing dependencies. On one hand, any given application usually knows which versions of its dependencies it actually supports, so it makes sense for the application to simply bundle those in: in the most extreme cases it's static linking/binary embedding, or (usually) putting the dependencies in subdirectories of the application's directory ― in cases where the application has a "directory where it lives in" instead of being thinly spread all over the system (e.g. over /bin, /etc, /usr/bin, /usr/lib, etc.).
On the other hand, the users/sysadmins sometimes want to force the application to use a different version of a dependency, so the application may provide for that by somehow referencing the dependency from the ambient environment: usually it's done by either looking for the dependency in a well-known/hard-coded path, or getting that path from a well-known/hard-coded env var, or from a config file (which you also have to get from somewhere), or from some other injection/locator mechanism, thousands of those.
And all this stuff is bloody fractal: we have system-level packaging, then Python's own packaging on top of that, and then some particular Python application may decide to have its plugin-distribution system of sort (I've seen that), and that too goes on top of all of that, not to mention all the stuff happening in parallel (Qt sublibraries, GNOME modules, npm ecosystem)... well, you get the picture. It kinda reminds me of "Turing tarpit" and I doubt collapsing all this into one single layer of system-level packaging, and nothing on top, is really practical or even possible.
My usual approach is to let the distro install the packages it wants and then, for each thing I'm working on (I sometimes write Python code for a living), have a separate environment with different module versions. This makes sense when my artifacts are deployed via Docker images that are generated with `pip install`, but not as much if you plan to install them on your machine. The only thing I build like this for public consumption is pip-chill, which is not intended to be used with the bare Python environment of the distro (I probably should add something that makes it refuse to install that way).
Even when the things are not Python-based, I often use a virtual environment, managed with virtualenvwrapper (installed on the distro level). One example is a Terraform config that relies on some Python tools to manage deployments - the Python tools are local to that virtual environment and not usable anywhere else.
If I need to develop something that'll need to run with the distro directly (something that could be distributed as distro packages), I'd use a virtual environment and tailor the package versions in requirements to the ones available in the distro. This way, multiple distros can be addressed with multiple environments pointing to the same source directory and multiple requirements files for tests.
> What is it about Linux distros that makes our use-case unimportant? Have we offered no value to Python over the past 30 years?
Indeed you haven't. Worse, you've actively damaged Python's efforts to improve. I mostly work on the JVM these days, and I think one of the main reasons dependency management there is so gloriously simple and effective is that the Debian packagers weren't around to fuck it up.
There are now, but for a long time they had to live in contrib because Java wasn't open-source, and so apt didn't deeply integrate and customize their package management the way it did with perl or python.
This is a very weird post to read, as someone who does (occasional) distro Python work - I feel like Python absolutely is listening to us! It's just that it's hard work, and distro packaging is mostly done by volunteers, and there are a whole lot of things to work through.
Here's some work I and others did earlier this year, which I thought was a great example of folks from the core Python packaging world and folks from distros working together: https://www.python.org/dev/peps/pep-0668/ See the massive table of use cases for all the things we had to think about.
You might note that one of the things it needs is additional participation on the Discourse thread from the authors (like myself) and from other distros. Again. It's mostly work done by volunteers, and there's a lot to work through. There's no magic to it.
I can tell you that the following (from TFA) will absolutely make things worse for distros, though:
> I call on the PSF to sit down for some serious, sober engineering work to fix this problem. Draw up a list of the use-cases you need to support, pick the most promising initiative, and put in the hours to make it work properly, today and tomorrow. Design something you can stick with and make stable for the next 30 years. If you have to break some hearts, fine. [...] These PEPs are designed to tolerate the proliferation of build systems, which is exactly what needs to stop. Python ought to stop trying to avoid hurting anyone’s feelings and pick one.
If you want the PSF to fund some engineering work on its own that it can finish much faster than any volunteer packager can even read the proposal, break some hearts, stop proliferating build systems, and hurt people's feelings, they will absolutely do that and say "Distro packaging is not supported, we only support virtualenvs. Users should make virtualenvs. Distro software should ship virtualenvs. Installing a Python package systemwide is meaningless." That's clearly the best-supported option right now, and it's a surprisingly technically defensible answer, but it's not going to make you happy.
"use virtualenvs and install your dependencies from an unfiltered, unsupervised, untrusted source" is certainly the solution most aligned with the rest of the industry but I struggle to see any technical benefit from it. Other than perhaps "move faster and break more things".
Virtualenvs don't require using unfiltered/unsupervised/untrusted sources. They're a place to install things into, not a place to get things from.
The specific model that distros could adopt, if we went this route, is that each Python package builds into a .whl, they're build-dependencies of applications, and applications install .whls into a virtualenv at build time. You'd still restrict packages to come from the distro with the usual policies (built from source code, compliant with licensing, not contacting the network at build time, etc. etc.).
So, for instance, installing "python-somelib" would get you a /usr/share/python-wheels/somelib-1.0.whl, built from source. The build process of "someapp" would create a /usr/share/someapp/venv, pip install that wheel into that virtualenv, and then symlink /usr/bin/someapp to /usr/share/someapp/venv/bin/someapp.
The PEP goes into more details about why virtualenvs are recommended, and I can give you a whole host of subtle reasons, but that's not really the point - the point is that it's defensible, not that it's perfect, and that it's very easy to implement with what works today. So if you ask the PSF to come up with something that magically solves the problems and makes people sad if necessary, this is literally what they're going to come up with. If you don't like that outcome (and there are good reasons not to like it!), then you shouldn't ask for them to arbitrarily pick an outcome some people won't like.
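As a rough sketch of the wheel-into-venv build step described above (paths taken from the example; the wheel names and the someapp entry point are hypothetical, and a real distro helper would do more), using the stdlib venv module plus pip:

    import os
    import subprocess
    import venv

    WHEELS = [
        "/usr/share/python-wheels/somelib-1.0.whl",     # distro-built dependency wheel
        "dist/someapp-1.0-py3-none-any.whl",            # hypothetical wheel of the app itself
    ]
    APP_VENV = "/usr/share/someapp/venv"

    def build_someapp():
        venv.EnvBuilder(with_pip=True).create(APP_VENV)
        pip = os.path.join(APP_VENV, "bin", "pip")
        # --no-index keeps the build offline: only the listed wheels can be installed.
        subprocess.run([pip, "install", "--no-index", *WHEELS], check=True)
        os.symlink(os.path.join(APP_VENV, "bin", "someapp"), "/usr/bin/someapp")

    build_someapp()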
The problem here is that distros only supply one version of the package which means you gain nothing by having that same version installed in multiple virtualenvs.
So I guess the problem actually lies in the python library ecosystem that's becoming a npm like dependency hell.
The relevant question here is probably whether the library ecosystem is like that because the packaging tooling sucks or is it the other way around?
- Almost all distros don't just supply one version of a package. They try to avoid it, but I can't recall one with a hard rule against it. For instance, Debian packages multiple versions of autoconf https://packages.debian.org/search?keywords=autoconf2 , the Linux kernel, etc.
- One reason that distros try not to install multiple versions of a package is that it's hard to specify which one you want. If you have, say, requests 1.0 and 2.0 installed, which one does "import requests" get you? The virtualenv approach, where "import requests" simply does not work in an un-virtualenv'd Python, avoids this problem entirely. Each application can independently build-depend on python-requests-1 or python-requests-2, and each user can create their own virtualenv and pip install /usr/share/wheels/requests-1.whl or requests-2.whl as they prefer.
- Even if you do not have different versions of the same library, it may well be the case that you have two different libraries with the same importable name. As a great example, see CJ Wright's talk from PackagingCon last week "Will the Real Slugify Please Stand Up" https://pretalx.com/packagingcon-2021/talk/P3983F/ (I expect they'll post videos online soon). tl;dr there are three packages you can get via "import slugify", and they expose different APIs.
- The possibility you haven't accounted for is "The library ecosystem is like that because things are fast-moving, because people have actual problems they want to solve, and upgrading dependencies and sorting out conflicts is work." It would be great for that work to be done, but we go back to the problem I mentioned at the top - limited volunteer time. In the absence of time to engineer things perfectly, your options are to ship something that's engineered imperfectly or decide not to ship it. We went through the dark ages of "We'll ship the next Debian release when all the bugs are solved" over a decade ago, and it turns out that this doesn't actually help users in any way.
> The possibility you haven't accounted for is "The library ecosystem is like that because things are fast-moving, because people have actual problems they want to solve, and upgrading dependencies and sorting out conflicts is work."
Sorry, I was overly aggressive in my previous comments. All I want is the Python ecosystem to acknowledge some people prefer stable over fast moving, recognize the importance of that and work towards a solution that isn't horrible for their use case.
We all depend on someone somewhere deep in our stack caring for stability, even when we don't realize it.
They have acknowledged it, noted the use case, and decided that they are unable to support it. Python has this in common with most other programming languages.
Providing long-term backwards compatibility with the ability to share libraries and update them system-wide is difficult and imposes very high costs on an ecosystem. Only a small number of tech stacks have ever done this (for example, C and Perl) - and not only are they so old that it wasn’t a conscious choice, they have stagnated as a result.
I have a lot of (possibly controversial) issues about this post, for various reasons:
- Debian/Ubuntu Python packaging is actually pretty comprehensive, and Ubuntu LTS ships with reasonably updated (although admittedly not latest revision) third-party packages that are usable out of the box, and that I could just list in Ansible to have usable environments spun up.
- Python packaging is hardly a mess. I have been using pip for ages without any issues other than forcing wheel downloads for unusual distros (like Alpine, where musl makes it chancy to use some low-level stuff). But if you're on a mainstream distro with modern pip, wheels just work.
- If you're not on Linux (or not on the mainstream), pyenv also just works. I have been using it across several years of macOS releases without any significant issues, other than knowing to pass it the required build flags to build out the Cocoa bindings now and then (which is easy to do with the brew pyenv).
And, finally, I'm constantly shocked at the number of people who just don't get virtualenvs, or who don't know how to switch python interpreters by using environment variables.
I've never looked back, and they were instrumental in bridging the gap between 2.7 and 3.5 while I converted some code across (I never really found that the switch to Python 3 was as dramatic as many people made it, perhaps because the code I handled worked after a single pass with 2to3 and minor tweaks).
This is my experience with using any python thing on OSX:
- brew install: fails probably
- pip or pip3 or whatever: fails probably, if it succeeds, breaks something else
- look for program-specific install programs/instructions (like the aws cli for example): maybe works, probably bombs
Python's reputation (and sales pitch) among developers is a language for people that don't want to program or learn software development. It's basically the new BASIC.
Their packaging and installers only reinforce this reputation.
Not that it is sweet roses in javaland or many other language ecosystems. Dependency graphs are complicated, because they are graphs when people want them to be simple trees.
As for virtualenvs, why does a user of software that happens to be python need to know that?
I have the opposite experience. But you are conflating brew with pip, and they are largely independent even if you use pyenv from brew (which you should).
Again, I don’t see a lot of logic or understanding of the toolchain in that comment.
Distro maintainers say that language ecosystem packaging makes it hard for them.
Language ecosystem packaging maintainers say that distro package managers make it hard for them.
The year is 2021 and there is no work towards synthesis. It will be endless, fruitless yelling from each side.
Maybe 2022 will bring change. I'm not holding my breath.
To anyone who finds themselves on a single "side" in this argument: if you have ever said "why do you need to do that" as an accusation instead of with curiosity, you exemplify the problem.
I've had success restricting myself to Python packages provided by the OS (Debian) in production. The only down-side is there is only one Python (well, 3.* or 2.7) against which to test, so I drop back to a pip install of the same packages for CI, and that's where 90% of my Python pain comes from, every maintainer of every Python infrastructure package feels entitled to shout "DEPRECATION, do THIS, do THAT, use THIS" at me. It's ill-mannered and unpleasant.
The tools in use are completely irrelevant and only add noise to the discussions around packaging. So what actually is the problem here? Are projects delivering broken outputs (ie. bad packages)? Are they not delivering outputs at all? Those are real problems but pointing at the number of tools that exist is not helping.
Python itself confuses sources and outputs in that not every package has consumable source tarballs— for some packages the only way to get static metadata on dependencies is by starting a build as if to get a binary output.
Native dependencies are not really handled except in an ad-hoc way, either.
If you want to import Python packages into a real packaging system, you are confronted with the tools whether you want to be or not.
> pin their dependencies to 10 versions and 6 vulnerabilities ago
That is the real problem IMHO. Most Python users like to pin dependencies so that "their program doesn't break", and that's also the reason why so much effort is put into what they call "correct" dependency resolution, resulting in the creation of new Python package managers that all do "more correct" dependency resolution, make programs "break less", and at the same time require less effort from maintainers to actually maintain their code by doing dependency upgrades. And then one day you want to upgrade a dependency and you realize you're 10 releases behind on 10 dependencies, and what was supposed to be a quick maintenance task is now 100 maintenance tasks.
If you don't pin, your program will break one day or another, a user might open an issue with the traceback, or, your program will break directly in CI where you will see the traceback. Upgrade it, or contribute to the dependency, but just go ahead and fix it, instead of being defensive and trying to have dependency resolution that "doesn't break". At the same time you'll be adopting the actual practice of "Continuous Integration", of your dependencies, which has a better cost/benefit ratio.
I always avoid pinning dependencies and try to make pip just install the latest version of everything. I am willing to contribute upgrade fixes to any Python package I use, but some dependencies do pin, which breaks my own aggressive continuous integration practice; heck, I'd even want an option for pip to ignore version resolution entirely so that I can make all my contributions to upgrade everything I use. I even remember when I had a CI test matrix with all combinations of versions of everything; I don't do that anymore, I just support the latest of everything, since we can always have the latest Python with containers anyway, so that's not even a blocker. If you're not a "techbro using containers", it'd be fine too, because you should then be able to make your distro packages at any point in time and expect all of them to work together, minus the delta of the handful of upgrades that are pending release here and there.
This is a very quick route to a maintenance nightmare imo.
If you have totally unpinned dependencies, and you come back to a project after a year untouched, or 5 years, and it no longer works - which dependency update broke it?
I don't agree that using an outdated package is necessarily a problem at all. Some versions are done! You don't need the latest version of every possible package. You don't necessarily need to update _ever_ (which is why this differs from CI). These updates are often entirely unnecessary churn.
There absolutely are vulnerabilities in some old versions, and those updates are necessary (but tooling & notifications to easily handle this have dramatically improved in recent years, especially on GitHub). There will also be vulnerabilities in new packages though, which may be unknown, and will often not exist in older much simpler versions.
Using a well-tested version of a dependency that does exactly what you need is not less secure than chasing the latest version at all times without a specific reason.
I've found that manually updating packages on the rare occasions where relevant vulnerabilities arise, and using existing working versions without changes the rest of the time, has been perfectly effective over many years now, and avoiding the shifting sands of external dependencies wherever possible means that a project that worked 5 years ago still works _exactly_ the same today.
I'd rather see more software go in this direction, valuing reproducibility & known correctness (i.e. with isolated pinned dependencies, in some form) over 'always be latest' dependency updates and the complex & hard-to-reproduce bugs that those shifting dependency interactions can create.
> If you have totally unpinned dependencies, and you come back to a project after a year untouched, or 5 years, and it no longer works - which dependency update broke it?
In this case, it doesn't matter to me which dependency update broke it, what matters to me is to have the tests passing again with all dependencies.
> I don't agree that using an outdated package is necessarily a problem at all. Some versions are done!
If a version is done then why is there a new release? Using old versions is tech debt that will one day blow up and cost much more to correct than if it had been corrected over the time, not to mention the security risk.
> You don't necessarily need to update _ever_
Another dependency might decide to use that dependency in a newer version, in which case aren't we all better off using the latest versions of everything? The cost is some effort, the benefit is more features, security, performance, less bugs, basically a better program.
> There will also be vulnerabilities in new packages though, which may be unknown, and will often not exist in older much simpler versions.
Then why upgrade at all when you have a dependency with a security issue? After all, in your upgrade you might be adding even more unknown security issues that might be even more dangerous.
> Using a well-tested version of a dependency that does exactly what you need is not less secure than chasing the latest version at all times without a specific reason.
If I can test well that a newer version of a dependency works for me, why not upgrade it? There might be performance, security or other bugfixes, and I'm allowing other maintainers of other dependencies to also use that newer version.
> I rather see more software go in this direction, valuing reproducibility & known correctness (i.e. with isolated pinned dependencies, in some form) over 'always be latest' dependency updates and the complex & hard to reproduce bugs that those shifting dependency interactions can create.
Ok, but then again, some version might fix a security bug that has not been backported to your old version, especially if it's 5 years old.
Basically you're advocating against continuous integration (I'm talking about the "practice", not talking about "the tool that runs automated tests that people call CI").
> In this case, it doesn't matter to me which dependency update broke it, what matters to me is to have the tests passing again with all dependencies.
Knowing which change (or changes) broke it can make resolving the issue much faster.
> If a version is done then why is there a new release? Using old versions is tech debt that will one day blow up and cost much more to correct than if it had been corrected over the time
Because <new feature> was added, that is totally irrelevant to your use case.
> not to mention the security risk.
As parent mentioned, there is automated tooling for this. Your tooling yells that you are using a version of a package with a vulnerability, so you update.
> Another dependency might decide to use that dependency in a newer version, in which case aren't we all better off using the latest versions of everything?
If you don't update A, it doesn't matter if a newer version of A wants a newer version of B.
> The cost is some effort, the benefit is more features, security, performance, less bugs, basically a better program.
This assumes bugs and vulnerabilities decrease monotonically over time. This isn't true.
> Then why upgrade at all when you have a dependency with a security issue? After all, in your upgrade you might be adding even more unknown security issues that might be even more dangerous.
Because a known (to the world) security flaw is orders of magnitude more dangerous than an unknown (to the world) one, all else being equal. If there is a CVE for it, there are likely large-scale attempts at exploiting it anywhere it can be found.
> If I can test well that a newer version of a dependency works for me, why not upgrade it? There might be performance, security or other bugfixes, and I'm allowing other maintainers of other dependencies to also use that newer version.
Large amounts of labor. Furthermore, "well-tested" may include "battle tested". Some bugs make it through to deployment, and get caught and fixed. Updating dependencies without a good reason means more potential bugs slipping through, which means more bugs being discovered in deployment and a worse experience for the end user.
The day packaging for distros is easy, we will use it. Right now, making a deb is hard. Isolating 2 projects with different deb versions is hard. Distributing debs is hard. Upgrading your OS but not your python debs is hard.
Then rinse and repeat for Red Hat, Arch, Nix, macOS and Windows?
Since Nix packages are distribution-independent, once you have it packaged with Nix, you could theoretically skip packaging for Ubuntu etc., but of course that may raise the bar of entry for your users.
It doesn't work on Windows, so you lose half of your users. And WSL is not the answer to that.
Also, unless you repackage thousands of compiled C extensions, play well with Anaconda, and can plug into the entire python ecosystem of platforms such as Heroku, PythonAnywhere, Databricks and so on, you then lose 90% of the rest.
As a "self taught and still learning developer-lite", I love the python language, but the ecosystem drives me nuts. I feel a lot of the pain expressed in the article, and it pretty much speaks to my current conclusion of "I'm trying to do things the 'right way' but there doesn't seem to be a 'right way'".
I've seen a few comments here about how Nix/NixOS fixes the whole python binary/library mess, but I'm having trouble understanding how. Does anyone have any insight to share about that?
Additionally, the whole thing kind of makes me want to move away from python wholesale. I was wondering if there are other languages that are great general languages like python that don't suffer from this whole packaging and versioning mess. Ruby? Go? Something else? I'm looking for something high level, somewhat easy to learn, and with good library support for things like working with databases and tabular data. Though I don't know much about them, I just feel like I don't want something like Java or C++ or anything like that. I want to "get things done" and not have to worry about tons of boilerplate or working at really low nitty gritty levels.
> I've seen a few comments here about how Nix/NixOS fixes the whole python binary/library mess
Nix is a bit of a cult. Its theoretical aims are laudable, but in order to get there it forces you to do a lot of work and reason strictly in its own way. Whether all this work is worth the rewards, I think is open for debate.
Chemistry is a bit of a cult. Its theoretical aims are laudable, but in order to get there it forces you to do a lot of work and reason strictly in its own way. Whether all this work is worth the rewards, I think is open for debate.
Scala is easy to learn coming from Python, even more so if you can start with Scala 3 right away. You cannot really get more high-level than that. There are good libraries to work with databases; Quill comes to mind if you want something user-friendly. And you can always fall back to the many Java libraries in the big data ecosystem (even though you should probably avoid that if you can).
Interesting take. I use Python daily and virtualenv is really good for our use cases (running CI/CD, testing installation, and running in production in a container). I am not experiencing too many problems with pip + venv. We also maintain our own libraries and even those were relatively easy to set up. Mypy and yapf also help to maintain a style and type correctness (do not pass in None accidentally).
I do all of my server side Python development in a container. Whenever I start a new project, I create a dockerfile as a first step. This is the easiest way I have found to make my code portable. The main downside I have found is with IDE support. I use the latest tag, so I get updated containers periodically, which has not been a source of breakages. I have not figured out how to tell VSCode to use the Python in my container for statement completion and validation.
For personal projects on my desktop, I have given up and just use pip install, and then don't try to share those scripts.
Can someone make the case for distributing python packages at the distro level to me? Especially scientific software for data analysis?
Our project gets a few issues opened by distro maintainers who have trouble with some part of their build process. Is it really worth our project maintainers' time to help troubleshoot esoteric build processes when we already provide source, wheel, and conda distributions?
Users generally don't want Python packages; they want software. They don't care what it's written in, and shouldn't have to install it differently depending on what you chose.
“Just learn $language_package_manager_of_the_day, and hope it doesn't break anything when your system changes” is the equivalent of the 90s “./configure ; make ; make install”, and we should have moved past that for end-users by now. Most users are not developers, and installation procedures need to cater to both.
> If you’re distributing an application, shouldn’t you just ship an environment with our package bundled?
No-no-no-no! If you (application author & distributor) do so, you are obligated to release a new version of your application (which bundles 3rd-party libraries as its "environment") each time your dependencies are updated for security reasons. Do you know many application maintainers who want to do that?
If your application depends on system-installed (distro-provided) libraries (python modules, native libraries, whatever), it is the responsibility of distro maintainers not to keep versions with known security problems in the system-wide repo. It is much better for you (app author & maintainer) and me (distro user & admin), IMHO.
> shouldn’t you just ship an environment with our package bundled?
That's how I would prefer it, but, if the intended form of distribution is a distro package, then it should pin itself to the versions the distro provides and avoid (or vendor in) packages that aren't available.
It is possible to just place everything in the app directory and distribute it that way, but it's kind of ugly (and doesn't pick up security updates from the distro).
> Can someone make the case for distributing python packages at the distro level to me? Especially scientific software for data analysis?
As a matter of principle, I prefer to avoid software from non-curated repositories like pip and the like. Installing the debian-provided scipy and numpy is more than enough for me.
I don't understand what's so special about python that needs its own "package manager" when the distro-provided one is already good. If I need something, I install it using "apt install", regardless of what language it is written in.
What I suggested was for the package owner (I build a font that's also distributed as part of Debian) to do whatever they needed to make their lives easier. It was a long time ago, but, I remember they made a couple changes and improvements to the Makefile that streamlined their side of the operation. They maintain a fork of my repo.
I don't know what your project is specifically, but it's reasonable that a system-level user-facing application might want to depend on Numpy or Scipy.
The OP mixes together different tools used for different causes.
* setuptools/pip/poetry are package managers
* egg/wheel/? are package formats
* venv/conda are environments that isolate you from the OS
Another degree of complexity is added by the OS: its package manager and the ability to install as root vs. as a user. But at any layer, there's not much to get confused about.
Over 12 years with Python the only serious change I saw was setuptools (was it named like that?) => pip in 2011. The only serious issue I saw was when I installed with several managers (OS, setup.py, pip) and/or as root/user. That got solved in 15-30 minutes after checking the imported package __path__, and cleaning things up.
Teaching courses, I saw that the people who struggled the most never checked anything: it doesn't work? They'd try installing harder, or just re-run things. But they never checked where the packages were installed, or where they were being imported from.
Otherwise, I came to the following simple rules:
1) python & ipython are installed via an OS package
2) packages that have to be executed, like jupyter, may be installed as root or as a user; just never install both.
3) all other packages are installed as user. Computers are personal, and I almost never execute things as root, nor have other users.
Maybe being able to examine sources of errors was why I didn't feel it was so hard? I'm not a sysadmin, nor a hardcore programmer. It's just that I got used to checking paths and reading error messages attentively.
This does not mean it's easy. I wish things were simpler and there were fewer ways to do things, but the OP exaggerates, as if there were a thousand tools.
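For reference, the kind of check described above takes only a few lines; a sketch, with numpy standing in for whichever package is misbehaving:

```python
# Where is this interpreter, what will it import, and where did the package
# actually come from? Answering these resolves most installation confusion.
import sys
import numpy  # stand-in for the package you're debugging

print(sys.executable)   # which python is running
print(numpy.__path__)   # which copy of the package was imported
print(sys.path)         # the search path that made that decision
```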
The PSF won't fix it, its members are the problem. Dozens of people profit from useless churn that results in billable hours, and they actively purge anyone who opposes.
Blaming Python versus Linux distros isn't the important part. The implicit agreement that a system Python will be set up in a certain way, with a certain version, etc., is fragile. Programs and processes included with a Linux distro expecting a certain Python setup is a problem - it's making assumptions about system state. Python doesn't include tools for managing versions, dependencies, and standalone programs. Linux distros ship programs that depend on it despite this.
At the core of this is implicit cooperation between system programs, third party programs, and users. This concept underlies a common frustration with Linux: It works reliably out-of-the-box, but customization and installing programs introduces problems.
I wrote a Python dependency and version / installation Manager in Rust to help deal with this sort of thing, as well as related issues like dep conflicts between Python projects.
The flexibility the author decries is one of the strengths of the ecosystem. Oh, things don't work for data scientists on Windows? Here, use conda. A bunch of different webapps on the same server? Use venv. And so on and so forth; every niche will use what works best for them.
This is one reason Go has become so popular. If Python weren't such a mess, people could look past its poor performance, but combine that with the installation and dependency issues and it's just too much.
Virtual environments are a good way to have different packages for different applications running on different versions of Python. My usual rule is you don't mess with the system-provided Python environments for specific applications you are working with. I would even suggest dropping support for many packages at the distro level unless they are required by other non-python packages (the same way Django and Twisted are requirements for MaaS).
> don't mess with the system-provided Python environments for specific applications you are working with
Right, but then just use pip install --user instead of a virtualenv. Actually, --user is the default now when running pip install as a non-root user, so just pip install as a user will work and not mess with the system packages; just don't do sudo pip install.
> Right, but then just use pip install --user instead of a virtualenv
This seems almost as bad as system python. I suppose it's fine if you only work on one thing, but as soon as you don't, your dev environment will become chaotic and lots of confusing "works on my machine" situations will happen. E.g. this is why npm has a separate node_modules folder for each project.
> why npm has a separate node_modules folder for each project
This sacrifices time and disk space and sweeps tech debt under the carpet. I'd rather have a solution like `pip upgrade` that would upgrade all packages and fix the environment, like `pacman -Syu`, but people would have to stop pinning versions and actually maintain their codebases and the dependencies they use.
> but people would have to stop pinning versions and actually maintain their codebases and the dependencies they use
Virtual environments is the mechanism by which you do that in a non-silly way. How do you think people cope with a dependency that has a bug introduced in the most recent version?
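Concretely, the "non-silly way" above is a couple of lines per project; a sketch using only the standard library, with the package name and version as placeholders:

```python
# Create an isolated environment next to the project, then use its own pip
# to hold the one dependency whose latest release is broken at a known-good
# version, without touching any other project or the system Python.
import subprocess
import venv

venv.create(".venv", with_pip=True)
# POSIX layout; on Windows the pip executable lives in .venv/Scripts/ instead.
subprocess.run([".venv/bin/pip", "install", "somelib==1.4.2"], check=True)
```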
I've run into problems when running pip install --user. Someone installs a package in the user environment, and that works. But later you'll install a package in a shared virtualenv (because a Python program needs to run as a service or be run by other users or whatever) and pip doesn't install it because it's already in the user environment. However, other users don't have access to that environment, and they will get an import error that you won't see from your user.
Not to mention that installing every dependency in the same environment is a recipe both for disaster from version conflicts and for bloat, when you don't really know which packages belong to which applications.
Interesting problem, but I think it's more a problem in virtualenv, which doesn't use the global site packages by default; I'm surprised that it uses user site packages by default.
However, installing every dependency in the same environment is also what distro package managers do, I don't see that as a recipe for disaster; it all depends on how well said packages are maintained, in which case just upgrading them should fix whatever problem arises.
> However, installing every dependency in the same environment is also what distro package managers do, I don't see that as a recipe for disaster,
Distro packages are done in a way where it's hard to get conflicts. I haven't found an instance where python3-X and python3-Y can't be installed because one requires python3-Z v1.2 and the other python3-Z v2.1. With Python packages, that happens way more often.
Also, distro package managers keep track of everything installed, while python ones (at least pip I'm 100% sure) don't. You could install a package which upgrades another one, and that breaks another that you had installed before and pip won't say a word.
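For what it's worth, pip does ship a `pip check` subcommand that reports exactly this kind of broken requirement after the fact, and something similar can be scripted from the installed metadata. A rough sketch, assuming the `packaging` library is available:

```python
# Walk every installed distribution and flag requirements that the current
# environment no longer satisfies -- roughly what `pip check` reports.
from importlib.metadata import PackageNotFoundError, distributions, version
from packaging.requirements import Requirement

for dist in distributions():
    for req_str in dist.requires or []:
        req = Requirement(req_str)
        # Skip requirements that only apply under an extra or another environment.
        if req.marker and not req.marker.evaluate({"extra": ""}):
            continue
        try:
            installed = version(req.name)
        except PackageNotFoundError:
            print(f"{dist.metadata['Name']}: missing dependency {req}")
            continue
        if req.specifier and installed not in req.specifier:
            print(f"{dist.metadata['Name']}: needs {req}, found {installed}")
```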
> I haven't found an instance where python3-X and python3-Y can't be installed because one requires python3-Z v1.2 and the other python3-Z v2.1. With Python packages, that happens way more often.
Exactly, because the packages were created/updated at a point in time when they were compatible: a distro ships python 3.x with packages of python modules that are compatible with python 3.x.
Now if you as a user want to use a python package that works with python 3.y, you either have to wait, or install python 3.y, or use a container of python 3.y.
Again, if all python package were up to date as in "working together at this point in time" then `pip upgrade` would work.
> Also, distro package managers keep track of everything installed, while python ones (at least pip I'm 100% sure) don't.
Are you sure you checked in the `*.dist-info` directories? There should be one per package, containing: a METADATA file, with all dependency versions, a RECORD file, with every installed file and their hash, and much more! see for yourself ;)
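The same information can be read through the standard library rather than by opening the *.dist-info files by hand; a quick sketch, with requests standing in for some installed package:

```python
# version() and requires() come from METADATA; files() lists what RECORD tracks.
from importlib.metadata import files, requires, version

print(version("requests"))              # installed version
print(requires("requests"))             # declared dependency constraints
print((files("requests") or [])[:5])    # first few files recorded for the package
```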
> You could install a package which upgrades another one, and that breaks another that you had installed before and pip won't say a word.
Let's agree to disagree right there:
- in your point of view: this is a problem in pip
- in my point of view: this is a maintenance problem in the python packages themselves, caused by the general practice of pinning
> Again, if all python package were up to date as in "working together at this point in time" then `pip upgrade` would work.
That is impossible, it will never happen. Packages will have different maintenance cycles, some will get deprecated, others abandoned... You can't base your upgrade policy on an impossible situation.
> Are you sure you checked in the `*.dist-info` directories? There should be one per package, containing: a METADATA file, with all dependency versions, a RECORD file, with every installed file and their hash, and much more! see for yourself ;)
I know, but pip doesn't track all of those all the time. An example that has happened to me multiple times: install package X that depends on Y <= 1.0. Good, pip installs the proper version. Now, on another command, install package Z that depends on Y >= 2.0. Pip will install Y >= 2.0 and won't care that package X is now broken.
> - in my point of view: this is a maintenance problem in the python packages themselves, caused by the general practice of pinning
Regardless of your views on pinning, it's a problem in pip. A version conflict should be reported as an installation failure, not allow you to continue.
And again, version restrictions will always be there. The "live at master" philosophy only works for small groups of similar output capacity. It won't work for an ecosystem as wide as Python. Even without version pinning, you'll still have packages breaking because another one was updated. Sometimes it will be necessary, such as for example a package dropping support for a feature that another one needs.
> That is impossible, it will never happen. Packages will have different maintenance cycles, some will get deprecated, others abandoned... You can't base your upgrade policy on an impossible situation.
You realize that if that was "impossible" and "never happening", then absolutely no Python environment would be working ever?
> A version conflict should be reported as an installation failure, not allow you to continue.
Please don't make this mandatory.
> Pip will install Y >= 2.0 and won't care that package X is now broken.
Fine, I'll just quickly fix X and open a pull request, like I probably did a hundred times, then eventually if necessary deploy my fork with the fix meanwhile they do their maintenance release.
Should we not take responsibility for the dependencies we use and contribute back?
> The "live at master" philosophy only works for small groups of similar output capacity. It won't work for an ecosystem as wide as Python.
I don't understand why, but I'm talking about "live at latest release", not "at master".
> Sometimes it will be necessary, such as for example a package dropping support for a feature that another one needs.
Then the package can just copy the code for that feature into its own codebase, until a new lib provides it; I remember having to do that twice in 20 years (except I didn't just "paste" it, but implemented a much smaller version).
Overall, it seems my approach produces versions of my software, and of the software I depend on, that are compatible with all versions, because you can always use an earlier version if you really want to deploy on an old python or whatnot, whereas your approach leads to broken packages, tech debt, and blaming the package manager.
> You realize that if that was "impossible" and "never happening", then absolutely no Python environment would be working ever?
No, it means that you can't be fully sure that upgrading everything to the latest version is not going to break anything. And right now, you can't. You can only do that in controlled environments (say, a distro's official package repositories), not with Python packages.
> Please don't make this mandatory.
It is mandatory in quite a lot of package managers. APT, for example, will refuse to install packages with conflicting versions or breaking things. It's better to fail early with a clear message than to have a later failure where the cause is unclear.
> Fine, I'll just quickly fix X and open a pull request, like I probably did a hundred times, then eventually if necessary deploy my fork with the fix meanwhile they do their maintenance release.
That's optimistic. What about detecting which package is causing the issue? What if the fix is not quick? What if the package developers are working on compatibility but it's going to take time?
> I don't understand why, but I'm talking about "live at latest release", not "at master".
Problem is similar. You can't live at latest release with software coming from wildly different developers, with wildly different policies on compatibility, versioning, breaking changes, bugs...
> Then the package can just copy the code for that feature into its own codebase, until a new lib provides it; I remember having to do that twice in 20 years (except I didn't just "paste" it, but implemented a much smaller version).
Again, pretty optimistic. You won't be able to do that with all packages. For example, right now I have a project that needs to work on Python 3.6 (among other packages). Latest numpy versions dropped support for Python 3.6, so the project needs to restrict numpy versions to maintain compatibility. I can't just take the latest numpy and patch it for Python 3.6.
> Overall, it seems my approach produces versions of my software, and of the software I depend on, that are compatible with all versions, because you can always use an earlier version if you really want to deploy on an old python or whatnot, whereas your approach leads to broken packages, tech debt, and blaming the package manager.
You also spend time doing maintenance and debugging on new installations to debug and fix compatibility issues, when those upgrades might not bring any value to the product. In my case, I do that debugging and fixing in a controlled environment, only when I decide to upgrade the packages, and maybe revert/pin the ones that don't have quick fixes. That's the difference. Once a given package is released/packaged for distribution, I want dependencies to be fixed so that every new installation works, and doesn't fail if some developer decided to break things that day.
> you can't be fully sure that upgrading everything to the latest version is not going to break anything.
Well, it's still what you do every time you create a new project.
> It is mandatory in quite a lot of package managers.
Ok but many packages won't be installable at all anymore, because people pin different versions instead of maintaining their packages by continuously integrating upstream releases.
> You can't live at latest release with software coming from wildly different developers, with wildly different policies on compatibility, versioning, breaking changes, bugs...
Well, I didn't know I couldn't, so I did.
> I can't just take the latest numpy and patch it for Python 3.6.
Instead, you should be upgrading your code to support the newer numpy version. Seriously, try it out, you'll be spending less effort at the end of the year.
> You also spend time doing maintenance and debugging on new installations to debug and fix compatibility issues, when those upgrades might not bring any value to the product.
I spend less time and more spread over the year, bringing value to the whole ecosystem of packages which are bringing value to my product, and bringing value to myself as I develop new products with the dependencies I like and support as such.
> In my case, I do that debugging and fixing in a controlled environment
I wait for CI or users to report that a new release isn't compatible so I can fix it as soon as possible, in which case we temporarily pin versions, instead of piling up tech debt until it's so huge the whole project becomes trash.
That's what I call taking responsibility with the dependencies that you include in your product.
> Well, it's still what you do every time you create a new project.
Yep, and it is a pain in the ass when the package you want to use creates a dependency conflict. Luckily, at that stage I can still search for an alternative with little to no cost.
> Ok but many packages won't be installable at all anymore, because people pin different versions instead of maintaining their packages by continuously integrating upstream releases.
And some packages won't be installable anymore because they use deprecated APIs, or were only compatible with Python 2, or expected certain files/folders/programs to be present on the system and they aren't anymore. Packages go into abandonware all the time, removing version pinning doesn't fix that.
> Instead, you should be upgrading your code to support the newer numpy version. Seriously, try it out, you'll be spending less effort at the end of the year.
Thank you for telling me this, I will be relaying to my clients that they need to upgrade their systems to newer Python versions to get new features, surely that will go well.
> I wait for CI or users to report that a new release isn't compatible so I can fix it as soon as possible, in which case we temporarily pin versions, instead of piling up tech debt until it's so huge the whole project becomes trash.
Well, I prefer to not have my program crashing in client environments because someone decided to update a package and break things.
> piling up tech debt until it's so huge the whole project becomes trash.
Waiting to release on new dependency versions until those versions are tested is not tech debt. I mean, semantic versioning was done with this purpose precisely: to signal which upgrades are just bug fixes and you should upgrade ASAP, which ones are new features so you can upgrade without too many problems, and which upgrades break compatibility and you should test it thoroughly in case something broke.
If living at latest release works for you and dealing with dependency issues on new installations is not a problem, then go. But other systems will require that a certain release installs its dependencies in a consistent manner. Some of my dependencies are automatically upgraded and tested before release; others are pinned because they always break things and I only upgrade them manually. But once a version of a package is released, it goes out with version pinning, because I prefer to avoid the risk of a client installing the software and having to explain to them that "oh, it's just that this dependency released a new version that broke things and we didn't specify the version that our software needed".
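For the semantic-versioning signalling both sides keep referring to, the usual translation into PEP 440 specifiers looks like this; a sketch using the `packaging` library, with illustrative version numbers:

```python
# ~= accepts "compatible releases" (bug fixes, or bug fixes plus minor versions,
# depending on how many components you give it); == is a hard pin.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

assert Version("1.4.9") in SpecifierSet("~=1.4.2")       # bug-fix releases only
assert Version("1.5.0") not in SpecifierSet("~=1.4.2")
assert Version("1.9.0") in SpecifierSet("~=1.4")         # new minor versions allowed
assert Version("2.0.0") not in SpecifierSet("~=1.4")
assert Version("1.4.2") in SpecifierSet("==1.4.2")       # hard pin: upgrade only deliberately
```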
> Packages go into abandonware all the time, removing version pinning doesn't fix that.
Abandoned dependencies should be treated like dropped features: re-implement them in your own code or create another library.
> Thank you for telling me this, I will be relaying to my clients that they need to upgrade their systems to newer Python versions to get new features, surely that will go well.
Very funny, but seriously, they should have an upgrade path, or install their machine once and then never do a single upgrade; but again, I'd say this is just tech debt piling up.
> Waiting to release on new dependency versions until those versions are tested is not tech debt.
Not having your own tests is tech debt though. I wouldn't rely on tests made by others.
> I mean, semantic versioning was done with this purpose precisely: to signal which upgrades are just bug fixes and you should upgrade ASAP
Of course I agree; the problem is that maintainers tend not to upgrade as often as they should, i.e. not until they really have to (another dependency upgrades a shared dependency). Again, I see this as a maintenance problem rather than a package manager problem.
> If living at latest release works for you and dealing with dependency issues on new installations is not a problem, then go.
I'm not saying I never had a problem, of course I do; I'm saying that having these problems has a better cost/benefit ratio at the end of the year, because: 0. they are smaller, dealing with one BC break at a time, and 1. they are spread over time because I integrate upstream releases continuously instead of waiting for the day I want to upgrade everything.
> "oh, it's just that this dependency released a new version that broke things and we didn't specify the version that our software needed"
I'd rather say "oh, it's just that this dependency released a new version that we are implementing support for as we speak, here's the command you can run to fix it meanwhile: ...".
But if you don't want to have that problem, just run your CI periodically and make sure the first thing you do in the morning is check that the nightly run actually passed its tests. You're talking about paid software maintenance; I'm talking about both paid and volunteer work, which is why I include "wait for a user to report", but of course that doesn't apply to paid maintenance.
To me, we're seeing a discussion between two fundamentally opposite approaches, one defensive and the other offensive (actual continuous integration). From my experience, the offensive strategy offers a better cost/benefit ratio at the end of the year.
Apt and desktop environments need to break their Python dependencies then, or stop doing daft things like colliding different 3.x versions on the same runtime path.
Go ahead and install pip3 from Python 3.6, use that to install pip3 for 3.8, and try using apt. Back up first.
Your last sentence doesn't parse for me at all. How exactly do you get a pip installed on Python 3.6 to modify anything about another install of Python, let alone a different version?
Use pip3 in 3.6 to install the latest pip3 that requires Python 3.8, and drops that package in the dist-packages folder, which is shared between system Python runtime versions.
Behold as further attempts to do something sane with APT blow up, because now you have 3.8 packages on 3.6's runtime path.
I've lost weeks to this particular brand of distro daftness. I thought that surely no one would do that... Alas...
Ah, so that's what happened in my anecdotal story in another comment in this thread [0]. But why would you need a newer python3 anyway? The package maintainers ship 3.6 by default for a reason and they know better than you! /s
In my case? Recreating a build environment in the process of debugging an issue with a build script and trying to get the runtime environment lined up just so; with a side objective of an exploration of the madness of how Python as a language handles packaging (part of which was what kind of footguns are presented by the lang specific package manager, which funnily enough also seems to be completely oblivious to the runtime version it's executing in).
>The package maintainers ship 3.6 by default for a reason and they know better than you! /s
Maybe then! Not anymore! It really does infuriate me because it breaks every aspect of the principle of least surprise and conventional software packaging practice.
> the dist-packages folder, which is shared between system Python runtime versions.
I mean... I understand how this could in theory make some sense: a distro might naively expect to ship the same package version for different Python versions, so all files could be the same except for native extensions, which are disambiguated by implementation name. One could imagine doing this to save some space in case someone installs the same package for multiple Python versions.
But in reality? Jesus fucking christ. However, I don't see how this can be blamed on Python. This isn't something that "just happens because Python" and it is certainly far, far away from any kind of default Python configuration, where all default paths are below the prefix, which contains the Python version. This is something that can only happen because someone went extraordinarily far out of their way to shoot someone else's foot off.
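A small illustration of the disambiguation mentioned above: compiled extensions carry a per-interpreter tag in their filename, while pure-Python files and the install path itself do not, which is why sharing one directory between 3.6 and 3.8 half-works and then blows up.

```python
# Print the extension suffix and the pure-Python install path for this interpreter.
import sysconfig

print(sysconfig.get_config_var("EXT_SUFFIX"))   # e.g. '.cpython-38-x86_64-linux-gnu.so'
print(sysconfig.get_paths()["purelib"])         # where pure-Python packages land
```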
While I agree with this statement, it's also kind of the distro's fault for making it so easy to mess with system Python. If I was ever to design a distro and needed a "system Python" I would make sure that it would be completely isolated from and invisible to users. It would only be upgraded as part of an overall major OS version upgrade.
Given, and yes I've heard that before, but I counter with:
If you don't want someone to mess with it, name it as something other people will not regularly have a reason to muck with. One concerted effort to ship distros with a "sys-python" symlink instead of /usr/bin/python, have everything that currently uses the system interpreter point at the newly minted sys-python, and that stuff can be free of wayward travelers and system sculptors like me mucking with things.
No, pyenv is not the answer. Naming things is.
And trust me. I get it. It's a rite of passage learned in due time, yada-yada. Got it. If we ever want to get this stuff easier to use though, we have to be willing to help draw clean distinctions between this and that. Virtual environments don't do that until it is far too late and a beginner is already in and over their head in breakage.
As far as I can tell, they're a strict requirement, if you ever work on more than one project and want to isolate the dependencies (because you need different versions of the same package, because you want to ensure you've tracked all deps, whatever).
Is it that you don't isolate dependencies between your python projects at all, or is there some other solution you prefer?
Python is basically unusable without venvs in my experience. Not sure what your alternative suggestion is. I hope you don't think global pip install is the right way.
In an enterprise world, you can't rely on dependencies being present on the target machine, so install a .venv/ directory in your application distribution containing all the libraries (populated by pip) and a bin/venv-python wrapper script to set PYTHONPATH correctly and to call the .venv/bin/python. Then create a bin/run-app script to call bin/venv-python, with the location of your entry point module from src/python/
In summary: bin/run-app -> bin/venv-python -> .venv/bin/python -- I'm not sure how to make it simpler than this. The cost is perhaps in disk space, but I don't care about that.
Or don't do the above and deploy via docker and use pip to install to the system python in the image.
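As a rough sketch of what the bin/venv-python wrapper described above could look like (the bin/.venv/src layout and names are the parent comment's convention, not a standard):

```python
#!/usr/bin/env python3
# bin/venv-python: point PYTHONPATH at the application sources and exec the
# bundled environment's interpreter with whatever arguments were passed in.
import os
import sys
from pathlib import Path

app_root = Path(__file__).resolve().parent.parent
venv_python = app_root / ".venv" / "bin" / "python"

env = dict(os.environ, PYTHONPATH=str(app_root / "src" / "python"))
os.execve(str(venv_python), [str(venv_python), *sys.argv[1:]], env)
```

bin/run-app would then be the same idea, with the entry-point module appended to the arguments.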
Python's trend of "reinvent it yourself" engineering leads inexorably to a sprawling mess of duplicated and incompatible projects. The source of this problem is Python's lack of "lead by example" on the topic of DRY code / implementations.
With Perl modules (like Python packages), the intent is to never have to rewrite them, but to make them so generic that they are just extended by new modules (that inherit the old modules) to gain more features. Their naming convention reinforces this intention, with "Net.pm", "Net/IMAP.pm", "Net/IMAP/SSL.pm", etc. Each one of those files can be uploaded to CPAN by a completely different developer, without having to completely re-write all the base stuff from scratch. You can already do that with Python, but because everyone names their packages "fizzbaggy", "unclib2", "OtherThingHere", etc, there's no sane convention that clearly tells you what this thing is, what it does, or what it depends on.
The end result of all that is a nightmare in terms of regular users figuring out how to install and use most Python scripts, packages, and tools. This is part of why Go is so popular now: no need for an engineering degree just to run a Python program!
I do not believe there is any way for Python to re-invent itself and suddenly become less sprawling or confusing. The shift would be too huge and take too long.
It seems a bit weird to criticise lots of people for trying to solve the dependency problem. I know lots of people hold up npm as the standard. Npm came out around 10 years ago, Node around 12. Both of which had the benefit of hindsight at how the problem had developed for other languages. Python came out 30 years ago and pip came out 20 years later. Of course there is a lot of stuff left behind. What do people suggest, a breaking change, maybe? Python 4?
One thing that people do like about JavaScript is that you can import multiple versions of the same package. This solves that depA requires 1.0 and depB requires 2.0.
However, this makes auditing nearly impossible. Small projects can end up with well over a thousand dependencies, which is just unmanageable.
Linus should exert his "Linux" trademark rights by putting a representative from each of the major distros in a room and telling them to come to an agreement on a single packaging and filesystem layout for Linux, or they can no longer call their precious snowflake Linux. To me, this more than anything has hampered Linux adoption: there are too many versions of it.
If I understand correctly, this is the kind of problem Flatpak and Snap can solve. As a developer, relying on a distro's provided packages just seems too cumbersome - there are tens of distros, all with different versions of your dependency and different packaging systems. I guess the only way to distribute a Python application today that works across distros would be to build a single application that works simultaneously on many different python versions, not to mention the shared C libraries and other system software used by important Python packages. What do you do if you use a psycopg2 X that needs a libpq-dev version between 1 and 2 but the distro bundles an incompatible libpq-dev Y? You now have to support 2 versions of libpq-dev across the project, and all dependents of libpq-dev such as psycopg2?
The shared dependency model is just too complex to work with, and we have enough disk space that it's not really necessary anymore. Sandboxes seem like the way to go.
I got introduced to pip and virtualenv when I started coding in python in 2013-14.
Except for the system/3rd party python and python2/python3 confusion a couple of times, I've been able to set up many projects at multiple companies that quite a few people have worked on over the years without any problem just using pip and virtualenv.
Somehow, every time I searched for a better way of doing it, the new tools looked more confusing than the existing ones. And since the existing ones keep working very well, I haven't had any incentive to move to another tool that does even more magic under the wraps in the name of making things simpler, because when the required conditions aren't met, those tools simply tend to give up.
So I just set up the virtualenv with the appropriate python version and then use pip to install python dependencies. Projects have varied from webservers to computer vision and forecasting. And I have not faced any issues to date.
> I manage my Python packages in the only way which I think is sane: installing them from my Linux distribution’s package manager.
Actually, this is not sane.
If you have any Python dependencies, you should always develop and deploy your code using virtualenv, never by installing packages into the system Python!
1) If your distro requires Python, don't put it on the path. Refer to it another way if you need it, e.g. have a distroname-system-python package you upgrade at your own cadence.
2) That's it. Then developers can install Python how they like, and it's (probably) all fine.
The more I see package systems that don't work, the more I think Java, for all the criticism, got it right. Between always enforcing backward compatibility and being explicit about where to look for dependencies, everything just works.
Of course you still have problems with how to declare those dependencies and resolve upgrades, but code you compiled 10 years ago will probably still work today. Compare that to Python or, even worse, nodejs, and it's bonkers the amount of context you have to be aware of just to make code that worked fine 2 years ago build again on the current toolchain.
> What is it about Linux distros that makes our use-case unimportant? Have we offered no value to Python over the past 30 years? Do you just feel that it’s time to shrug off the “legacy” systems we represent and embrace the brave new world of serverless cloud-scale regulation-arbitrage move-fast-and-break-things culture of the techbro startup?
The thing is, people in development won't use the distro python (but different installs, pyenv, etc.), and people in devops won't use the distro python either...
So, what values does it offer to whom? Except maybe scripting for sysadmins?
> Every one of these package managers is designed for a reckless world in which programmers chuck packages wholesale into ~/.pip, set up virtualenvs and pin their dependencies to 10 versions and 6 vulnerabilities ago, and ship their computers directly into production in Docker containers which aim to do the minimum amount necessary to make their user’s private data as insecure as possible.
That's a pretty sobering look at the state of affairs.
The success of Python and Go shows that people don't care as much about what distributions maintainers feel is "quality packages".
The scale and quantity of software being developed is just too much for traditional Linux distributions approaches. I personally rely on Debian to know that most of the packages I use are reasonably secure. But often I need to sacrifice flexibility and bleeding-edgedness.
And it couldn't be otherwise. There are 339,267 projects on PyPi.org. I bet a good percentage of these are riddled with security flaws. But people use them anyway. How many distro maintainers would you need to handle this workload? Is it worth it? Does anyone care?
It looks like people care much more about experimentation and speed of development than stability, security or coherence and cleanliness of the solution. The author seems to feel this is wrong from an engineering (or even moral?) perspective.
I am also uncomfortable when I have to deal with Python and I have my own favourite few tools... but perhaps if Python users really wanted a single solution then it would already exist.
If you are instead convinced that there is this need and no adequate solution, then congratulations, you just found a gap in the market. Go on and do better than everyone else before you. Relevant XKCD comic is already in the article...
As an end user, I definitely care that my software works, doesn't crash, and doesn't leave my machine wide open for attacks. I don't want things to be this chaotic
I once attempted to install a program that was distributed via PyPI (pip was the only way I could install software on that server). The shebang in the script file was #!/bin/python. That's python2 on most Linux distros, and so it couldn't run.
That's weird; the console_scripts hook for python packages generates a script whose shebang points at the same python that pip was executed with to install said package.
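For context, this is what the console_scripts hook looks like on the packaging side; a minimal setup.py sketch with placeholder names. The launcher pip generates from it gets a shebang pointing at the installing interpreter, so a bare #!/bin/python suggests the package shipped its own script instead.

```python
# setup.py for a hypothetical package exposing one command-line entry point.
from setuptools import setup

setup(
    name="exampletool",
    version="0.1",
    py_modules=["exampletool"],
    entry_points={
        "console_scripts": [
            "exampletool = exampletool:main",  # command name = module:function
        ],
    },
)
```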
Nix solves a lot of Python packaging problems and I'm a happy user. But you lose some of the convenience of plain pip. For example, with Nix, it's no longer easy to mix and match versions of different packages. You just get the package versions that happen to live in the snapshot of nixpkgs you're using. If I want to use an older version of a package than the one in nixpkgs, I now have to add an override for that package and pray that changing the version/hash is enough.
If you avoid placing packages in the distribution's system Python (by using a venv) and if you aren't using some outdated distro version, your Python should be good enough. You must write backwards compatible code anyway, if your application is to last more than six months.
It is not the end of the world to be using one or even two minor versions older than the current Python version. New features must be implemented in your code and relying on bleeding edge features of your language (in production code) is outright bad design.
If the core Python developers like to experiment and break their language, you are best to avoid jumping head first into that mess.
TL;DR:
Keep your requirements conservative. You are not beholden to the Python core developers; your target is your users.
The biggest insight in software dependency management is that applications and libraries are different. Both have dependencies but they sit at different places in their dependency graphs.
A library can be used together with other libraries and so cannot pin its own dependency versions (if all libraries did so there would conflicts everywhere) but instead can only specify constraints on its dependency versions (eg >=1.0.0).
An application sits at the front of the dependency graph. Nothing depends on it. It can therefore lord it over its dependencies, pinning everything in the graph to specific versions. This only works, though, if it doesn't have to share an environment with other applications and unrelated libraries.
Systems like poetry (akin to npm or cargo) allow a "lock file" to be generated with pinned versions for all dependencies, satisfying all the version constraints. Applications must commit this to revision control, libraries can if they want (should IMO). This is great as it allows CI and other devs to use consistent versions.
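To make the library side of that distinction concrete: a library declares ranges in its packaging metadata and leaves exact pinning to the application's lock file. A sketch with illustrative names and bounds:

```python
# setup.py for a hypothetical library: constraints, not pins, so that it can
# coexist with other libraries in the same environment.
from setuptools import setup

setup(
    name="examplelib",
    version="1.0.0",
    install_requires=[
        "requests>=2.20,<3",   # a range leaves room for the application and other libraries
        "packaging>=20",
    ],
)
```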
The missing piece is that for an application, the lock file should also be used for deployment of the app. If you can ensure that an application is installed in an isolated environment with the dependency versions from the lock file (i.e. that the app was tested against) then a lot of the pain disappears.
So, the suggestion:
* Add support to standard python distribution (wheels, pypi, etc) for application packages to specify (in addition to ordinary version constraints) a pinned set of "preferred" dependency versions.
* Have tools like poetry/pipenv set these to the lock file versions.
* Allow a notion of "application packages", which are required to have this information in.
This should work very nicely with tools like pipx that deal specifically with python applications. The relevance to linux distros is that linux distros also should only be packaging applications (and their dependencies). Developers using libraries should be managing them with tools like poetry/pipenv, not the system package manager.
If the system package manager could install python applications in their own isolated environments along with their pinned dependency versions, most of the pain goes away for distro maintainers. If an application isn't working with the dependency versions it has specifically asked for, it's a clearcut problem with the application as published, and needs to be fixed upstream.
I realise this would be a significant change for package managers, but I think the same model makes sense for other languages with similar tooling and at least some of the work should only need to be done once.
Some Nix tools do this for Python, giving you separate tools for building libraries and applications. Overall, you're describing how Nix/Nixpkgs works today.
> A library can be used together with other libraries and so cannot pin its own dependency versions (if all libraries did so there would conflicts everywhere) but instead can only specify constraints on its dependency versions (eg >=1.0.0).
This is a Python defect. IIRC, for example, Node libraries can and do pin their dependencies without conflict, because Node has no problem using multiple versions of the same library in a process.
Python can't do this. It could (there have been proofs of concept that make this work), but it doesn't.
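A tiny illustration of why a single Python process can't hold two versions of one library: imports are cached in sys.modules keyed by module name alone, so any later import of "another version" just returns the object already loaded.

```python
import json
import sys

assert sys.modules["json"] is json   # the cache entry is the module object itself
import json as json_again            # re-importing does not create a second copy
assert json_again is json
```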
>> An application sits at the front of the dependency graph. Nothing depends on it.
In many cases, applications depend on other applications.
This can easily be seen in most Linux package management systems and also in many applications that act as more user-friendly front-ends or automation for command-line utilities.