> very few builds in the wild are deterministic and reproducible
There's a difference between 'deterministic and reproducible' and 'predictable'. Sure, I can (probably) build curl ten times and get ten technically different binaries, but the differences don't affect how the resulting binary actually behaves.
Assuming my build environment isn't changing out from underneath me (i.e. if I'm using the same versions of the same dependencies on the same platform with the same compiler toolchain, and I'm not passing in different arguments or environment variables) then the end result should be functionally identical.
> You have to build up the universe from scratch (like guix and nix) for the caching to be sound enough to be reliable.
This is a false assertion. If I'm building (again, as an example) curl, then I don't need to rebuild the kernel, glibc, kerberos, libssh, gnutls, brotli, etc. in order to get a working curl build; I just need to make sure the libraries and toolchains I have installed are valid. If I create a docker image with all of the dependencies and toolchains preinstalled, and then I build the exact same curl code inside of that docker container on two separate occasions, then the end results are going to be indistinguishable short of hashing the build artifacts.
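To make that concrete, here's a rough sketch (in Python, with a made-up image digest, path, and flags) of the check I have in mind: hash the pinned image, the source tree, and the build arguments, and if the key matches the last build's key there's nothing to do.

    import hashlib
    import json
    import pathlib

    def tree_hash(root: str) -> str:
        """Hash every file under the source tree: relative path plus contents."""
        h = hashlib.sha256()
        root_path = pathlib.Path(root)
        for p in sorted(root_path.rglob("*")):
            if p.is_file():
                h.update(str(p.relative_to(root_path)).encode())
                h.update(p.read_bytes())
        return h.hexdigest()

    def cache_key(image_digest: str, src_root: str, configure_flags: list[str]) -> str:
        """The key covers everything the build can see: environment, source, arguments."""
        payload = json.dumps({
            "image": image_digest,          # pinned docker image, toolchain + deps preinstalled
            "source": tree_hash(src_root),  # the curl source tree itself
            "flags": configure_flags,       # build arguments
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    # Usage (hypothetical digest/path/flags): if
    # cache_key("sha256:abc123...", "./curl", ["--with-openssl"]) matches the key
    # recorded for the previous build, nothing the build can see has changed, and
    # reusing those artifacts is safe.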
> Just because an algorithm exists that can help doesn't mean that the universe of software can fit cleanly into a model where that algorithm is perfect.
It doesn't need to be perfect, it just needs to be correct for the vast majority of cases (excluding builds that are already inherently broken and just haven't happened to fail yet).
> a degenerate case: glibc needs python which needs glibc
Because you're not rebuilding the entire world (in 99% of cases), this doesn't actually matter. If you're building glibc, use the existing build of python that you already have installed, which is the identical version to the one you used the last time you built glibc.
If I'm rebuilding glibc, do I also need to rebuild python? The glibc code hasn't changed, but maybe python has, so I'd get a different result? Well okay, has the python source changed? No? Okay, have the dependencies for python or glibc changed? No? Okay, well, problem solved then.
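That whole question chain is just an input comparison. A minimal sketch of it, assuming some hypothetical manifest recorded after the last successful build:

    import hashlib

    def file_hash(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def needs_rebuild(source_tarball: str, dep_files: list[str], last_manifest: dict) -> bool:
        """Rebuild only if the source, or a dependency it builds against, changed."""
        current = {
            "source": file_hash(source_tarball),           # has the glibc source changed?
            "deps": {d: file_hash(d) for d in dep_files},  # e.g. the installed python used for the build
        }
        return current != last_manifest  # manifest recorded after the previous successful build

    # Same glibc tarball, same installed python as last time -> needs_rebuild() returns False.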
I'm not sure why people have this philosophy of "this problem isn't solved because of some imaginary, unrealistic corner case that either shouldn't exist in the first place, isn't actually a real problem, or that no one will run into". In 99% of cases it works, and if it's not right for a particular circumstance then just don't use it in that circumstance. Sometimes makefiles aren't the right tool. Sometimes a GUI isn't the right tool. Let's not argue that we shouldn't have GUIs because sometimes running a command on the CLI is easier, and let's not argue that an extremely basic caching system that makes sane assumptions that are almost always valid isn't a good idea.
And the parent is right: if you correctly define your inputs and outputs, then the system works. What those inputs and outputs are is up to you to determine. Maybe for you, the outputs are a working build of glibc, and maybe the inputs are all of the tarballs necessary to build Linux from scratch and a working bootstrap environment, but if all of those inputs and that whole environment are identical to last time, what's the point of rebuilding any of them? Maybe there is one, but that's up to you to determine and model.
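Modeled crudely (this isn't any real build tool), that just means the rule's key is computed over whatever you chose to declare as inputs:

    import hashlib

    class Rule:
        """You decide what counts as an input; the system only compares keys."""
        def __init__(self, name: str, inputs: list[str], outputs: list[str]):
            self.name = name        # e.g. "glibc"
            self.inputs = inputs    # one tarball, or every LFS tarball plus a bootstrap-env archive
            self.outputs = outputs  # e.g. the installed glibc tree

        def key(self) -> str:
            h = hashlib.sha256()
            for path in sorted(self.inputs):
                h.update(path.encode())
                with open(path, "rb") as f:
                    h.update(f.read())
            return h.hexdigest()

    # If key() matches the key recorded alongside the existing outputs, there is
    # nothing to rebuild; if you believe there is, an input is missing from the model.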
I agree with you, but my point is ultimately that content-hashing inputs, which GP argues "solves" the problem, doesn't actually solve it.
Take your curl example. Its inputs are its dependencies. Now say you depend on libcurl and a more recent version of glibc than curl was compiled against. If you only track the hashes of inputs to determine if something needs to be rebuilt, then in this scenario you will have to rebuild libcurl.
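To spell out why (the version strings here are made up): if a package's key folds in the keys of everything it was built against, the glibc bump changes libcurl's key even though libcurl's own source didn't change:

    import hashlib

    def key(name: str, source_hash: str, dep_keys: list[str]) -> str:
        """A package's key folds in the keys of everything it was built against."""
        h = hashlib.sha256()
        h.update(name.encode())
        h.update(source_hash.encode())
        for k in sorted(dep_keys):
            h.update(k.encode())
        return h.hexdigest()

    glibc_old = key("glibc", "src-2.38", [])
    glibc_new = key("glibc", "src-2.39", [])  # glibc was updated
    libcurl_old = key("libcurl", "src-8.5", [glibc_old])
    libcurl_new = key("libcurl", "src-8.5", [glibc_new])

    # libcurl's own source is unchanged, but its key changed because glibc's did,
    # so an input-hash-driven cache has to rebuild it -- even though the existing
    # libcurl binary may have been perfectly compatible with the newer glibc.
    assert libcurl_old != libcurl_new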
What you're talking about (determining if your dependencies are "valid" and don't need to be rebuilt) involves tracking compatibility metadata out-of-band of the actual content of the build artifacts, which is not a solved problem.
The point about reproducible builds is that they're required for any kind of build system that depends on content hashing to determine if something needs to be rebuilt, because without them it's impossible to make that determination from the content alone.
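Concretely, content hashing of artifacts only pays off as an "early cutoff" check along these lines (a sketch, not any particular tool's API), and that check is useless if the dependency's output differs byte-for-byte on every rebuild:

    import hashlib

    def artifact_hash(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def downstream_needs_rebuild(new_artifact: bytes, previous_artifact_hash: str) -> bool:
        """Early cutoff: if a rebuilt dependency produced byte-identical output,
        nothing that consumes it has to be rebuilt."""
        return artifact_hash(new_artifact) != previous_artifact_hash

    # If the dependency's build embeds a timestamp or a random build ID, the two
    # hashes never match and the cutoff never fires -- which is exactly why this
    # scheme depends on builds being reproducible.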
It gets hairier when you start abstracting what a build artifact that depends on another build artifact is, because there's more to builds than object files, dlls, and executables.