Who remembers Ken Thompson's "Reflections on Trusting Trust"?
The norm today is auto-updating, pre-built software.
This places a ton of trust in the publisher. Even for open-source, well-vetted software, we all collectively cross our fingers and hope that whoever is building these binaries and running the servers that disseminate them, is honest and good at security.
So far this has mostly worked out due to altruism (for open-source maintainers) and self-interest (companies do not want to attack their own users). But the failure modes are very serious.
I predict that everyone's imagination on this topic will expand once there's a big enough incident in the news. Say some package manager gets compromised, nobody finds out, and 6mo later every computer on earth running `postgres:latest` from docker hub gets ransomwared.
There are only two ways around this:
- Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds.
- Reproducible builds.
Reproducible builds are way more important than is currently widely appreciated.
I'm grateful to the nixos team for beating a trail thru the jungle here. Retrofitting reproducibility onto a big software project that grew without it is hard work.
> I'm grateful to the nixos team for beating a trail thru the jungle here. Retrofitting reproducibility onto a big software project that grew without it is hard work.
Actually, it's the Debian guys who pushed reproducible builds hard in the early days. They upstreamed the necessary changes and also spread the concept itself. This is a two-decade-long community effort.
In turn, NixOS is mostly just wrapping those projects with its own tooling, literally a cherry on top. NixOS is disproportionately credited here.
I think both efforts have been important and have benefitted each other. Nix has always had purity/reproducibility as tenets, but indeed it was Debian that got serious about it on a bit-for-bit basis, with changes to the compilers, tools like diffoscope, etc. The broader awareness and feasibility of reproducible builds then made it possible for Nix to finally realise the original design goal of a content-addressed rather than input-addressed store, where you don't need to actually sign your binary cache, but rather just sign a mapping between input hashes and content hashes.
Of course, yes— that was what I was saying. But the theory with content-addressability is that unlike a conventional distro where the binaries must all be built and then archived and distributed centrally, Nix could do things like age-out the cache and only archive the hashes, and a third party could later offer a rebuild-on-demand service where the binaries that come out of it are known to be identical to those which were originally signed. A similar guarantee is super useful when it comes to things like debug symbols.
There are new challenges with new packages, too. In the last 5 years a lot of Rust packages have entered the tree, for example: a new compiler to tackle reproducibility with (and it's not trivial, even if upstream has worked on it a lot).
No, rust leaks the path to the source code on the build machine. This path likely does not even exist on the execution machine, so there's absolutely no good reason for this leakage. It is very nonstandard.
It is really, really annoying that the Rust team is not taking this problem seriously.
I don't think this is correct. Most compilers include the path to the source code on the build machine in the debug info, and it's a common problem for reproducible builds. This is not a rust-specific issue.
Obviously the binary can't contain paths from the execution machine because it doesn't know what the execution machine will be at compile time, and the source code isn't stored on the execution machine anyway. The point of including the source path in the debug info is for the developer to locate the code responsible if there's a crash.
But is it only on debug builds? Or are release builds affected? Because if it’s the latter, that’s a big issue. But for the former, does it really matter?
At least in openSUSE, we always build with gcc -g and then later strip debug symbols into separate debuginfo files. This leaves a unique hash in the original file and that makes them vary if the build path changes.
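For what it's worth, both toolchains have flags to remap the embedded build directory to a fixed string; a minimal sketch (the /home/user/src path is just an illustration):

    # map the absolute build directory to a fixed prefix in debug info and embedded paths
    gcc -g -ffile-prefix-map=/home/user/src=/build -o demo demo.c
    rustc -g --remap-path-prefix /home/user/src=/build -o demo demo.rs

    # for cargo projects the same rustc flag can be passed via RUSTFLAGS
    RUSTFLAGS="--remap-path-prefix $PWD=/build" cargo build --release

I believe several distros already wire flags like these into their default build flags for exactly this reason.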
> Build farms are also important for release management - the production of software releases - which must be an automatic process to ensure reproducibility of releases, which is in turn important for software maintenance and support.
Eelco's thesis (from 2006) also has this as the first bullet-point in its conclusion:
> The purely functional deployment model implemented in Nix and the cryptographic hashing scheme of the Nix store in particular give us important features that are lacking in most deployment systems, such as complete dependencies, complete deployment, side-by-side deployment, atomic upgrades and rollbacks, transparent source/binary deployment and reproducibility (see Section 1.5).
That's somewhat uncharitable. patchelf, for example, is one tool developed by NixOS which is widely used for reproducible build efforts. (although I don't know concretely if Debian uses it today)
patchelf is not really widely used for solving reproducible-builds issues. It's made for rewriting RPATHs, which is essential for NixOS, but not something you would see in other distributions except when someone needs to work around poor upstream decisions.
tl;dr: Debian's work is very important here, but NixOS' reproducibility aims are more general than Debian's and began more than 8 years earlier
Despite the fact that Debian (as a project) has shouldered far more of the work with upstream projects to make bit-identical reproducibility possible at build time, Debian (as a distro) doesn't have a design that makes this kind of reproducibility as feasible, practical, or robust at the level of a whole system or disk image in the way that NixOS has achieved here. To quote the Debian project itself[0]:
> Reproducible builds of Debian as a whole is still not a reality, though individual reproducible builds of packages are possible and being done. So while we are making very good progress, it is a stretch to say that Debian is reproducible.
Beyond the fact that some packages still have issues upstream and the basic technical problem of versioning (i.e., apt fetching binaries from online archives in a stateful way) Debian additionally struggles with an extremely heterogeneous and manual process of acquiring and uploading source packages[1]. Debian doesn't even have the resources to construct a disk image where the version of every package is pinned, short of archiving all the binaries (which is how they do ‘reproducible’ ISO production now[2]). But pulling down all of the pre-built binaries for your distro isn't really ‘reproduction’ in the same sense as ‘reproduction’ in Debian's (package-level) reproducibility project.
Some points of comparison
• NixOS always fixes the whole dependency tree
• Debian requires a ‘snapshot’ repository to fix a dependency tree
• most NixOS packages are updated through automatic tools and all the build recipes are stored under version control in one place
• Debian packages can be updated any way that suits their maintainers, and the build recipes/rules can be stored anywhere (it's the maintainer's job to keep them in version control if they want, then upload them to Debian repositories as source packages)
• Nix (transparently!) caches both build outputs and package sources, which means
◦ if the original source tarballs (e.g., on GitHub or SourceForge) are unavailable, Nix won't even notice, as long as it can pull them from the ‘binary’ cache
◦ if there is no cache of the build outputs, Nix will automatically fall back to fetching and unpacking the sources from the upstream mirror
• Debian's technical and community relationships to upstream source code are both less robust
◦ Debian requires manual management (creating and uploading) of complete source code archives in their own format[1]
◦ sometimes Debian infrastructure can't even reproduce upstream source code from their own archives[3]
◦ if Debian's source archives are unavailable for a package, there is just no way to build it (since source package archives also contain the build instructions, dependency metadata, etc.)
Actually reproducing a NixOS image is less manual and can be done without relying on any online Nix/NixOS-specific infrastructure, and this is a real advancement over what's possible with binary distros like Debian. (Some other binary distros, like openSUSE, also have centralized version control for package definitions.)
One way to conceptualize the qualitative differences in reproducibility outlined above is by examining the ways that Nix strengthens Debian's definition of reproducibility[4], which reads:
> A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
For Nix, the build instructions can simply encode all of what Debian calls the ‘relevant attributes of the build environment’:
> Relevant attributes of the build environment would usually include dependencies and their versions, build configuration flags and environment variables as far as they are used by the build system (eg. the locale). It is preferable to reduce this set of attributes.
And similarly, for NixOS, the acquisition of source code is folded into the build instructions and the ‘build environment’ (i.e., caches being available or GitHub not being down). So every Nix package that is reproducible at all is reproducible in a more general way than a reproducible Debian package.
And NixOS/Nix have had to do real work to make their systems reproducible in ways that Debian is not. Unlike much of Debian's work, its benefits can't really be shared with distros of a different design, but the converse is sometimes true as well. For example, Debian's work on rooting out non-determinism in package post-install hooks[5] is useless (and unnecessary) for NixOS, Guix, and Distri, since their packages don't have post-install hooks.
There are also lots of little ways that issues Debian has worked on either reflect the relative weakness of this notion of reproducibility (e.g., ‘All relevant information about the build environment should either be defined as part of the development process or recorded during the build process.’[6] is a way of saying ‘the build environment should be reproducible or merely documented’) or overcoming challenges that systems designed with reproducibility in mind from the start simply don't face.
At the same time, the Reproducible Builds website refers to publications[7] by former Nix developers who directly cite the original Nix paper from 2004, whereas Debian's effort didn't begin in earnest until 2013.[8]
Compared to the Nix community, Debian is huge. And they've leveraged their collective expertise and considerable volunteer force to do a ton of work toward reproducible builds which has benefited reproducibility for everyone, including NixOS. Doubtless every remotely attentive member of the Nix community is grateful for that work, which a small community like Nix's could hardly have taken up on its own. But Nix has been attacking reproducibility issues at a different level (reproducing build environments, source code, and whole systems (in terms of behavior, if not bits)) in a meaningful way since long before Debian's reproducible builds effort got going. And some of those efforts have informed the wider reproducible builds effort, just like some of Debian's efforts have not been applicable to every project in the F/OSS community which is interested in reproducible builds.
So: let's praise Debian loudly and often for their work here and be clear that NixOS' reproducibility couldn't be where it is today without that work... but let's also be clear that Nix/NixOS absolutely has blazed some trails in the territory of reproducibility— a terrain that both communities are still mapping out together. :)
Reproducibility is necessary, but unfortunately not sufficient, to stop a "Trusting Trust" attack. Nixpkgs still relies on a bootstrap tarball containing e.g. gcc and binutils, so theoretically such an attack could trace its lineage back to the original bootstrap tarball, if it was built with a compromised toolchain.
Indeed, and with the work done by Guix and the Reproducible Builds project we do have a real-world example of diverse double compilation which is not just a toy example utilizing the GNU Mes C compiler.
Projects like GNU Mes are part of the Bootstrappable Builds effort[0]. Another great achievement in that area is the live-bootstrap project, which has automated a build pipeline that goes from a minimal binary seed up to tinycc then gcc 4 and beyond.[1]
I feel the need to point out that the "Bootstrappable Builds" project is a working group from the Reproducible Builds project that was interested in the next step beyond reproducing binaries. Obviously this project has seen the most effort from Guix :)
The GNU Mes C experiment mentioned above was also conducted during the 2019 Reproducible Builds summit in Marrakesh.
In principle, diverse double-compiling merely increases the number of compilers the adversary needs to subvert. There are obvious practical concerns, of course, but frankly this raises the bar less than maintaining the backdoor across future versions of the same compiler did in the first place, since at least backdooring multiple contemporary compilers doesn't rely on guessing, well ahead of time, what change future people are going to make.
Critically, it shouldn't be taken as a demonstration that the toolchain is trustworthy unless you trust whoever's picking the compilers! This kind of ruins approaches based on having any particular outside organization certify certain compilers as "trusted".
It is an uphill effort for an adversary to actually do this. While theoretically a very well-informed adversary might get it right the first time, human adversaries are unlikely to, and their resources are large but far from infinite.
Your entire effort is potentially brought down by someone making a change in a way you didn't expect and someone going "huh, that's funny..."
Quite frankly, I'm surprised that this hasn't come up multiple times in the course of getting to NixOS etc. The attacks are easy to hide and hard to attribute.
Programs built by different compilers aren't generally bit-for-bit identical, e.g. after `gcc -o a run-of-the-mill.c` and `clang -o b run-of-the-mill.c` we shouldn't expect `cmp a b` to come back empty.
However, the behaviour of programs built by different compilers should be the same. Run-of-the-mill programs could use this as part of a test suite, for example; but diverse double compilation goes a step further:
We build compiler A using several different compilers X, Y, Z; then use those binaries A-built-with-X, A-built-with-Y, A-built-with-Z to compile A. The binaries A-built-with-(A-built-with-X), A-built-with-(A-built-with-Y), A-built-with-(A-built-with-Z) should all be identical. Hence for 'fully countering trusting trust through diverse double-compiling', we must compile compilers: https://dwheeler.com/trusting-trust/
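A rough sketch of that check in shell (the build commands and compiler names are hypothetical placeholders):

    # stage 1: build compiler A's sources with three unrelated compilers X, Y, Z
    ./build-A.sh --cc=X --out=A_X
    ./build-A.sh --cc=Y --out=A_Y
    ./build-A.sh --cc=Z --out=A_Z

    # stage 2: rebuild A from the same sources with each stage-1 binary
    ./build-A.sh --cc=./A_X --out=A2_X
    ./build-A.sh --cc=./A_Y --out=A2_Y
    ./build-A.sh --cc=./A_Z --out=A2_Z

    # if A's build is deterministic, the stage-2 outputs must be bit-identical
    sha256sum A2_X A2_Y A2_Z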
Actually, being able to build projects from GitHub much more easily is the sole reason why I'm currently using Arch as my main OS.
Building a project is just a shell script with a couple of defined functions. Quite literally.
I really admire NixOS's philosophy of pushing the boundaries as a distro where everything, including configurations and modifications, can be done in a reproducible manner. They're basically trying to automate the review process down the line, which is absurdly complex as a challenge.
And given that stability and desktop integration improve over time, I really think that Nix has the potential to be the base for easily forkable distributions. Building a live/bootable distro will be so much easier, as everything is just a set of configuration files anyway.
This is a slightly different thing. Nix and NixOS are trying to solve multiple things, and that's why it might be a bit confusing.
Many people don't realize this, but if you and I each grab, for example, the mentioned project from GitHub and compile it on our own machines, we get different files (they'll work the same, but they won't be exactly the same).
Even if we use the same dependencies, we will still get different files, because maybe you used a slightly different version of the compiler, or maybe those dependencies were themselves compiled with different dependencies or compilers. Maybe the project inserts a date while building, or pulls in some file. There are a million ways we could end up with different files.
The goal here is to get bit-for-bit identical files, and it's like a Holy Grail in this area. NixOS just appears to have achieved that, and all packages that come with the system are now fully reproducible.
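Nix even has a built-in way to test this for a single package on your own machine; a minimal sketch (hello is just an example attribute):

    # build a package, then rebuild it from scratch and compare the outputs bit for bit;
    # --check fails loudly if the second build differs from the first
    nix-build '<nixpkgs>' -A hello
    nix-build '<nixpkgs>' -A hello --check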
A rich source of non-reproducibility is non-determinism introduced by parallel building.
Preserving parallel execution, but arriving at deterministic outputs, is an interesting and ongoing challenge. With a rich mathematical structure, too.
> and 6mo later every computer ... gets ransomwared.
I'm really surprised such an attack hasn't happened already. It seems so trivial for a determined attacker to take over an opensource project (plenty of very popular projects have just a single jaded maintainer).
The malicious compiler could inject an extra timed event into the main loop for the time the attack is scheduled to begin, but only if it's >3 hours away, which simply retrieves a URL and executes whatever is received.
Detecting this by chance is highly unlikely - because to find it, someone would have to have their clock set months ahead, be running the software for many days, and be monitoring the network.
That code is probably only a few hundred bytes, so it probably won't be noticed in any disassembly, and is only executed once, so probably won't show up in debugging sessions or cpu profiling.
It just baffles me that this hasn't been done already!
> I'm really surprised such an attack hasn't happened already.
If you count npm packages, this has happened quite a few times already. People (who don't understand security very well) seem to be migrating to Python now.
Unless you are going to be the equivalent of a full-time maintainer doing code review for every piece of software you use, you need to trust other software maintainers, reproducible builds or not. Considering this is Linux, and not even Linus can deeply review every change in just the kernel anymore, that philosophy can't apply to meaningfully large software like NixOS.
That's too black-and-white. Being able to reproduce stuff makes some kinds of attacks entirely uninteresting, because malicious changes can be traced back, which is what many types of attackers do not want. Debian, or the Linux kernel, for example, are not fool-proof, but both are in practice quite safe to work with.
Who are you going to trace it back to if not the maintainer anyway? If the delivery method, then why is the delivery of the source from the maintainer inherently any safer?
No, it is not always the maintainer. Imagine you download a binary software package via HTTPS. In theory, the integrity of the download is protected by the server certificate. However, it is possible that certificates get hacked, get stolen, or that nation states force CAs to give out back doors. In that case, your download could have been changed on the fly with arbitrary alterations. Reproducible builds make it possible to detect such changes.
Same as when you download the source instead of the binary and see that it reproducibly builds the backdoored binary. And at this point we're back to "Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds." anyway.
It's not that reproducible builds provide zero value; it's that they don't truly solve the trust problem as initially stated. They also have non-security value to boot, which is often understated compared to the security value, IMO.
I guess reproducible builds solve some of the problems in the same way TLS/SSL solves some of the problems.
Most of the world is happy enough with the soft guarantee of: “This is _probably_ your bank’s real website. Unless a nation state is misusing their control over state owned certificate authorities, or GlobalSign or LetsEncrypt or whoever has been p0wned.”
Expecting binary black and white solutions to trust problems isn’t all that useful, in my opinion. Often providing 50% more “value” in trust compared to the status quo is extremely valuable in the bigger picture.
Reproducible builds solve many security problems, for sure, but the problems they solve in no way help you if the maintainer is not altruistic or is bad at security, as originally stated. They help tell you whether the maintainer's toolchain was compromised, and they do it AFTER the payload is delivered, with a payload you built yourself rather than one made by the maintainer anyway. They don't even tell you the transport/hosting wasn't compromised, unless you can somehow get a copy of the source used to compile from somewhere other than the maintainer directly, since the transport/hosting for the source they maintain could be compromised as well.
Solving that singular attack vector in the delivery chain does nothing to remove the need to trust the altruism and self-interest of maintainers. A good thing™? Absolutely, along with the other non-security benefits, but it has nothing to do with needing to trust maintainers or be in the niche that reviews source code when automatic updates come along, as originally sold.
> but the problems they solve in no way help you if the maintainer is not altruistic or is bad at security, as originally stated.
That same edge case applies to your bank too. Pinned TLS certs or pre-shared keys might help against "BadGuys(tm)", but you're still screwed if your bank decides to keep your money. (s/bank/online crypto wallet/ for real-world examples there...)
The question isn't whether they're perfect, nor whether they prevent everything. But they do help a person who suspects something is up rule certain things in and out, which increases the chances that the weak link can be found and eliminated.
If you have a fair suspicion that something is up and you discover that when you compile reproduceable-package you get a different output than when you download a prebuilt reproduceable-package, you've now got something to work with.
Your observation that they don't truly solve the trust problem is true. But it's somewhat beside the point. It is better to be better off.
Reproducible builds still help a lot with security. For example, they let you shift build latency around.
Eg suppose you have a software package X, available both as a binary and in source.
With reproducible builds, you can start distributing the binary to your fleet of computers, while at the same time you are kicking off the build process yourself.
If the result of your own build is the same as the binary you got, you can give the command to start using it. (Otherwise, you quarantine the downloaded binary, and ring some alarm bells.)
Similarly, you can re-build some random sample of packages locally, just to double-check, and report.
If most debian users were to do something like that, any tampering with the debian repositories would be quickly detected.
(Having a few malicious users wouldn't hurt this strategy much; they can only insert noise into the system, not give you false answers that you trust.)
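A rough sketch of that workflow (the URL and file names are placeholders):

    # start pushing the vendor's binary to the fleet, but don't activate it yet
    curl -O https://example.org/X-1.2.3.bin

    # meanwhile, build the same release from source yourself
    ./build-X-from-source.sh --version 1.2.3 --out X-local.bin

    # bit-for-bit comparison decides whether the fleet may switch over
    if cmp -s X-1.2.3.bin X-local.bin; then
        echo "match: tell the fleet to activate the update"
    else
        echo "MISMATCH: quarantine the download and ring the alarm bells"
    fi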
Even if the original attack happened upstream, if the upstreamed piece of software was pinned via git, then it'd be trivial to bisect the upstream project to find the culprit.
This is great if you are looking at attributing blame. Not so great if you are trying to prevent all the world's computers from getting owned...
I'd imagine that if I were looking at causing worldwide chaos, I'd love nothing better than getting into the toolchain in a way that I could later utilise on a widespread basis.
At that point I would have achieved my aims and if that means I've burnt a few people along the way, so be it, I'm a bad guy, the damage has been done, the objective met.
You can't solve this problem without having a full history of code to inspect (unless you are decompiling); reproducibility is the first step and bootstrappability is the second. Then we refine the toolchains and review processes to ensure high-impact code is properly scrutinized.
What we can't do is throw our hands up and say anyone who compromises the toolchain deep enough is just allowed to win. It will happen at some point if we don't put the right barriers in place.
It's the first step of a long journey, but it is a step we should be taking.
https://github.com/fosslinux/live-bootstrap is another approach, bootstrapping from a tiny binary seed that you could hand-assemble and type in as hex. But it doesn't address the dependency on the underlying OS being trustworthy.
There is stage0 by Jeremiah Orians that is designed to be able to bootstrap on hardware that can be built from transistors. Currently it mostly runs in a small VM process that is somewhat harder to subvert.
Reproducibility is what allows you to rely on other maintainers' reviews. Without reproducibility, you can't be certain that what you're running has been audited at all.
It's true that no single person can audit their entire dependency tree. But many eyes make all bugs shallow.
No. I can review 0.1% of the code and verify that it compiles correctly and then let another 999 people review their own portion. It only takes one person to find a bit of malicious code, we don’t all need to review every single line.
You are misunderstanding what I am saying. I am saying that it only takes one person who finds a vulnerability to disclose it, to a first approximation. Realistically it’s probably closer to 2-3 since the first might be working for the NSA, the CCP, etc. I am making no arguments about what amount of effort it takes to find a vulnerability, just talking about how not every single user of a piece of code needs to verify it.
That only works if you coordinate. With even more people, you can pick randomly and be relatively sure you've read it all, but I posit that 1) you don't pick randomly, you pick a part that is accessible or interesting to you (and therefore probably others) and 2) reading code locally is not sufficient to find bugs or backdoors in the whole.
I actually wonder if it’s possible to write code at such a macro level as to obfuscate, say, a keylogger in a huge codebase such that reviewing just a single module/unit would not reveal that something bad is going on.
Depends on how complicated the project itself is. A simple structure with the bare minimum of side-effects (think, functional programming) would make this effort harder.
Supply chain attacks are definitely important to deal with, but defense-in-depth saves us in the end. Even if a postgres container is backdoored, if the admins put postgres by itself in a network with no ingress or egress except the webserver querying it, an attack on the database itself would be very difficult. If on the other hand, the database is run on untrusted networks, and sensitive data kept on it... yeah, they're boned.
In the case of a supply chain attack, you don't even need ingress or egress.
Say the postgres binary or image is set to encrypt the data on a certain date. Then it asks you to pay X ZEC to a shielded address to get your decryption key. This would work even if the actual database was airgapped.
Building from source doesn't have to be inaccessible, if the build tooling around it is strong. Modern compiled languages like Go (or modern toolchains on legacy languages like vcpkg) have a convention of building everything possible from source.
So at least for software libraries, building from source is definitely viable. For end-user applications it's another story though; I doubt we will ever be at a point where building your own browser from source makes sense...
Building from source also doesn’t buy you very much, if you haven’t inspected/audited the source.
The upthread hypothetical of a compromised package manager equally applies to a compromised source repo.
_Maybe_ you always check the hashes? _Maybe_ you always get the hashes from a different place to the code? _Maybe_ the hypothetical attacker couldn’t replace both the code you download and the hash you use to check it?
(And as Ken pointed out decades ago, maybe the attacker didn’t fuck with your compiler so you had lost before you even started.)
>The norm today is auto-updating, pre-built software.
Only if you define "norm" as what's prevalent in consumer electronics and phones. Certainly, if you go by numbers, it's more common than anything else.
That's not due to choice, though, it's because of the desires of corporations for ever more extensive control of their revenue streams.
If the package maintainer's build pipeline is compromised (eg. Solarwinds), you are unlikely to be affected if you build from reviewed source yourself.
I genuinely believe in spending resources on issues where the ROI is positive.
So far, exploits on FOSS kind of prove the point: not everyone is using Gentoo and reading every line of code in their emerged packages, let alone adopting similar computing models.
Now if we are talking about driving the whole industry to a place where security bugs, caused by using languages like C, where code reviews cannot save us unless they are done by ISO C language lawyers and experts in UB compiler optimizations, are punished as heavily as construction companies are punished for a fallen bridge, then that would be interesting.
> I genuinely believe in spending resources on issues where the ROI is positive.
How are you measuring the ROI of security efforts inside an OSS distro like debian or nixos? The effort in such orgs is freely given, so nobody knows how much it costs. And how would you calculate the return on attacks that have been prevented? Even if an attack wasn't prevented you don't know how much it cost, and you might not even know if it happened (or if it happened due to a lapse in debian.)
>So far, exploits on FOSS kind of prove the point: not everyone is using Gentoo and reading every line of code in their emerged packages, let alone adopting similar computing models.
Reproducible builds attempt to mitigate a very specific type of attack, not all attacks in general. That is, they focus on a specific threat model and counter that, nothing else. They're not a cure for cancer either.
>Now if we are talking about driving the whole industry to a place where security bugs, caused by using languages like C, where code reviews cannot save us unless they are done by ISO C language lawyers and experts in UB compiler optimizations, are punished as heavily as construction companies are punished for a fallen bridge, then that would be interesting.
This is just a word salad of red herrings. Different people can work on different stuff at the same time.
> I can't come up with a single benefit to security from reproducible builds.
It is a means of detecting a compromised supply chain. If people rebuilding a distro cannot get the same hash as the binaries shipped by the distributor, then the distributor's infrastructure has likely been compromised.
How does this work in practice? The distro is owned, so where are you getting the hash from? I mean, specifically, what does the attacker have control of, and how does a reproducible build help me stop them?
The idea is that multiple independent builders build the same distro. You expect all of them to have the same final hash.
This doesn't help against the sources being owned, but it helps about build machines being owned.
Accountability for source integrity is in theory provided by the source control system. Accountability for the build machine integrity can be provided by reproducible builds.
To answer your specific questions: The attacker has access to the distro's build servers and is packaging and shipping altered binaries that do not correspond to the sources but instead contain added malware.
Reproducible builds allow third parties to also build binaries from the same sources and once multiple third parties achieve consensus about the build output, it becomes apparent that the distro's build infrastructure could be compromised.
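Concretely, each rebuilder only needs to publish a hash; a sketch with hypothetical paths and scripts:

    # an independent rebuilder: build the package from the distro's published source
    ./rebuild-from-source.sh foo-1.0 --out rebuilt/foo-1.0.pkg
    sha256sum rebuilt/foo-1.0.pkg        # publish this hash somewhere others can read it

    # anyone can then compare the distro's shipped binary against the rebuilders' hashes
    sha256sum official/foo-1.0.pkg

    # and on a mismatch, diffoscope shows exactly where the two artifacts diverge
    diffoscope official/foo-1.0.pkg rebuilt/foo-1.0.pkg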
OK so a build machine is owned and we have a sort of consensus for trusted builders, and if there's a consensus mismatch we know something's up.
I suppose that's reasonable. Sounds like reproducible builds are a big step towards that, though clearly this requires quite a lot of infrastructure support beyond just that.
This is great! The one fly in the ointment, pardon, is that Nix is a bit lax about trusting proprietary and binary-only stuff. It would be great if there were a FLOSS-only core system for NixOS which would be fully transparent.
It's the pragmatic thing. I wouldn't use NixOS if I wasn't able to use it on a modern 16-core desktop. I don't think there's a performant, 100% FLOSS-compatible computer that wouldn't make me want to gouge my eyes out with a rusty spoon when building stuff for ARM.
Talos has 44-core/176-thread server options that can take 2 TB of DDR4 and are FSF-certified. The board firmware is also open and has reproducible builds.
Talos has 8-core desktop options as well; this is just an example of how far you can take FLOSS hardware. Not that I consider a 16-core x86 desktop "consumer-grade" in the first place (speaking as a 5950X owner).
Probably not fit for replacing Grandma's budget PC but then again grandma probably isn't worried about the ARM cross compile performance of their machine running NixOS either.
Thanks, I was legitimately unaware of this option. That does smash my argument, but I'm not likely to be using a system like that anytime soon due to cost concerns mostly.
And it’s not just hardware, there is a useful limit on purity of licenses. In many cases only proprietary programs can do the work at all, or orders of magnitudes better.
I don’t have the resources to audit every component of my system. I favour enterprise distros who audit code which ends up in their repos and avoid pip, npm, etc. but there are some glaring trade offs on both productivity and scalability.
The problem is unmaintainability; I can’t imagine it’d be easier for medium-sized teams where security isn’t a priority, either.
Not just time, IME. Also: 1. highly resource-intensive, e.g., you cannot compile on small-form-factor computers (it's easier for me to compile a kernel than a "modern" browser), and 2. brittle.
Unfortunately, it's easy to break a lot of builds by things such as deciding not to install to /usr/local, or by building on a Mac. Pushing publishers to practices that aid reproducible builds would help both sides.
I'd love to try building NetBSD, btw, I must try that!
This is a big deal. Congratulations to all involved.
In software, complexity naturally increases over time, and dependencies and interactions between components become impossible to reason about. Eventually this complexity causes the software to collapse under its own weight.
Truly reproducible builds (such as NixOS and Nixpkgs) provide us with islands of "determinism" which can be taken as true invariants. This enables us to build more systems and software on top of deterministic foundations that can be reproduced by others.
This reproducibility also enables powerful things like decentralized / distributed trust. Different third-parties can build the same software and compare the results. If they differ, it could indicate one of the sources has been compromised. See Trustix https://github.com/tweag/trustix
I don't see a single comment doubting the value of reproducibility, so I'll be the resident skeptic :)
I think build reproducibility is a cargo cult. The website says reproducibility can reduce the risk of developers being threatened or bribed to backdoor their software, but that is just ridiculous. Developers have a perfect method for making their own software malicious: bugdoors. A bugdoor (bug + backdoor) is a deliberately introduced "vulnerability" that the vendor can "exploit" when they want backdoor access. If the bug is ever discovered you simply issue a patch and say it was a mistake, it's perfectly deniable. It's not unusual for major vendors to patch critical vulnerabilities every month, there is zero penalty for doing this.
The existence of bugdoors means you have to trust the vendor who provided the source code, there is no way around this.
You have to trust the developer, but in theory, reproducible builds could be used to convince yourself their build server hasn't been hacked. This isn't really necessary or useful, you can already produce a trustworthy binary by just building the source code yourself. You still have to trust the vendor to keep hackers off everything else though!
Okay, but building software is tedious, and for some reason you are particularly concerned about build servers being hacked. Perhaps you will nominate a dozen different organizations that will all build the code, and make this a consensus system. If they all agree, then you can be sure enough that the binaries were built with a trustworthy toolchain. A modest improvement in theory, but that introduces a whole bunch of new crazy problems.
You can't just pick one or two consensus servers, because then an attacker can stop you getting updates by compromising any one of them. You will have to do something like choose a lot of servers, and only require 51% to agree.
Now, imagine a contentious update like adopting a cryptocurrency fork, or switching to systemd (haha). If the server operators rebel, they can effectively veto a change the vendor wants to make. Perhaps vendors will implement a killswitch that allows them to have the final say, or perhaps they operate all the consensus build servers themselves.
The problem is now you've either just replaced build servers with killswitches, or just replicated the same potentially-compromised buildserver.
I wrote a blog post about this a while ago, although I should update it at some point.
Most people here are debating you on the security angle, but in the case of Nix (and Guix) there is another important angle - reproducible builds make a content-addressed store possible.
In Nix, the store is traditionally addressed by the hash of the derivation (the recipe that builds the package). For example, `lr96h...` in a store path like `/nix/store/lr96h...-coreutils-<version>`
is the hash of the (normalized) derivation that was used to build coreutils. Since the derivation includes build inputs, changing either the derivation for coreutils itself or one of its inputs (dependencies) results in a different hash and a rebuild of coreutils.
This also means that if somebody changes the derivation of coreutils, every package that depends on coreutils will be rebuilt, even if this change does not result in a different output path (compiled package).
This is being addressed by the new work on the content-addressed Nix store (although content addressing was already discussed in Eelco Dolstra's PhD thesis about Nix). In the content-addressed store, the hash in the path, such as the one above, is a hash of the output path (the built package), rather than a hash of the normalized derivation. This means that if the derivation of coreutils is changed in such a way that it does not change the output path, none of the packages that depend on coreutils are rebuilt.
However, this only works reliably with reproducible builds, because if there is non-determinism in the build, how do you know whether the output path changed as a result of changing a derivation or as a result of uninteresting non-determinism? (The output hash would change in both cases.)
Where the dependency chain is long, this substantially reduces build work during development too.
I'd guess that more than half of the invocations of gcc done by Make for example end up producing the exact same bit for bit output as some previous invocation.
I would point out that is literally what ccache (and Google goma) does, but doesn't require deterministic builds. Instead, it records hashes of preprocessed input and compiler commandlines.
They don't make any security claims about this, it's just for speeding up builds.
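Roughly, assuming a Makefile that honours $CC:

    # ccache keys its cache on a hash of the preprocessed translation unit plus the
    # compiler identity and command line, so it never needs bit-identical compiler output
    export CC="ccache gcc"
    make clean && make    # first build populates the cache
    make clean && make    # second build is served almost entirely from the cache
    ccache -s             # show hit/miss statistics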
What we currently do --- hashing inputs --- is the same as the ccache way. We just don't sandbox at that granularity yet.
What we want is to hash outputs. Say I replace 1 + 2 with 0 + 3. That will cause ccache to rebuild, but we don't want downstream stuff to also be rebuilt. C linking within a package is nicely parallelizable, but in the general case there are longer dependency chains, and that's where this sort of thing starts to matter.
Another non-security angle: doesn't computer science also face a kind of replicability crisis related to the ability to acquire and compile source code associated with some published papers? Reproducible builds directly address that.
And it seems like even when that problem is resolved for the empirical component of computer science, bit-identical reproducibility could be valuable in case binaries are never submitted or distributed. This NixOS release is in a way a benchmark for how far we can currently get on a 'useful' system with that kind of reproducibility.
I don't really have any complaints about using deterministic builds for non-security reasons, but the number one claim most proponents make is that it somehow prevents backdoors. Literally the first claim on reproducible-builds.org is that build determinism will prevent threats of violence and blackmail.
Honestly I think the biggest benefit of reproducibility is just debuggability. We both check out the same git repo and build it, we can later hash the binary and compare the hashes to know we're running the exact same code.
On security, if you really care about compromised build servers you might as well just build from source yourself. I think reproducibility might matter most in systems where sideloading is hard/impossible, like app stores, but I'm not familiar with the current state of the art in terms of iOS reproducible builds and checking them.
Reproducibility is an option to mitigate backdoors and incentivize developers to operate openly. It's no panacea, but it makes a lot of sense in open-source projects where individual actors are going to represent your largest threat vector. That way, it becomes a lot harder to push an infected blob to main, even if it still is technically possible. Hashes are also "technically pointless", but we still implement them liberally to quickly account for data integrity.
> Reproducibility is technically pointless, because you still have to trust the developer, and they can still add backdoors.
Builder != developer - and with reproducible builds, you no longer need to trust the builder. CI is commonly used for the final distributable builds and you can't always trust the CI server. Even if you do, many rely on third-party things like docker images - if the base build image gets compromised, code could trivially be injected into builds running on it, and without reproducible builds, that would not be detectable.
As a developer, it would be quite reassuring to build my binary (which I already do for testing) and compare the hash with the one from the CI server to confirm nothing has been tampered with. As a bonus, distro maintainers who have their own CI can also check against my hashes to verify their build systems aren't doing something fishy (malicious or otherwise).
> As a developer, it would be quite reassuring to build my binary (which I already do for testing) and compare the hash with the one from the CI server to confirm nothing has been tampered with.
That makes sense! However, this is not a good argument for reproducible builds, because you can already do that today.
You already have to build a trusted binary locally for testing right? You're dreaming of being able to compare that against the untrusted binary so that you can make sure it's a trusted binary too - but you already have a trusted binary!
Okay - but it's a hassle, you don't want to have to do that, right? Too bad - reproducible builds only work if someone reproduces them. You're still going to have to replicate it somewhere you trust, so you gained practically nothing.
> With the reproducible build, you can start using the untrusted binary while you are still building your trusted one.
That's not how it works, you have to reproduce it before it becomes trusted.
> You can also have ten people on the internet verify the untrusted binary.
Sure, then we have to build a complex consensus system that introduces a bunch of unsolved problems. My opinion is that this just isn't worth it, there is practically nothing to gain and it's really really hard.
> That's not how it works, you have to reproduce it before it becomes trusted.
Eh, there's stuff you can do with software before you trust it. Eg you can start pressing the CDs or distributing the data to your servers. Just don't execute it, yet.
> Sure, then we have to build a complex consensus system that introduces a bunch of unsolved problems. My opinion is that this just isn't worth it, there is practically nothing to gain and it's really really hard.
It's the same informal system that keeps eg debian or the Linux kernel secure currently:
People don't do kernel reviews themselves. They just use the official kernel, and when someone finds a bug (or spots otherwise bad code), they notify the community.
Similar with reproducible builds: most normal people will just use the builds from their distro's server, but independent people can do 'reviews' by running builds.
If ever a build doesn't reproduce, that'll be a loud failure. People will complain and investigate.
Reproducible builds in this scenario don't protect you from untrusted code upfront, but they make sure you'll know when you have been attacked.
> People don't do kernel reviews themselves. They just use the official kernel, and when someone finds a bug (or spots otherwise bad code), they notify the community.
There's a big difference here. When a vulnerability is found in the Linux kernel, that doesn't mean that you were compromised.
If a build was found to be malicious, then you definitely were compromised and it's little solace that it was discovered after the fact. This is why package managers check the deb/rpm signature before installing the software, not after.
Reproducibility means you don't have to worry that the developer might have a backdoored toolchain (which also means that they can't pretend that a malicious toolchain added the malicious code without their knowledge).
A talented developer might still be able to create a bugdoor which gets past code review, but that takes more effort and skill than just putting the malicious code into a local checkout and then saying "How did that get there?".
I think the workflow you're proposing is to take some trusted source code, then compile it to make a trusted binary. Now compare the trusted binary to the untrusted binary provided by the vendor - If they're the same - then it must have been made by an uncompromised toolchain.
That does require reproducible builds, but here is how to do it without reproducible builds:
Take the trusted source code, then compile it to make a trusted binary. Now put the untrusted binary in the trash, cause you already have a trusted binary :)
Is it technically pointless if you view it as a check on your own build, rather than a check on the work of others?
You are obviously familiar with Bazel/Blaze etc. Wouldn't reproducibility be necessary for those systems to work well most of the time? I can think of exceptions (like PGO), but it seems useful to produce at least some binaries this way. Also covered in this: https://security.googleblog.com/2021/06/introducing-slsa-end...
> Is it technically pointless if you view it as a check on your own build, rather than a check on the work of others?
That depends, I think it's difficult and mostly still pointless. I wrote about this a bit in the blog post I linked to. It's a big trade off, for questionable benefit.
> Wouldn't reproducibility be necessary for those systems to work well most of the time?
Yes, there are definitely some good non-security reasons to want deterministic builds. My gripe is only with the security arguments, like claims it can reduce threats of violence against developers (!?!).
> What isn’t clear is what benefit the reproducibility provides. The only way to verify that the untrusted binary is bit-for-bit identical to the binary that would be produced by building the source code, is to produce your own trusted binary first and then compare it. At that point you already have a trusted binary you can use, so what value did reproducible builds provide?
That's not the interesting case. The interesting case is when the untrusted binary doesn't match the binary produced by building the source code. Assuming that the untrusted binary has been signed by its build system, you now have proof that the build system is misbehaving. And that proof can be distributed and reproduced by everyone else.
Once Debian is fully reproducible, I expect several organizations (universities, other Linux distribution vendors, governments, etc) to silently rebuild every single Debian package, and compare the result with the Debian binaries; if they find any mismatch, they can announce it publicly (with proof), so that the whole world (starting with the Debian project itself) will know that there's something wrong. This does not need any complicated consensus mechanism.
> More often, attackers want signing keys so they can sign their own binaries, steal proprietary source code, inject malicious code into source code tarballs, or malicious patches into source repositories.
In Debian, compromising the build server is not enough to inject malicious code into source code tarballs or patches, since the source code is also signed by the package maintainer. Unexpected changes on which maintainer signed the source code for a given package could be flagged as suspicious.
The only attack left from that list, at least for Debian, would be for the attacker to sign their own top-level Release file (on Debian, individual packages are not signed, instead a file containing the hash of a file containing the hash of the package is what is signed). But the attacker cannot distribute the resulting compromised packages to everyone, since those who rebuild and compare every package would notice it not matching the corresponding source code, and warn everyone else.
> I expect several organizations (universities, other Linux distribution vendors, governments, etc) to silently rebuild every single Debian package, and compare the result with the Debian binaries
This has been happening for many years. A lot of large companies that care about security and maintainability sign big contracts with tech companies that often include indemnification.
>Developers have a perfect method for making their own software malicious: bugdoors.
I think rather than malicious developers the focus is on malicious build machines. How many things are built solely via CI these days, on machines that nobody has ever seen, using docker images that nobody has validated?
It's much easier to imagine a malicious provider (as in Sourceforge bundling in adware) than malicious developers, I think.
But yes, you're right that reproducible builds don't remove the need to trust the source.
>You have to trust the developer, but in theory, reproducible builds could be used to convince yourself their build server hasn't been hacked. This isn't really necessary or useful, you can already produce a trustworthy binary by just building the source code yourself.
This is pretty much all false though - not only the "just" part, as setting up a proper build environment is pretty non-trivial for many projects, and building everything from source is a task only the most dedicated Gentoomen would take up; you can also think of reproducible builds as a "litmus test". If you can, with reasonable accuracy, check whether a build machine is compromised at any time, you have a much greater base on which to trust it and its outputs. The benefits of having build machines probably shouldn't need explaining.
>You can't just pick one or two consensus servers, because then an attacker can stop you getting updates by compromising any one of them. You will have to do something like choose a lot of servers, and only require 51% to agree.
>...
>The problem is now you've either just replaced build servers with killswitches, or just replicated the same potentially-compromised buildserver.
I really don't understand this argument; compromised infrastructure probably shouldn't be a regular occurrence, and even if so, automated killswitches seem like the vastly more preferable option, no?
> I really don't understand this argument; compromised infrastructure probably shouldn't be a regular occurrence, and even if so, automated killswitches seem like the vastly more preferable option, no?
I'm pointing out how complex implementing reproducible builds is. It introduces a bunch of really hard unsolved problems that people are very handwavy about.
Who will do the reproducing? You say that users won't be able to do it. That makes sense, because if they could, then reproducible builds would be useless! However, you also say they will be able to check if a build server is compromised at any time. In order for both of those claims to be true we will have to design and build a complex consensus system operated by mutually untrusted volunteers. That's really hard, and seems like it provides a pretty negligible benefit.
IIUC Reproducible Builds guarantees that source is turned into an artifact in a consistent and unchanging way. So as long as the source doesn't change neither will the build.
If you're saying "reproducible builds are reproducible", then that is obviously true, but the question is what is the benefit?
Some people claim that the benefit is that there will be less incentive to threaten developers with violence, and I'm saying that's nonsense. If you cut through the nonsense, there are some modest claims that are true, but doing reproducible builds properly is very complicated and the benefit is negligible.
> If the server operators rebel, they can effectively veto a change the vendor wants to make.
How often do you think there will be a change so controversial that teams who have volunteered to secure the update system will start effectively carrying out a Denial of Service attack against all the users of that distro?
We also have to imagine that these malicious attestation nodes can easily be ignored by users just updating a config file, so the only thing the node operators could achieve by boycotting the attestation process is temporarily inconveniencing people who used to rely on them (which is not a great return on investment for the reputation they burn in doing this).
I don't know what reputation damage will happen, they're just third parties compiling code. There is no reputational damage for operating a malicious tor exit relay, why would this be different?
As I understand it, Tor does have a way of detecting whether an exit node is failing to connect users to their intended destination. (With TLS enforced, the only thing a malicious exit node could do is prevent valid connections).
In any case, I don't think anyone is proposing that the attestation nodes be run by random anonymous people on the internet. It would make more sense to have half a dozen or so teams running these nodes, with each team being known and trusted by the distro in question.
I'm not sure what the costs/requirements would be for running one of these nodes, but it might be possible for distros to each run a node dedicated to building each other's distros (or at least the packages that are pushed as security updates to stable releases).
Alternatively, individual developers that already work on a distro can offer to build packages on their own machines and contribute signed hashes to a log maintained by the distro itself.
This means everyone building NixOS will get the exact same binary, meaning you can now trust any source for it because you can verify the hash.
It’s a huge win compared to the current default distribution model of “just trust these 30 american entities that the software does what they say it does”.
This smaller bootstrap seed thing is a different problem from reproducible builds. nixpkgs does still have a pretty big initial TCB (aka. stage0) compared to Guix. But as far as I can tell NixOS has the upper hand in terms of how much can be built reproducibly (aka. the output hash matches across separate builds).
There's an issue for this[0]. Currently Nixpkgs relies on a 130 MB (!) uncompressed tarball, which is pretty big compared to Guix. It would be amazing to get it down to something like less than 1 KB with live-bootstrap.
Also, the way Nixpkgs is architected lets us experiment with more unusual ideas like a uutils-based stdenv[1] instead of GNU coreutils.
Bootstrapping from a very small binary core (I think 512 bytes) with an initial C compiler written in Scheme also has the advantage that the system can easily be ported to different hardware. Which is one major strength of the GNU projects and tools.
Not necessarily. Usually these very small cores end up being more architecture-specific than a stage0 consisting of gcc plus some other core packages. A good illustration of this is that Guix's work on bootstrap seed reduction has so far mostly been applied to i686/amd64, and not (at least not fully) to the other architectures they support.
Does this still matter if you can work your way up to a cross compiler, though? Do you actually need to go all the way down to ‘native’ hex monitors for a bunch of architectures or whatever?
For some reason, many compilers and build scripts have traditionally been written in a way that's not referentially transparent (a pure function from input to output). Unnecessary information like the time of the build, absolute path names of sources and intermediate files, usernames and hostnames often would find their way into build outputs. Compiling the same source on different machines or at different times would yield different results.
Reproducible builds avoid all this and always produce the same outputs given the same inputs. There's no good reason (that I can think of) why this shouldn't have been the case all along, but for a long time I guess it just wasn't seen as a priority.
The benefit of reproducible builds is that it's possible to verify that a distributed binary was definitely compiled from known source files and hasn't been tampered with, because you can recompile the program yourself and check that the result matches the binary distribution.
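A minimal sketch of that check (paths are placeholders; in practice you'd compare against a hash the vendor publishes or signs rather than downloading the whole binary):

```python
# Rebuild locally, hash both files, and compare: with a reproducible build the
# two digests must match bit for bit.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

official = sha256_of("downloads/foo-1.2.3-x86_64.tar.gz")    # vendor-distributed binary
rebuilt  = sha256_of("local-build/foo-1.2.3-x86_64.tar.gz")  # built from the same sources
print("match" if official == rebuilt else "mismatch: tampering or nondeterminism")
```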
> There's no good reason (that I can think of) why this shouldn't have been the case all along
Well, it's not like developers consciously thought "How can I make my build process as non-deterministic as possible?", it's just that by the time people started to become aware of the benefits of reproducibility, various forms of non-determinism had already crept in.
For example, someone writing an archiving tool would be completely right to think it is a useful feature to store the creation date of the archive in the archive's metadata. The idea that a user might want to force this value to instead be some fixed constant would only occur to someone later when they noticed that their packages were non-reproducible because of this.
But you're right; if the goal had been thought of from the start, there's no reason why every build tool wouldn't have supported this.
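For illustration, this is the kind of knob archiving code ends up needing. A small Python `tarfile` sketch that follows the `SOURCE_DATE_EPOCH` convention used by reproducible-builds tooling (the function names are made up):

```python
# Build a tar archive whose metadata does not depend on when or where it was
# built: timestamps are clamped to SOURCE_DATE_EPOCH, ownership is normalised,
# and members are added in sorted order.
import os
import tarfile

def normalise(info: tarfile.TarInfo) -> tarfile.TarInfo:
    info.mtime = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info

def make_archive(out_path: str, src_dir: str) -> None:
    # Plain tar on purpose: gzip would embed its own timestamp and need the same fix.
    with tarfile.open(out_path, "w") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name, filter=normalise)
```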
> The benefit of reproducible builds is that it's possible to verify that a distributed binary was definitely compiled from known source files and hasn't been tampered with, because you can recompile the program yourself and check that the result matches the binary distribution.
It's not just security. If a hash of the input sources maps directly to a hash of the output binaries, then you can automatically cache build artefacts by hash and get huge speedups when compiling stuff from scratch.
This was the primary motivation for Nix, since Nix does a whole lot of building from scratch and caching.
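A toy sketch of that idea (the cache location and function names are invented for illustration): hash the inputs, and if an artifact already exists under that key, skip the build entirely.

```python
# Input-addressed build cache: the key is a hash over all source files, so an
# unchanged input set maps straight to an already-built artifact.
import hashlib
import os
import shutil

CACHE_DIR = "/var/cache/toy-builds"  # illustrative location

def input_hash(source_files: list) -> str:
    h = hashlib.sha256()
    for path in sorted(source_files):        # stable order keeps the key deterministic
        h.update(path.encode() + b"\0")
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def build_cached(source_files: list, do_build) -> str:
    key = input_hash(source_files)
    cached = os.path.join(CACHE_DIR, key)
    if not os.path.exists(cached):
        os.makedirs(CACHE_DIR, exist_ok=True)
        shutil.copy(do_build(source_files), cached)  # do_build returns the artifact path
    return cached
```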
>- Nix tooling was created 15 years ago exactly for this; Nix is made to make packages bit-for-bit rebuildable from scratch.
I don't think this is accurate?
Nix is about reproducing system behaviour, largely by capturing the dependency graph and replaying the build. But this doesn't entail bit-for-bit identical binaries. It very much sits in the same group as Docker and similar technologies. This is also how I read the original thesis from Eelco[0].
And well, claims like this always rub me the wrong way, since NixOS only really started using the term "reproducible builds" after Debian began its effort around 2015-2016[1], and only started its own reproducible-builds effort later. It also muddies the language, since people now talk about "reproducible builds" in terms of system behavior as well as bit-for-bit identical builds. The result has been that people talk about "verifiable builds" instead.
> There's no good reason (that I can think of) why this shouldn't have been the case all along
Determinism can decrease performance dramatically. Like concatenating items (say, object files into a library) in order is clearly more expensive in both time & space than processing them out of order. One requires you to store everything in memory and then sort them before you start doing any work, whereas the other one lets you do your work in a streaming fashion. Enforcing determinism can turn an O(1)-space/O(n)-time algorithm into an O(n)-space/O(n log n)-time one, increasing latency and decreasing throughput. You wouldn't take a performance hit like that without a good reason to justify it.
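The tradeoff being described, in miniature (a hypothetical concatenation step, not any particular linker or archiver):

```python
# Streaming vs deterministic concatenation: the first handles each item as it
# arrives; the second has to collect everything and sort before writing.
from typing import Iterable, Tuple

def concat_streaming(chunks: Iterable[bytes], out) -> None:
    for chunk in chunks:                       # O(1) extra space, arrival order
        out.write(chunk)

def concat_deterministic(named_chunks: Iterable[Tuple[str, bytes]], out) -> None:
    for _name, chunk in sorted(named_chunks):  # buffers and sorts by name first
        out.write(chunk)
```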
Being bit-for-bit reproducible means you could do fun things like distribute packages as just sources and a big blob of signatures, and you can still run only signed binaries.
The GCC developers in particular were hostile to such efforts for a long time, IIRC. (This is a non-trivial issue because randomized data structures exist and can be a good idea to use: treaps, universal hashes, etc. I’d guess it also pays for compiler heuristics to be randomized sometimes. Incremental compilation is much harder to achieve when you require bit-for-bit identical output. Even just stripping your compile paths from debug info is not entirely straightforward.)
The security benefit of things like stack canaries rests on them being random and not known beforehand, I guess. Otherwise stack-smashing malware could know to avoid them.
Wait, how is that relevant? Nothing says stack canaries have to use the same RNG as the main program, let alone the same seed, and there are cases such as this one where they probably shouldn’t, so it makes sense to separate them.
> Presumably, incremental compilation is only for development. For release, you would do a clean build, which would be reproducible.
I’d say that’s exactly the wrong approach: given how hard incremental anything is, it would make sense to insist on bit-exact output and then fuzz the everliving crap out of it until bit-exactness was reached. (The GCC maintainers do not agree.) But yes, you could do that. It’s not impossible to do reproducible builds with GCC 4.7 or whatever, it’s just intensely unpleasant, especially as a distro maintainer faced with yet another bespoke build system. (Saying that with all the self-awareness of a person making their own build system.)
> Just use the same paths.
I mean, sure, but then you have to build and debug in a chroot and waste half a day of your life figuring out how to do that and just generally feel stupid. And your debug info is still useless to anybody not using the exact same setup. Can’t we just embed relative paths instead, or even arbitrary prefixes if the code is coming from more than one place? In recent GCC versions we can: just chuck the right incantation into CPPFLAGS and you’re golden.
All of this is not really difficult except insofar as getting a large and complicated program to do anything is difficult. (Stares in the direction of the 17-year-old Firefox bug for XDG basedir support.) That’s why I said it wasn’t a GCC problem so much as a maintainer attitude problem.
GCC used to attempt certain optimizations (or more generally, choose different code-generation strategies) only if there was plenty of memory available. We discovered this in the course of designing Google's internal build system, which prizes reproducibility.
The code has to be changed so that things like system-specific paths, time of compilation, hardware, etc., don’t cause the compiled program to be unique to that computer (meaning compiling the same code on a different computer would give you a file that still works but has a different MD5 hash).
By being able to reproduce the file completely, down to identical MD5 hashes, you know you have the same file the creator has, and know with certainty that the file has not been tampered with.
The software doesn't suddenly become incompatible with CPU-specific optimisations (or many other compiler flags that change its output), but if you do so, you won't be able to reproduce the distribution binaries. Distributions don't enable CPU-specific optimisations anyway, since they want to be usable on more than one CPU model.
No, just that you need to avoid naively conflating the machine that is doing the compilation with the one that optimization is being performed for.
Concretely, you would need to keep track of and reproduce e.g. the `-march` flag value as part of your build input. If you wanted to optimize for multiple architectures, that would mean separate builds or a larger binary with function multi-versioning.
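In cache-key terms, that just means the flags get hashed alongside the sources; a toy sketch with illustrative flag and hash values:

```python
# Compiler flags become part of the build input, so builds for different -march
# values are tracked as distinct (but each individually reproducible) artifacts.
import hashlib

def build_key(source_hash: str, flags: list) -> str:
    h = hashlib.sha256(source_hash.encode())
    for flag in flags:                       # flags are inputs too
        h.update(b"\0" + flag.encode())
    return h.hexdigest()

generic = build_key("3f785a", ["-O2"])
tuned   = build_key("3f785a", ["-O2", "-march=znver3"])
assert generic != tuned  # same sources, different target, different artifact
```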
Nixpkgs contains the build/patch instructions for every package in NixOS.
If you want to compile any piece of software available in Nixpkgs, you can override its attributes (the inputs used to build it).
You can trivially have an operating system almost identical to your colleague's install, but override just one package to enable optimisations for a certain CPU. This does, however, mean you lose the transparent binary cache that you could otherwise use.
Exactly this method is used to configure the entire OS install! Your OS install is just another package that has some custom inputs set.
Likely it means that with the same input arguments the end result is bit-by-bit identical. (As I understand it, the problems were hard-to-control output elements. It was not enough to set the same args, set the same time, and use the same path and filesystem, because some things happened at different speeds, so they ended up happening at different relative elapsed times, so the outputs contained different timestamps, etc.)
There's a lot of problems with reproducible builds. Filesystem paths, timestamps, deterministic build order to say the least. This is a pretty great achievement and I'm looking forward to a non-minimal stable ISO.
I really liked how easy it was to create a custom ISO when I installed Nix. For once I had Dvorak as the default keyboard from the outset, neovim for editing, and the proprietary WiFi drivers I needed all from a minimal config file and `nix build`.
Some of the issues were really difficult to tackle, like the linux kernel generating random hashes.
The last mile was done by removing the use of ruby (which uses some random tmp directories) from the final image. Asciidoctor (ruby) was replaced with asciidoc (python).
Debian has been a major driver in making many pieces of software reproducible across every distribution; that Debian maintainers so often submit patches upstream and work directly to solve these issues is a big reason for this.
In other words: the work Debian has done absolutely set the stage for this to happen, and it would have taken much longer without them.
In general, Debian aims to upstream the changes they make to software. That allows all other distributions, including Nix, to profit from their work making software reproducible.
The trick with nvidia on Linux is to not expect that they will ever work on anything. If you want to be sure that stuff works, either don't buy Nvidia or use Windows.
I'm not familiar with the market the Jetson is in and what purposes it serves. From a quick Google, it seems to build boards for machine learning? If that's true, I'm pretty sure Google and Intel have products in that space, and I'm sure there's other brands I don't know of.
If Nvidia has its own distribution, it might well work for as long as it's willing to maintain the software because then they can tune their open source stuff to make it work with their proprietary drivers, the same way Apple is hiding their tensorflow code. I still would be hesitant to rely on Nvidia in that case given their history.
Google and Intel's solutions are just as proprietary, with the downside that almost nobody uses them, so bugs, performance, supported tooling, community, and support windows are often much worse. It's not even clear their solutions actually offer better performance in general, given this. (And if you think proprietary Nvidia software packages are infuriating messes, wait until you try Intel proprietary software.) All that said, how you feel about their history of Linux support is basically irrelevant, and they'll continue to dominate because of it.
The ability to recreate a binary image from the same set of source files, and to get a binary identical to the package-provided one.
This is a useful way of ensuring that nothing is amiss at compile/link time.
Today’s GNU toolchain clutters the interior of binary files with random hash values, full file paths (that you couldn’t easily recreate), and random tmpfile directories.
The idea is to make it easier to verify a binary, compare it with an earlier-built-but-same-source binary, or reverse engineer it (and catch unexpected changes in code).
Recently, President Biden put out an executive order that mandates that NIST et al work out, over the next year, an SBOM/supply chain mandate for software used by Federal departments.
That's going to require the equivalent of "chain of custody" attestations along the entire build chain.
Along with SOC and PCI/DSS and other standards, this is going to require companies and developers to adopt NixOS-type immutable environments.
Unfortunately, I don't think this is going to be the outcome. We're more likely to end up with "Here is the list of filenames, subcomponents, and associated hashes" as opposed to requiring NixOS style environments. Vendors to the subcontractors will likely be required to provide the same list of filename/subcomponent/hashes, a far cry from repeatable builds.
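A crude sketch of what generating that kind of list could look like (real SBOMs would normally use a standard format such as SPDX or CycloneDX; the fields and the `dist/` path here are simplified placeholders):

```python
# Walk a build output directory and emit the flat "filenames and hashes" style
# of inventory: one entry per file with a relative path and a sha256 digest.
import hashlib
import json
import os

def inventory(root: str) -> list:
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
            entries.append({"path": os.path.relpath(path, root), "sha256": digest})
    return entries

print(json.dumps(inventory("dist/"), indent=2))
```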
I really want to adopt Nix and NixOS for my systems but the cost of wrapping packages is just a little too high for me right now (or perhaps I'm out of date and a new cool tool that does it automatically is out). IMHO, a dependency-graph-based build system that builds a hermetically sealed transitive closure of an app's dependencies that can be plopped into a rootfs via Nix [0] is far superior, security-wise, to the traditional practice of writing Dockerfiles.
Hm, this seems like a lower level set of tools that can be composed into something a bit more user-friendly (one of my personal complaints with Nix as well, despite being a big fan of the concept and overall execution. Nothing too steep that can't be learned eventually, but the curve exists). I'm wondering if there would be an audience for a higher level abstraction on top of Nix, or if one already exists.