"eBPF Documentary: An exciting train wreck in progress"
That would be a better title. eBPF started as a small extension for inserting trivial hooks. It's now basically a hacked-up, broken WebAssembly clone, with zero forethought put into it. NIH syndrome at its worst.
It has recently grown unbounded loops with runtime metering, making the static verifier little more than worthless complexity. Before that, it had acquired exceptions and stack unwinding.
Bytecode-based execution goes back to the 1960s, yet another thing that WebAssembly advocates tend to forget when writing comments and blog posts about its greatness.
Bytecode interpreters/JITs are indeed ancient. However, JITs designed to safely confine malicious bytecode are most definitely not. Java or .NET VMs are probably the earliest examples, and the WASM infrastructure is using experience gained there.
My problem with eBPF is that it just ignored all of it, and proceeded to feature-creep an initially "simple" solution into a monster. The initial justification for the NIH stuff was: "but we need those static guarantees provided by the verifier, nothing else can do that". But of course, these justifications were forgotten as soon as the verifier became the limiting factor.
The lessons being learnt are more on the marketing side; there are already some USENIX papers about how secure WebAssembly actually is in practice, as it slowly becomes an interesting target.
And by the way, Chrome recently had a CVE related to WASM, because as usual there is the definition on paper, and then there are the actual runtimes.
Architecture-independent eBPF "rescue binaries" could be an interesting part of a distribution toolset... or an attack rootkit. It's hard to have nice things.
The genius of ebpf is allowing for pluggable policy in a world where the kernel API is very slow to change and can’t meet everyone’s needs. Whether it’s how the kernel handles packets off the wire, how it controls traffic, scheduling entities, or instrumentation, ebpf lets you provide logic rather than turn a bunch of knobs or use a bespoke syscall that only handles one case. It also moves the processing logic to the data in the kernel rather than having the kernel have to do expensive copies to and from userspace.
ebpf isn’t really novel beyond the interfaces it provides. They are just kernel modules that have been vetted and are sandboxed. Inserting executable code has been part of the kernel since forever in module form and kprobes.
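To make the "provide logic rather than turn a bunch of knobs" point concrete, here is a hedged sketch of a minimal XDP program in eBPF's restricted C; the port and names are invented, and it assumes libbpf headers and compilation with clang -O2 -target bpf:

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/udp.h>
    #include <bpf/bpf_endian.h>
    #include <bpf/bpf_helpers.h>

    /* Drop inbound UDP packets to port 9999; pass everything else. */
    SEC("xdp")
    int drop_udp_9999(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)   /* bounds checks keep the verifier happy */
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
            return XDP_PASS;

        struct udphdr *udp = (void *)ip + ip->ihl * 4;
        if ((void *)(udp + 1) > data_end)
            return XDP_PASS;

        return udp->dest == bpf_htons(9999) ? XDP_DROP : XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

The explicit bounds checks aren't style; the verifier rejects the program without them, which is exactly the "vetted and sandboxed" property.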
> ebpf isn’t really novel beyond the interfaces it provides. They are just kernel modules that have been vetted and are sandboxed. Inserting executable code has been part of the kernel since forever in module form and kprobes.
This should be sung from the mountaintops. This concisely summarizes nearly everything that an uninformed reader should take away from the comment section.
bpf tooling generally provides no stability guarantees when you interact with kernel primitives. See [0], for example. Though things have improved somewhat with CO-RE.
I'm curious what this guarantee includes - the bytecode? Because the actual in-kernel eBPF API is famously unstable, with eBPF-based applications usually requiring a cutting-edge kernel version (for industry anyway). And of course the eBPF programs themselves rely on accessing structures for which no stability guarantees are made whatsoever.
I've been hearing more and more about eBPF, especially here on HN.
I haven't yet watched the documentary so perhaps it is answered there. But, the analogy of JavaScript inside the kernel is great and I'm left wondering: what was the way to do it previously? Userland network tool? This standardizes on an interface to the kernel, not a language, right? It feels off to say it is JavaScript because that comes with a lot of baggage, but JavaScript is also (as a versatile and ubiquitous language) an incredibly powerful and useful tool. Is that intentional by the author?
> But, the analogy of JavaScript inside the kernel is great and I'm left wondering: what was the way to do it previously? Userland network tool? This standardizes on an interface to the kernel, not a language, right?
Guess/sketch: It's a language in most senses. Previously the kernel had APIs for packet filtering rules for iptables etc., but the set of rules you could use was somewhat "static" - rules would have parameters, so you could do things like if the source IP is in this range then rewrite it as this and direct it to this interface, but it was kind of like one of those visual flowchart languages where you can drag and drop the available boxes in a given order, but if there isn't a box to do what you want then you're stuck. Whereas with eBPF it really is scriptable - rather than a specific rule type you can just submit the script you want it to run - and nowadays it's become kind of a general kernel scripting language rather than just for networking.
I'd draw a parallel with how 3D graphics programming has shifted from "you can do these kinds of transformations, submit a list of what you want to run in what order" to "this is our shader programming language, just write whatever you want to do as a program in this language".
iptables is definitely limited in comparison to eBPF, but that isn't the innovative step. BPF was around for more than twenty years before eBPF came around. Around 2013 I worked on a packet analysis pipeline that generated BPF code dynamically at runtime. eBPF isn't more scriptable than BPF in this sense. The language does add some opcodes and loops that weren't available in the original, but this is relatively modest.
The real genius of what these folks did was extending the usefulness of BPF beyond the network stack. Without a provably safe language it would've been impossible to enable flexible kernel tracing.
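For flavor, the kind of runtime generation mentioned above can be as simple as handing a filter expression to libpcap's classic-BPF compiler (a hedged sketch; the filter string stands in for whatever a pipeline might generate):

    #include <pcap/pcap.h>
    #include <stdio.h>

    int main(void)
    {
        /* A "dead" handle is enough to compile a filter without a device. */
        pcap_t *p = pcap_open_dead(DLT_EN10MB, 65535);
        struct bpf_program prog;

        if (pcap_compile(p, &prog, "udp dst port 53", 1, PCAP_NETMASK_UNKNOWN) < 0) {
            fprintf(stderr, "compile failed: %s\n", pcap_geterr(p));
            return 1;
        }
        bpf_dump(&prog, 1);   /* print the classic BPF instructions, like tcpdump -d */
        pcap_freecode(&prog);
        pcap_close(p);
        return 0;
    }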
Out of interest - it's a while since I've done infra work - is iptables still around? Is it rewritten to be based on eBPF now? Do people just make .bt or .py files that call eBPF instead to implement packet filtering?
> This standardizes on an interface to the kernel, not a language, right?
It's a VM that runs JIT-compiled eBPF programs. You can write code in C or Golang or other languages that compiles down to eBPF. I did a video looking at the kernel eBPF code here: https://youtu.be/hznUH_zP77U?t=1165
> wondering: what was the way to do it previously? Userland network tool?
My understanding is that userland network tools were common in fields like finance that needed fast custom networking and wanted to eliminate the overhead of context switching. I don't know how common they are/were in other fields though.
Oh, no I don't mean that arbitrary Go compiles to eBPF. Apologies if I gave that impression. I meant that there are libraries that let you compose eBPF programs in other languages. But you're still putting together an eBPF program, just like you can assemble JSON with Go but you can't compile an arbitrary Go program to JSON.
BPF works more like Java: you have your high-level source code, a compiler translates it to bytecode that runs on a VM, and a JIT translates that bytecode to native machine code. Even BPF “programs” written in C aren't compiled like ordinary C programs; a compiler with a BPF backend translates them to bytecode.
The subset of C you can use is fairly limited, only what their compiler supports. So what people have done is written compilers that can take Go or Python and compile it to BPF bytecode.
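To sketch the user-space half of that pipeline (a hedged sketch assuming a reasonably recent libbpf; the object file, program, and interface names are invented for illustration):

    #include <bpf/libbpf.h>
    #include <net/if.h>
    #include <stdio.h>

    int main(void)
    {
        /* Open the compiled bytecode (file name is illustrative). */
        struct bpf_object *obj = bpf_object__open_file("drop.o", NULL);
        if (!obj) {
            perror("open");
            return 1;
        }

        /* Loading hands the bytecode to the kernel: the verifier checks it,
         * then the in-kernel JIT translates it to native machine code. */
        if (bpf_object__load(obj)) {
            fprintf(stderr, "verifier rejected the program\n");
            return 1;
        }

        struct bpf_program *prog =
            bpf_object__find_program_by_name(obj, "drop_udp_9999");
        int ifindex = if_nametoindex("eth0");   /* interface name illustrative */
        if (!prog || !ifindex ||
            bpf_xdp_attach(ifindex, bpf_program__fd(prog), 0, NULL) < 0) {
            fprintf(stderr, "attach failed\n");
            return 1;
        }
        return 0;
    }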
Ad-hoc kernel extensions were a pretty common answer, and one thing a lot of people love about eBPF is that it subsumes most of the reasons people wrote LKMs commercially.
One of the big wins is not so much "build and run your own stuff" as the fact that there are very nice, low-cost (in terms of compute) performance utilities built on eBPF.
There are so many utilities in that list; there’s a diagram midway down the readme which tries to help show their uses. bcc-tools should be available in any distro.
Also, Brendan Gregg does a ton of performance stuff that is worth knowing about if you check out his other work. Not eBPF only. Flame graphs are useful.
I was a little disappointed DTrace[0] was not mentioned at all. The instrumentation (not the SDN) isn’t novel, not even to Linux (DTrace is available on Linux - I understand licensing is at least questionable (for some distros), but that aside…). [1]
DTrace was not the inspiration for eBPF at all, so it's not obvious it is relevant to mention. As the documentary mentions, the initial impetus for eBPF was software-defined networking.
eBPF is a much bigger and more comprehensive infrastructure piece (networking, tracing, security, etc) than DTrace. And thanks to licensing issues, even within the limited domain of tracing, DTrace will likely become a footnote in history, while eBPF becomes available on every major OS platform.
There aren't "licensing issues" with DTrace -- you are merely referring to the fact that it is licensed under the MPL-derived CDDL and not the GPL. But it is definitely true that they are not seeking to solve the same problems! Safety is very core to DTrace[0]; the difference here is entirely deliberate.
AFAIU, it goes beyond safety; eBPF is designed to run entirely in the kernel rather than requiring a user-space bridge. So it's able to operate at line speeds in networking scenarios etc.
> you are merely referring to the fact that it is licensed under the MPL-derived CDDL and not the GPL.
Not really. I'm referring to the fact the eBPF will be available across Microsoft, Apple, and Linux, and there is no other technology that will be able to offer that.
> not seeking to solve the same problems!
Exactly. eBPF addresses a much broader and more significant problem set: not just tracing, but also security modules, software-defined networking, and an almost unlimited future potential.
- DTrace does execute entirely in the kernel. Indeed, anonymous tracing instruments the system without any corresponding user process whatsoever.
- DTrace exists on quite a few systems, including (with caveats) Windows, macOS and Linux.
- The unbounded nature of eBPF very much runs contrary to the safety that DTrace assures; DTrace isn't seeking to augment the system, but merely to understand it.
> I'm referring to the fact the eBPF will be available across Microsoft, Apple, and Linux, and there is no other technology that will be able to offer that.
I don’t know if there’s some qualification I’m missing in your statement, but does this not count?
While you're correct to say that DTrace is available on Linux, it is so restricted as to be much less capable than eBPF. That is why it will become a footnote, and eBPF will become ubiquitous.
It may become a footnote on Linux, but Linux isn't the only system out there -- and DTrace remains alive and well in many systems (not least in its reference implementation in illumos[0]).
You're discounting the network effects. Also, eBPF provides many more capabilities beyond DTrace, and it will be ubiquitous across all OSes, without having to make an exception for Linux. Anyone targeting full cross-platform capabilities will be better served by eBPF. The unfortunate history that crippled DTrace on Linux will lead to its ultimate sidelining.
I don't really know what you mean by network effects, but perhaps we're just talking about two different things: you are looking at eBPF as a substrate to deliver arbitrary software (?!) whereas I view DTrace exclusively as a diagnostic tool. As to their relative capabilities: its other potential advantages aside, eBPF (and the tooling built upon it like bcc) lacks much of the functionality and polish of DTrace when attempting to accurately instrument the system. Its lack of robustness makes even basic instrumentation challenging,[0] let alone the richer (and admittedly, more esoteric) features of DTrace that it's missing entirely.
You can argue about the relative beauty and polish of the tech, but just as Git won against more polished alternatives, so too will eBPF displace others because of its platform reach.
eBPF combines amazing tech with a platform reach that is impossible for DTrace to provide. It's this unique combination that will capture the mindshare necessary to relegate others to the sidelines.
I feel like the fact that the page mentions Brendan Gregg is a pretty strong reference to DTrace already. Actually, it doesn't just mention him; it's completely written by him.
I think people that know, know he was involved in (the userland aspect of) DTrace, but I get a sense that for some reason there's no love lost between Brendan and "DTrace".
AFAICT DTrace is more a parallel branch. Both were inspired by cBPF.
What should get more talk, IMO, is the exokernel XOK's kernel VMs, which went way harder than even eBPF towards user-space programmability for the kernel as a core primitive.
For instance, instead of sleep(2) or futex(2) calls, XOK exposed "wake programs" that user space would register for the scheduler to run to answer "is this blocked thread runnable again?". That would have solved Collabora's need to change futex(2) to work more nicely with Wine/Windows primitives in a more general way.
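A purely hypothetical sketch in C (no such Linux API exists, and XOK's real interface differed), just to show how a registered predicate is strictly more expressive than FUTEX_WAIT's single "word still equals the expected value" test:

    /* Hypothetical, for illustration only. */
    struct mutex_state {
        int owner_tid;   /* 0 when free */
        int abandoned;   /* a Windows-ism that futex(2) can't express directly */
    };

    /* The "wake program": a small, bounded predicate the scheduler would
     * evaluate to decide whether a blocked thread is runnable again. */
    static int mutex_runnable(const struct mutex_state *s, int my_tid)
    {
        return s->owner_tid == 0 || s->owner_tid == my_tid || s->abandoned;
    }

    /* register_wake_program() is invented for this sketch:
     *     register_wake_program(my_tid, mutex_runnable, &state);
     * The point is that the kernel runs user-supplied logic instead of
     * a single fixed wait rule. */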
eBPF modules can be closed-source, right? I can see a future where things like ZFS are paid eBPF modules, or games ship with proprietary anti-cheat eBPF modules.
I saw the airing of it at Kubecon -- also met the Finnish guy there in the video. I mistook him for another Finnish guy with blond hair. Didn't meet the Russian though.
That's a cool conjecture but it kind of misses the point of eBPF.
The point of it is that you can run user-defined programs while avoiding the costly context switch between user space and kernel space.
The kernel already is the kernel. Compiling kernel code to eBPF programs would offer seemingly no performance gains, since you're already in kernel space; there is no costly context switch to avoid.
ebpf seems to be a very interesting idea and I have been experimenting with it. Still, I find it weird that we're doing documentaries on software "frameworks".
I mean, I love eBPF more than most, but this is a practical engineering solution to a logistical problem that didn't really need to exist in the first place.
This is not genius and not an order-of-magnitude improvement to an important computer science problem; it's an improvement to a costly artifact of the Linux kernel.
As someone who works with BPF every day, I don't think comparing orders of magnitude is right. It just enables things that you wouldn't have dreamed of doing before. I think a good analogy is going from horse-drawn carriages to the automobile. In a carriage, going from Boston to New York would be a multi-day journey that you would have to plan for weeks in advance. With a car you just get in and go. BPF is like that: suddenly you have all these options for things that you can do instantly, that you'd only do before if you were ready to dedicate years to getting support into the kernel. It's a superpower for kernel development.
It has always seemed quite obvious to me that dealing in machine code is a flawed approach for distributing software. At the most basic level, it entails giving someone else near unfettered access to the hardware of your computer and simply hoping that they do nothing malicious or malformed. Yet the software world as a whole seems continually shocked at the idea of using anything else. Perhaps someday we will learn this lesson in its entirety and begin to share code rather than blobs.
I don't think machine code is the problem? Running a native binary doesn't give it access to anything inherently; the OS gives access through syscalls, and can impose restrictions - and indeed, does; it's not like running a binary on Linux automatically gives it access to anything under /dev.
Also, it's not that the whole software world thinks blob-only software is normal; I'm typing this on a nice comfy GNU/Linux box where the only blobs are some firmware. (Edit: And to be quite clear, a good chunk of this community would really like to get rid of those blobs too, it's just that we don't have a fix at this point.)
> Running a native binary doesn't give it access to anything inherently
While this is true from a certain perspective, machine code creates a system which must grant access to many things to become usable. A shared file system is a good example of this. Some software could easily echo a line into your .profile that tries to launch a key-logger, and this works in many cases. The expectation of software existing as opaque files creates a huge amount of work for the OS in verifying the exact behaviour of the software as it runs (and in ways which can often be circumvented), rather than a source-based approach in which malware is never allowed to touch the processor.
> I'm typing this on a nice comfy GNU/Linux box where the only blobs are some firmware
So you suffer the worst of both worlds then. You've had to download and compile the source yourself, but as the software is designed around being distributed as blobs, you enjoy none of the benefits that might come from source distribution.
Machine code does not require granting anything. The presence of a shared filesystem is an artifact of the OS, not any sort of inherent requirement.
The effect of machine code being opaque is actually "beneficial" to security, as only very coarse protection boundaries are even expressible. You must coarsely separate your code to get protection, which comes at a potential cost of performance. The advantage of non-machine code is that you can express very fine protection boundaries, which reduces the performance cost of protection, but at the cost of more complex analysis.
Ensuring protection boundaries in the presence of coarsely separated code is strictly easier. You have strictly fewer ways to escape. Any coarsely separated code can be trivially transplanted onto a system enabling finer protection, since it can just not use the finer protection. By the same logic, any protection boundary that can not protect coarsely separated code has no hope of enforcing protection on finer separations.
That the kernel designers of Linux, Windows, iOS, etc. can not even ensure protection in general when the systems are coarsely separated means they certainly can not be trusted to ensure protection in general when things are more finely separated. If they could, fixing their existing coarse protection boundary would be a trivial consequence. About the only theoretical reason why this might not apply to eBPF is that it is not "general", in that many programs are not expressible, but that is not sound, positive evidence of suitability, whereas the historical track record of protection in all commercial operating systems is abysmal to say the least. Claims of finally achieving security (on a strictly harder variant of the problem) are extraordinary claims, and extraordinary claims demand extraordinary evidence.
It requires that you actually run software on your CPU before you can tell if it's malicious or not, by which point it is often too late to do anything about it. Almost all malware that exists does so thanks to this fact.
Machine code running in unprivileged mode with no shared memory and where syscalls auto-fail can do nothing malicious. Such a system is even more locked down than a Java program (when it does not also rely on hardware isolation) with no access to the filesystem class.
To break the JVM protection boundary and access the filesystem only requires finding a JVM runtime breach or a vulnerable library to access the otherwise inaccessible filesystem class.
To break the hardware isolation requires finding a hardware bug in the protection boundary, which is vastly less common, to the point where I am not sure there has been a hardware cross-privilege data integrity bug in over a decade.
The examples you keep providing are not apples-to-apples, as you keep comparing locked-down source-level programs against ensembles of machine code programs plus unrestricted access to an insecure operating system runtime. Every error you highlight in the machine-code system is actually an operating system/runtime problem, not a problem with machine code as a software delivery mechanism.
> To break the JVM protection boundary and access the filesystem only requires finding a JVM runtime breach or a vulnerable library to access the otherwise inaccessible filesystem class.
CPUs also have bugs in them, and those are much harder to patch. Practically speaking, I think it is more likely for a CPU bug to be found than for a Java program to do what you describe here. The prospect of a vulnerable library seems the most likely avenue of attack, but under my proposed system, not even libraries would have access to the file-system.
> Machine code running in unprivileged mode with no shared memory and where syscalls auto-fail can do nothing malicious
I'm fairly sure it can do nothing, so while that technically satisfies the objective, I don't think it really matches the spirit of the assignment. Then if you want to re-introduce the file-system to such a process, your options are much more complex than they would be in the Java case. You can write some Java code that doesn't interact with the file system at all, then pass objects representing files (i.e. streams) into it. The result is absolute fine-grained control over what the process can access because it is not relying on any kind of shared state. All the information it uses is passed into it. This is much easier than attempting to containerise an existing executable, which requires support from the operating system (and also technically support from the hardware).
My core point here is that the software solution to security simply works better than a hardware one. It seems to me that pure functions provide the ultimate form of containerisation, and do so in an immediately intuitive way.
CPU bugs are extremely rare compared to software bugs. Like, millions to billions of times less likely. CPU bugs are news, software bugs are Tuesday. You might as well argue that a steel box protects its contents no better than a cardboard box because who knows, maybe a meteor will land on the box so steel is no better than cardboard. You can not just dismiss factors of millions on feelings.
Again, you are talking about machine code plus OS ensembles. The machine code can not access a filesystem or files unless the OS allows it just like your Java example. You can pass precisely a file or stream to your program if your OS supports such a notion. Even Linux, as insecure as it is, can express your notion of streams with pipes to input files. The insecurity of machine code on Linux is an artifact of OS design providing things like an innate right for processes to access a file system, not anything to do with the machine code.
Containerisation does not provide a security boundary and is difficult because the OS design and the programs targeting that OS API are not designed to operate in that way. Such a program would not operate correctly in your Java model either because the program only functions when there is a global filesystem conforming to the implicit requirements of the programs. As soon as you modify that program to operate correctly according to your model it then becomes trivial to precisely provide the same dependencies to the machine code variant as long as your OS supports such abstractions.
Every single abstraction you are describing can be expressed as an OS concept at the cost of performance. An argument can be made that the performance impacts are too high, so commercial designs sacrifice protection, and that by supporting language-level protections you recover enough performance to enable secure designs. But, in that case you would expect proof-of-concepts demonstrating feasibility and proven protection in the equivalent, but non-performant model, and only then optimizing the performance by leveraging language-level protections. Until that is done there is precisely zero reason to believe any language-level protections are robust.
To say it again, language-level protections are strictly more powerful than OS/machine-code-level protections. If you can prove them, then you can trivially prove any OS-level protection by just allowing less fine separation. If you can not demonstrate weaker results, like an unhackable OS design, then you do not have a robust language-level solution. It is like proving P != NP without proving something easier like P != PSPACE.
Language level solutions would be awesome, but they are not a stepping stone to protection, they are the end result once you have already mastered security.
> can express your notion of streams with pipes to input files
Not if you've disabled syscalls. As it turns out, process isolation is incredibly complex. Millions of lines of code in kernels and container systems are dedicated to an issue that could be trivially solved with a software-based approach, if only software were distributed in a source format.
> The insecurity of machine code on Linux is an artifact of OS design providing things like an innate right for processes to access a file system, not anything to do with the machine code
This design decision is intimately related to the format in which code is distributed, the linking and loading of ELF files, as well as the compilation of software. Otherwise containerisation would be as simple as giving each process a private file-system.
> As soon as you modify that program to operate correctly according to your model it then becomes trivial to precisely provide the same dependencies to the machine code variant as long as your OS supports such abstractions.
Okay, so you take some information from the source and add it to the machine code to solve the problem. Let's say another problem comes up and you do that again. The limit of the process is source-based distribution, and maybe some kind of cached IR (similar to what Java does). The more information you add to machine code, the less it resembles machine code. But when you modify the machine code format, you have to modify your operating system, which is wasteful when this problem has been effectively solved in user-space for decades. The argument against this is always "we can modify the OS to achieve the same thing as you could without the OS", which is always true but not exactly useful. Why is it better to implement these things in a complicated way in the operating system? Modern computers are fast enough to JIT compile Java software, so I see no reason to still use binary formats which require all this extra effort.
> CPU bugs are extremely rare compared to software bugs
You should be comparing to the kind of software bug you are mentioning here: specifically, one where a bit of basically interpreted code that can't even read or write files somehow gains total control over the process it's running in. Preventing this is so simple that it just isn't logical for such a bug to exist. You might think it likely because of the frequency of bugs in operating systems that use binary programs, but that is because securing a binary program is much more complicated than securing one which is interpreted.
> But, in that case you would expect proof-of-concepts demonstrating feasibility and proven protection in the equivalent, but non-performant model, and only then optimizing the performance by leveraging language-level protections. Until that is done there is precisely zero reason to believe any language-level protections are robust.
Well, using Haskell as an easy example here, the language is fairly performant, and you know all functions without side-effects are safe. By redefining the IO monad, you can then control, at a software level, every file access that the program attempts and guarantee that it does nothing malicious at minimal overhead. Obviously if I were implementing a practical version of this, I wouldn't choose Haskell for it, but the point still stands that Haskell proves the feasibility of such a system.
What else do you suggest then? Is there even hardware available that implements a high-level VM programming language as an abstraction layer (instead of e.g. the x86 instruction set)?
A program sits on top of hardware and can act as a go-between. Think of the web browser, or the JVM. Both are insanely popular by virtue of providing the basic features I describe, and both are evidence that this needn't even be integrated into the operating system. Something of this nature that also provides the basics of an interactive desktop.
> The expectation of software existing as opaque files creates a huge amount of work for the OS in verifying the exact behaviour of the software as it runs
That's not really how the software/OS relationship works. By default, OSes run software in very unprivileged CPU security contexts, where it can't really do anything but operate on the memory the OS allocated to it; its CPU instructions can't talk directly to other hardware, or read/write memory beyond the bounds the OS specified.
To do literally anything else, like open a file, write to a file, talk to a network interface, or talk directly to other hardware connected to the CPU, the software needs to use a syscall and effectively ask the OS to perform the operation on its behalf. Every single dangerous operation you can perform is guarded by the OS. The OS doesn't validate anything; it simply decides whether or not to perform the operations requested.
The only reason why software expects so much unfettered access to a system is simply that OSes have historically not limited what software was allowed to do; OSes simply performed whatever operation was requested of them without question. It's only in the last 20 years or so that security became a concern, and it occurred to us that allowing software to just do whatever it wants is probably a bad idea. But the genie is out of the bottle, and it's proving hard to put it back in. So retroactive work to limit what software can do by default, without simply breaking everything, is slow and difficult.
If you ever want to understand why you can do some things on macOS, but not iOS, then this is a pretty good place to start (if we ignore Apple's brand and commercial reasons for a moment). iOS only ever supported tightly sandboxed apps that had to play nice with a draconian permissions system, whereas macOS is a more standard OS, developed in the age before security was really a concern. So it's easy to enforce tight sandboxing on iOS, where software has always had to deal with it, but hard on macOS, where for the majority of its history, sandboxing simply didn't exist.
> rather than a source-based approach in which malware is never allowed to touch the processor.
That's a nice idea, but doesn't really hold up to scrutiny. You need to look at the ever-increasing number of supply chain attacks to realise that simply being open source does little to ensure malware doesn't make it onto your machine. And that's before we get into issues like Heartbleed, where the OSS software on your machine may contain bugs or errors that allow remote parties to gain access to privileged data or credentials; software that isn't malware, it just has bugs in it.
At the end of the day, it doesn't really matter what magical security boundaries you develop; someone will find a way around them. Which is why defence in depth is such an important principle. Simply relying on the idea that open source means no malware is foolish. You need proper OS defences regardless.
I know all that. The process you describe is the OS verifying the behaviour of software as it runs.
>That’s a nice idea, but doesn’t really hold up to scrutiny
It does actually. The kind of attack you describe only works because software exists in non-source-based systems. If you were going to run a Haskell function, and you were told that I was going to modify its body in some way, there is no way I can insert malware into it, because the code cannot reach out of the context of the function and mess with the operating system. Software is composed of functions, which naturally run in a fully containerised state and only become dangerous when the OS adds an additional layer of computer-wide state.
> Software is composed of functions, which naturally run in a fully containerised state
That's simply not true. It's a convenient fiction that modern programming languages provide, but they only provide it by virtue of not providing APIs to access external state, and even that isn't true. Most programming languages do let you access state beyond a function's stack, in the simplest form by allowing your function to interact with data stored on a heap.
There's absolutely nothing about software that inherently results in "functions, which naturally run in a fully containerised state". Functions are just a useful abstraction, in the same way an OS's APIs are also just useful abstractions. There's absolutely nothing that makes function scopes inherently more secure than OS APIs. Arguably function scopes are vastly less secure than OS APIs; there's a good reason why stack overflows are one of the most common forms of Remote Code Execution exploits.
That is kind of my point. To claim that functions are somehow more secure than containers, or just standard process isolation (which is basically all containers are), is just silly.
Functions are no more inherently secure than containers. Both abstractions are contained until something exposes the external environment to them, and both of them can be configured to simply not do that.
The claim that functions are more secure is quite reasonable, and it stems from the fact that they are simple. You or I could write a contained function, or even a piece of software that verifies a function is contained by scanning its source code. We could do this in perhaps a few weeks of spare time. Compare and contrast to the effort that goes into other containerisation techniques. Docker would take more than a week of spare time to write. It is orders of magnitude more complex.
> Compare and contrast to the effort that goes into other containerisation techniques. Docker would take more than a week of spare time to write. It is orders of magnitude more complex.
That's because you're comparing a butter knife to a chainsaw. You can quite easily run an entire process in a manner that's just as limited as a simple function by using something like seccomp [1], which prevents the process from doing anything except exiting, or reading/writing file handles that have already been opened and passed to it.
No need to spend a weekend or whatever writing something to “verify a function is contained”, or anything like that. Just give your process to the OS and tell the OS to contain it.
Heck, if you're on a system with systemd, all you need is a handful of lines of config, and you're done. No need to worry about how good your static analysis is (spoiler alert, it's never good enough), or creating a new language, or reading anyone's code.
You use containerisation frameworks when you want your software to be able to interact with the world, but want to be able to carefully analyse and limit how it interacts with the wider world. It’s total overkill if you just want to execute code that performs computations without interacting with other systems.
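For reference, the strict flavor of this is only a few lines of C (a minimal sketch; after the prctl call, only read(2), write(2), _exit(2) and sigreturn(2) remain available, and any other syscall kills the process):

    #include <linux/seccomp.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(void)
    {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
            perror("prctl");
            return 1;
        }
        /* From here on the process can only compute and use fds it
         * already holds; everything else is SIGKILL. */
        const char msg[] = "still alive, but contained\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        _exit(0);
    }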
> You can quite easily run an entire process in a manner that's just as limited as a simple function by using something like seccomp
Yet most software will not work when you do this, because most software expects to be able to interact implicitly with the wider system. By using functions as the fundamental unit of containment, you force software to explicitly declare anything it depends on as an argument to the function. For instance, if it wishes to listen on a port, it would need to receive some object representing access to that port as an input (see the sketch below), rather than just receiving one large implicit object representing every function of an operating system, which then has to be retroactively hacked down to only the required functionality. Full Docker-like containerisation on all processes, and done as simply as if you had used seccomp. It is the best of both worlds.
> spoiler alert, it’s never good enough
This is provably false. Haskell is an instance of static analysis good enough to verify that functions don't have side-effects (and additionally that they don't use mutable state). Safe Rust also has more practical methods, though it doesn't fully implement this. Verifying these things is trivial, and requires only three conditions: that global variables are immutable, that immutable objects cannot be converted to mutable ones, and that the mutable state of the operating system is only exposed through explicit mutable objects representing it.
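A minimal C sketch of the capability-passing style from the comment above (names invented; systemd's socket activation hands processes pre-opened listening sockets in a similar spirit):

    #include <sys/socket.h>
    #include <unistd.h>

    /* This component never calls socket(), bind(), or open(): the only
     * authority it has is the one listening socket passed in as an argument. */
    void serve(int listen_fd)
    {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0)
            return;
        const char reply[] = "hello\n";
        write(client, reply, sizeof reply - 1);
        close(client);
    }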
> While this is true from a certain perspective, machine code creates a system which must grant access to many things to become usable. A shared file system is a good example of this. Some software could easily echo a line into your .profile that tries to launch a key-logger, and this works in many cases.
That's common, but it's certainly not a requirement to run native code. For example, we've done a pretty good job at retroactively fixing that while preserving backwards compatibility with containers (I can, and have, run normal official Firefox binaries inside a Docker container with zero access to my real home directory) or sandboxes like Flatpak (bubblewrap). If you want to run real native binaries but don't have to preserve backwards compatibility, then it gets easy; Genode ( https://genode.org/ ) does a lovely job of truly practicing only giving programs what access you want to give them.
> The expectation of software existing as opaque files creates a huge amount of work for the OS in verifying the exact behaviour of the software as it runs (and in ways which can often be circumvented), rather than a source-based approach in which malware is never allowed to touch the processor.
I think you're overoptimistic regarding what you can do with the source code short of manual (human) auditing. I mean, sure there are things you can scan for to try and catch bad behavior, but in the case of actual malice I wouldn't trust automatic code analysis to protect me.
>> I'm typing this on a nice comfy GNU/Linux box where the only blobs are some firmware
> So you suffer the worst of both worlds then. You've had to download and compile the source yourself, but as the software is designed around being distributed as blobs, you enjoy none of the benefits that might come from source distribution.
I have no idea why you think either of those things? Depending on the distro I certainly can compile from source on my own box (ex. Gentoo, NixOS), but I can also use precompiled binaries (ex. Debian, NixOS) while still having it be trivial to go find the exact source that went in to the binary package I downloaded (this has gotten even stronger with Reproducibility efforts meaning that I can even verify the exact source and build config that created a specific binary). The actual application software and OS are available as Open Source code that can be audited, with binaries available as a convenience, and the only remaining blobs (unwelcome but impractical to fix so far) are firmware blobs with relatively constrained roles (and on machines with an IOMMU we can even enforce what access they have, which is a nice mitigation).
> I think you're overoptimistic regarding what you can do with the source code
Suppose you have a Java function. If you remove the file-system class at a system level, what can that code do? It's pretty easy to isolate it so that it cannot compile to any kind of malicious code, only conduct computations to produce values. That's why you can open a website running JavaScript without as much worry for the security of your computer as you would have downloading a blob and running it.
> I have no idea why you think either of those things?
Well, most of the software on your computer has the following API:
int main(int argc, char **argv)
This is because it needs to run as an executable. Without this requirement, software might have much more interesting APIs, which would be much more useful. But instead there is a requirement that all software be mashed into blobs with all of the useful information removed.
Additionally, just compiling the software from source does nothing to guarantee that it isn't malware. You can compile malware from source just as easily as anything else. The system I'm proposing does not allow for malicious programs to be compiled.
You know that a main function is not really the only way to write programs, right? Shell integrations and other such things are exactly what you're talking about: software with "interesting APIs". It just does not make a lot of sense for a foreground application, or at least an application that needs control.
And besides, if you have a look at Apple platforms and how they do communication with running applications: "interesting APIs" right there.
And finally, there is no requirement at all that ELF binaries are stripped of useful info (symbols, debug info, etc.); there is, though, a desire to save disk space and make reverse engineering harder.
All of this has nothing to do with the executable code format (native vs byte code or what have you).
> You know that a main function is not really the only way to write programs, right?
> there is no requirement at all that ELF binaries are stripped of useful info
Yes, you can add useful information to a program. This is what I'm advocating for. The issue is that most people don't do this, which is what I'm complaining about. Consider the difference between distributing executable files vs. Python libraries for instance.
> All of this has nothing to do with the executable code format
I don't see how you can write an entire comment about the format in which code is distributed and come to this conclusion.
The actual binary format will be the same regardless of any extra info shipped together with the actual code. That was my point; so it honestly does not matter for any sufficiently complicated product how it's shipped.
The problem is the incentives: people don't WANT to share that data with you.
And there are very good reasons not to ship an application as plain Python; the biggest problem is all the dependencies that requires, and the impact that has on the user.
The binary format is always the same; you need to get to machine code at some point. The difference is how you get there.
> it honestly does not matter for any sufficiently complicated product how it's shipped.
I think this opinion is in deep disagreement with the facts. Among the most popular and influential computing technologies, you have Java and the WWW. Both are entirely based around shipping code in a better format than raw binary.
"That's a really cool security aware script language you've got there! So.. um.. how can I extend it to call third party libraries?"
Perhaps the idea that "the computer" is one single entity with a shared security domain and view of hardware is the flaw. Why can my web browser read my tax documents unless I go through a bunch of rather absurd efforts to prevent something so simple?
Because you want to be able to report your tax documents to the tax office? It's one of those things that sounds so simple on paper, but every time someone does the seemingly trivial thing and doesn't make documents available to the web browser, usability suffers.
The real answer for why the browser can read certain files is much more complex: your web browser is not a singular entity anymore, and the network- and protocol-speaking parts of it can't access your documents, according to the principle of least authority.
It's far from perfect and gets hacked all the time, but do take the time to read how that's done; the hacks are just as complex as the web browser itself. The practical problem with the browser is the enormous complexity of functions, everything from OpenGL to databases to p2p and USB, that keeps growing boundlessly.
Right.. which is why "home PC" is one of the largest attack surfaces we have. The fact that tax authorities expect you to brave the gap is only one of the problems with this configuration.
The web browser _purports_ to be that entity. The list of CVEs shows that it isn't. If I install a "web browser" I'm installing "a binary program that can access anything it wants at any time it wants."
"Do take the time to read" is an absurdly condescending thing to say while simultaneously moving the goalposts of the argument.
eBPF allows user-defined programs to run in the kernel.
This is huge for performance-sensitive code that executes against network packets: you don't have to context switch between kernel space and user space.
It's worth pointing out that Solana's extreme competitive advantage over other chains is almost entirely due to it running on a variant of eBPF. †
This is an order-of-magnitude leap over other implementations and essentially the way you should do it, if you were to write it from scratch, aside from special purpose hardware fabrication.
† The second reason Solana is so fast is extreme parallelism: all accounts that are used in a transaction must be marked as either "read-only" or "writeable" before sending the transaction, allowing the runtime to parallelize all reads and only solve write contention when necessary.