Linux used to have relatively low syscall overhead, especially on modern, aggressively speculating CPUs.
But after the Spectre and Meltdown mitigations landed, it felt like the 1990s all over again, when syscall overhead was a huge cost relative to the MIPS available.
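A quick way to see the gap yourself is to time a cheap syscall in a tight loop. A minimal sketch (not from the article; the iteration count is arbitrary and numbers depend on CPU, kernel version, and mitigation settings):

```c
// Rough micro-benchmark: average round-trip cost of a cheap syscall (getpid).
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const long iters = 10 * 1000 * 1000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++) {
        // syscall(2) bypasses glibc caching/vDSO shortcuts, forcing a real kernel entry.
        syscall(SYS_getpid);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per syscall\n", ns / iters);
    return 0;
}
```

Running it on an otherwise identical kernel booted with and without `mitigations=off` should show roughly the difference being described here.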
The article quotes the Intel docs: "Instruction ordering: Instructions following a SYSCALL may be fetched from memory before earlier instructions complete execution, but they will not execute (even speculatively) until all instructions prior to the SYSCALL have completed execution (the later instructions may execute before data stored by the earlier instructions have become globally visible)."
More detail here would be great, especially using the terms "issue" and "commit" rather than "execute".
A barrier makes sense to me, but preventing instructions from issuing seems like too hard of a requirement; how could anyone tell?
> preventing instructions from issuing seems like too hard of a requirement
If that weren't prevented, you could perform a SYSCALL in the shadow of a mispredicted branch and then try to use it to leak data from privileged code.
When the machine encounters an instruction that changes privilege level, you need to validate that you're on a correct path before you start scheduling and executing instructions from another context. Otherwise, you might be creating a situation where instructions in userspace can speculatively influence instructions in the kernel (among probably many other things).
That's why you typically make things like this drain the pipeline: once all older instructions have retired, you know that you're on a correct (not mispredicted) path through the program.
edit: Also, here's a recent example[^1] of how tricky these things can be (where SYSCALL isn't even serializing enough to prevent effects in one privilege level from propagating to another)
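To make the mispredicted-branch scenario above concrete, here's a deliberately non-functional sketch of the shape of the hazard (hypothetical names: shadow_example, idx, limit; this is an illustration, not an exploit):

```c
// Illustrative only: why the SYSCALL target must not issue until the
// guarding branch resolves. Assume idx and limit are attacker-controlled.
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

void shadow_example(long idx, long limit) {
    // Train the predictor to expect the taken path, then pass a value of idx
    // that fails the check. During the misprediction window the front end
    // still fetches the SYSCALL below, even though this path is
    // architecturally dead.
    if (idx < limit) {
        // If the instructions at the privileged SYSCALL target could issue
        // speculatively here, kernel code would start executing under a
        // mispredicted, attacker-steered user path. The documented behavior
        // (nothing after SYSCALL executes until all older instructions have
        // completed) is what rules this out.
        syscall(SYS_getpid);
    }
}
```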
It might have more to do with the difficulty of separating out the contexts of the two execution streams across the rings. Someone may have looked at the cost and complexity of all that accounting and said 'hell no'.
Yeah, I would probably say the same. It is a bit strange to document this as part of the architecture (rather than leaving it open as a potential future microarchitectural optimization). Is there some advantage for an OS in knowing that the CPU flushes the pipeline on each system call?
There are so many extra steps; the CPU is obviously designed for a legacy monolithic OS like Windows, which makes syscalls rarely, and it would perform poorly with microkernels, which are much safer and better than Windows.
For example, why bother saving userspace registers? Just zero them out to prevent leaks, ideally with a single instruction.
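For what it's worth, Linux's 64-bit entry path already zeroes most general-purpose registers on kernel entry as a speculation hardening, as far as I know, though on top of saving them rather than instead of it. A rough C sketch of the "zero instead of save" idea, with made-up names (user_regs, syscall_entry); the real thing would live in the assembly entry stub:

```c
// Hand-wavy sketch of scrubbing instead of saving user registers at syscall
// entry. Struct layout and function names are hypothetical stand-ins.
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct user_regs {
    uint64_t rax;                                    /* syscall number          */
    uint64_t rdi, rsi, rdx, r10, r8, r9;             /* x86-64 syscall arguments */
    uint64_t rbx, rcx, rbp, r11, r12, r13, r14, r15; /* everything else         */
};

void syscall_entry(struct user_regs *regs) {
    /* Keep the number and arguments, scrub everything else. Userspace must
     * then treat a syscall as clobbering all registers, like a call with no
     * callee-saved registers, and save anything it still needs itself. */
    memset(&regs->rbx, 0, sizeof(*regs) - offsetof(struct user_regs, rbx));
    /* ... dispatch on regs->rax ... */
}
```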
https://fosspost.org/disable-cpu-mitigations-on-linux