Context switches happen regardless of whether you're using kernel mode ("threads...

otherjason · on Feb 22, 2023

Do you have a citation for kernel mode having more efficient context switches? What kind of direct hardware access are you referring to that would be better than pushing the register context onto the stack?

In my experience, the exact opposite is true, particularly in the era of CPU mitigations that require TLB flushes upon every kernel-mode context switch.

messe · on Feb 22, 2023

You're right, kernel-level context switching is much slower than user-level context switching.

User-level switching can also have the advantage of knowing more about the task that is running, meaning that it's often able to avoid saving/restoring as much data as a kernel-level switch would. See Go's goroutines (green threads) for a great example of this kind of cooperation between runtime and language.
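To make the "less state to save" point concrete, here's a toy sketch in Python. It is not how Go's runtime works; the names `task` and `run_round_robin` are made up for illustration. Each generator is a user-level task, and a switch is just suspending one frame and resuming another, with no kernel scheduler involved.

```python
from collections import deque

def task(name, steps, log):
    """A toy user-level task: each `yield` is a voluntary switch point."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back; only this generator's frame is preserved

def run_round_robin(tasks):
    """Switch between tasks in user space and count the switches."""
    ready = deque(tasks)
    switches = 0
    while ready:
        current = ready.popleft()
        try:
            next(current)      # resume the task until its next yield
        except StopIteration:
            continue           # task finished; drop it
        ready.append(current)  # still runnable; requeue it
        switches += 1
    return switches

log = []
switches = run_round_robin([task("a", 2, log), task("b", 3, log)])
```

Because the tasks only give up control at `yield`, the scheduler never has to save anything beyond what the task itself keeps alive at that point.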

> Do you have a citation for kernel mode having more efficient context switches? What kind of direct hardware access are you referring to that would be better than pushing the register context onto the stack?

The closest thing to this that I can think of is 32-bit x86, which did have hardware-assisted context switching via the TSS (Task State Segment).

As it happens, everybody stopped using it because it was too slow, and a bit painful unless you fully bought into x86's awful segmentation model. Early Linux kernels used it, if you want to see it in action.

the_duke · on Feb 22, 2023

I don't quite follow your argument there.

This is unrelated to kernel threading.

If you have 1 thread handling 1000 requests with some async I/O mechanism (epoll, io_uring, ...) instead of 1000 threads each handling one request, there are far fewer threads fighting over CPU cores and the 1 thread can stay active much longer, hence reducing the number of context switches.

This is especially true with a mechanism like io_uring, which helps minimize syscalls (and hence switches into the kernel).
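For anyone who wants to see the one-thread-many-requests shape, here's a minimal sketch using Python's `selectors` module, which picks epoll on Linux. The function name `serve_all` and the use of `socket.socketpair()` as stand-in "clients" are assumptions to keep it self-contained, and it uses 50 connections rather than 1000 to stay under default file descriptor limits. It doesn't measure context switches; it only shows a single thread servicing many connections.

```python
import selectors
import socket

def serve_all(num_clients):
    """One thread multiplexes many connections via readiness notification."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    clients = []
    for i in range(num_clients):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=i)
        client_side.sendall(f"request {i}".encode())  # simulated request
        clients.append(client_side)

    pending = num_clients
    while pending:
        # Block once, then handle every connection that is ready to read.
        for key, _events in sel.select():
            conn = key.fileobj
            payload = conn.recv(1024)
            conn.sendall(b"echo: " + payload)
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = {}
    for i, client in enumerate(clients):
        replies[i] = client.recv(1024).decode()
        client.close()
    sel.close()
    return replies

replies = serve_all(50)
```

The thread only blocks in `sel.select()`, so it keeps running as long as any connection has work, instead of one thread per connection each blocking in its own `recv`.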

otabdeveloper4 · on Feb 23, 2023

False. Both epoll and thread switching use the exact same kernel scheduling mechanisms under the hood.

(When the kernel decides which thread gets to run it's doing the equivalent of an epoll call.)