> Applications that have high syscall rates include proxies, databases, and others that do lots of tiny I/O. Also microbenchmarks, which often stress-test the system, will suffer the largest losses.
That’s proof they serve a specific purpose: if a microbenchmark is especially affected by the effect you’re trying to measure, it provides an upper bound of sorts on the effect’s magnitude, which informs the rest of the analysis.
It's surprisingly difficult to come up with a syscall that is guaranteed to enter the kernel and come back out on a fast path. The old choices for this, like getpid() or gettimeofday(), no longer reliably enter the kernel at all: gettimeofday() is handled in the vDSO, and getpid() was long cached in userspace by glibc. This seems fairly clever, honestly.
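For illustration, a minimal sketch of such a microbenchmark (my own example, not the article's): calling syscall(SYS_getpid) directly bypasses both glibc's old PID cache and the vDSO, so every iteration genuinely crosses into the kernel and back.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const long iters = 10 * 1000 * 1000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);  /* forced kernel entry/exit */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall round trip\n", ns / iters);
        return 0;
    }

With KPTI enabled, the per-call cost should jump noticeably, since each crossing now includes the page-table switch.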
An open question is still who should enable the mitigation. The risk/cost trade-off doesn't seem to favor it in many scenarios.
Meltdown requires running native, untrusted code, and that doesn't apply to many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.
While Meltdown is interesting, I wouldn't enable KPTI on my database servers buried behind other network infrastructure.
> Meltdown requires running native, untrusted code, and that doesn't apply to many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.
You realize he's talking about his servers, and the article is talking about JavaScript running in the browser, right? The only thing talking to his database server is his own server code via SQL queries; if there are any browsers or third-party JavaScript running on that database server, he's had a real security meltdown, of the kind that makes Meltdown irrelevant.
He's not wrong; it's just that your browser does that automatically every time it loads a web page. That's an insecure-by-default security model, and it's why people run JavaScript blockers like NoScript.
A server, however, is unlikely to do this, so the risk profile of something like Meltdown is much lower. It's not nil: if someone does get on your box, whether they're supposed to be there or not, Meltdown can be leveraged to read memory anywhere on the system, effectively neutering the protections of a multi-user operating system. That's a major information risk by itself, since it could leak important things like encryption keys or user login info, but it could also be used to make local privilege escalation exploits simpler. (That said, if a remote user is able to get a shell on your box and submit code to the CPU, chances are you already have a latent local escalation vulnerability that can be chained to get root.)
So he's right that people who don't execute arbitrary code on their CPUs are much less vulnerable, but they're not totally invulnerable, because you can't really guarantee that no one will ever be able to submit instructions to your CPU via an RCE or the like. Also, you essentially need to trust everyone who has shell access to your server with anything that may be held in the server's memory, including keys, passwords, etc.
People who are considering disabling KPTI will have to decide whether they want to make the extra local attack surface available in exchange for the performance gain under normal operation.
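For anyone making that call, a quick way to see the current state (a sketch, assuming a 4.15+ kernel that exposes the vulnerabilities sysfs directory; KPTI itself is toggled with the pti=on/off boot parameter):

    #include <stdio.h>

    int main(void) {
        /* Reports e.g. "Mitigation: PTI" or "Vulnerable" on kernels
           that expose mitigation status via sysfs. */
        char buf[128];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
        if (!f) {
            perror("fopen");  /* older kernel without this reporting */
            return 1;
        }
        if (fgets(buf, sizeof buf, f))
            fputs(buf, stdout);
        fclose(f);
        return 0;
    }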
What syscall rates do different databases sustain at maximum load? Transparent huge pages negating most of the overhead is very good news, but that probably helps less with mmap'd I/O, which so many databases use.
Having recently rewritten Linux's TLB code, I can say this is quite wrong. For an ordinary page fault, there's no flush at all: changing a page from not-present to present doesn't require a flush on x86. Removing a page from the page cache can be done with INVLPG, which has been around for a long, long time.
From 4.14 on, Linux has used PCID to improve context switches, independently of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not terribly useful, even with PCID; in fact, Linux only uses INVPCID on user pages to assist with a PTI corner case. It's not entirely clear to me what Intel had in mind when INVPCID was added.
I think he means that a page fault occurs every time a page is not present.
They're slower, because the kernel needs to be mapped in and out of the virtual address space, just as it does for syscalls.
If the access pattern is sufficiently local, perhaps this could be mitigated by using large (2MB) pages. A bad idea for a random access pattern, of course.
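A sketch of what that looks like (assuming huge pages have been reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HUGE_2MB (2UL * 1024 * 1024)

    int main(void) {
        /* One 2 MB page covers what would otherwise take 512 TLB
           entries at 4 KB, so a local access pattern touches far
           fewer entries and suffers less from extra flushes. */
        void *p = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");  /* no huge pages reserved? */
            return 1;
        }
        memset(p, 0, HUGE_2MB);  /* touch the mapping */
        munmap(p, HUGE_2MB);
        return 0;
    }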
It would be interesting to know how a patched host interacts with a patched guest. As a simple example, if the host aggressively flushes the TLB, the cost to the guest of doing the same could be lower. On the other hand, depending on how the host was patched, the guest's performance loss could differ when certain features are used.
He chose to graph from high syscall rate to low. I was initially confused, as most people would have shown the ramp up, left to right. Doesn't matter much though, once you get it.
It would be great to see performance deltas for AMD CPUs too, especially since Meltdown only affects Intel and AMD's patches for Spectre Variant 2 are considered optional. It would also be nice to see a discussion of AMD's ASID and any differences it has from Intel's PCID wherever PCID is addressed.
Maybe a better-formulated question would be: do the Meltdown changes to the kernel impose a performance penalty on AMD processors as well (regardless of exploitability)?
Extremely correct. Database and frontend servers are hit pretty hard; nothing in the middle is hit. But neither end is actually running untrusted code, for the most part.
I predict a decline of the "hyperconvergence" server and a return of the usual "database server" + "app server" + "frontend server" combo.
Not a Netflix employee, but in my last role we disabled THP on every instance we had. We had issues with databases (MySQL/Cassandra/HBase), Hadoop, and Java applications.
I hear it's gotten a lot better since then, and the compactor doesn't freeze stuff like it used to.
THP can cause serious problems (like “freeze your application for 1s while I compact its pages without asking”). Use at your own risk. It’s much better to use explicit huge pages.
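For per-mapping control, rather than flipping the system-wide THP knob, something like this sketch works (madvise(2) with MADV_NOHUGEPAGE; only meaningful when THP is enabled at all):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 64UL * 1024 * 1024;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Ask the kernel not to back this range with transparent
           huge pages, avoiding surprise compaction stalls here. */
        if (madvise(p, len, MADV_NOHUGEPAGE) != 0)
            perror("madvise(MADV_NOHUGEPAGE)");

        munmap(p, len);
        return 0;
    }

Explicit huge pages, by contrast, go through mmap's MAP_HUGETLB flag and never involve the compactor.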
The OCA team was working on it, but I don't have a number. My guess is small, since it uses sendfile and the packets are then all handled kernel to kernel, so the syscall rate should be relatively low.
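To make the low-syscall-rate point concrete, here's a rough sketch of the sendfile pattern (my own example, not Netflix's code): one syscall can push an arbitrarily large chunk of a file toward the destination without the data ever entering user space.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* The kernel copies straight from the page cache to the
           destination fd; since 2.6.33 that fd need not be a socket,
           so this demo writes to stdout (redirect it to a file). */
        off_t off = 0;
        while (off < st.st_size) {
            ssize_t n = sendfile(STDOUT_FILENO, fd, &off, st.st_size - off);
            if (n <= 0) { perror("sendfile"); return 1; }
        }
        close(fd);
        return 0;
    }

Compare that to a read()/write() loop, which costs two syscalls per buffer-sized chunk.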
My team's RDS instances got hit hard with a 40% increase in CPU usage: https://imgur.com/a/khGxU