> Applications that have high syscall rates include proxies, databases, and others that do lots of tiny I/O. Also microbenchmarks, which often stress-test the system, will suffer the largest losses.
That’s proof they serve a specific purpose: if a microbenchmark is especially affected by the effect you’re trying to measure, it provides an upper bound of sorts on the effect’s magnitude, which informs the rest of the analysis.
It's surprisingly difficult to come up with a syscall that is guaranteed to enter the kernel and come back out on a fast path. The old choices for this, like getpid() or gettimeofday(), no longer reliably enter the kernel at all: gettimeofday() is handled in the vDSO, and getpid() was long cached in userspace by glibc. This seems fairly clever, honestly.
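For illustration, a minimal sketch of such a microbenchmark (my own example, not the article's): calling syscall(SYS_getpid) directly bypasses both glibc's old PID cache and the vDSO, so every iteration genuinely crosses into the kernel and back.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const long iters = 10 * 1000 * 1000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);  /* forced kernel entry/exit */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall round trip\n", ns / iters);
        return 0;
    }

With KPTI enabled, the per-call cost should jump noticeably, since each crossing now includes the page-table switch.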
An open question is still who should enable the mitigation. The risk/cost trade-off doesn't seem to favor it in many scenarios.
Meltdown requires running native, untrusted code, and that doesn't apply to many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.
While Meltdown is interesting, I wouldn't enable KPTI on my database servers buried behind other network infrastructure.
> Meltdown requires running native, untrusted code, and that doesn't apply to many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.
You realize he's talking about his servers, and the article is talking about JavaScript running in the browser, right? The only thing talking to his database server is his own server code via SQL queries; if there are any browsers or third-party JavaScript running on that database server, he's had a real security meltdown, of the kind that makes Meltdown irrelevant.
He's not wrong; it's just that your browser does that automatically every time it loads a web page. That's an insecure-by-default security model, and it's why people run JavaScript blockers like NoScript.
A server, however, is unlikely to do this, so the risk profile of something like Meltdown is much lower. It's not nil: if someone does get on your box, whether they're supposed to be there or not, Meltdown can be leveraged to read memory anywhere on the system, effectively neutering the protections of a multi-user operating system. That's a major information risk by itself, since it could leak important things like encryption keys or user login info, but it could also be used to make local privilege escalation exploits simpler. (That said, if a remote user is able to get a shell on your box and submit code to the CPU, chances are you already have a latent local escalation vulnerability that can be chained to get root.)
So he's right that people who don't execute arbitrary code on their CPUs are much less vulnerable, but they're not totally invulnerable, because you can't really guarantee that no one will ever be able to submit instructions to your CPU via an RCE or the like. Also, you essentially need to trust everyone who has shell access to your server with anything that may be held in the server's memory, including keys, passwords, etc.
People who are considering disabling KPTI will have to decide whether they want to make the extra local attack surface available in exchange for the performance gain under normal operation.
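For anyone making that call, a quick way to see the current state (a sketch, assuming a 4.15+ kernel that exposes the vulnerabilities sysfs directory; KPTI itself is toggled with the pti=on/off boot parameter):

    #include <stdio.h>

    int main(void) {
        /* Reports e.g. "Mitigation: PTI" or "Vulnerable" on kernels
           that expose mitigation status via sysfs. */
        char buf[128];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
        if (!f) {
            perror("fopen");  /* older kernel without this reporting */
            return 1;
        }
        if (fgets(buf, sizeof buf, f))
            fputs(buf, stdout);
        fclose(f);
        return 0;
    }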
What syscall rates do different databases sustain at maximum load? Transparent huge pages negating most of the overhead is very good news, but that probably helps less with mmap'd I/O, which so many databases use.
Having recently rewritten Linux's TLB code, I can say this is quite wrong. For an ordinary page fault, there's no flush at all: changing a page from not-present to present doesn't require a flush on x86. Removing a page from the page cache can be done with INVLPG, which has been around for a long, long time.
From 4.14 on, Linux has used PCID to improve context switches, independently of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not terribly useful, even with PCID; in fact, Linux only uses INVPCID on user pages to assist with a PTI corner case. It's not entirely clear to me what Intel had in mind when INVPCID was added.
I think he means that a page fault occurs every time a page is not present.
They're slower, because the kernel needs to be mapped in and out of the virtual address space, just as it does for syscalls.
If the access pattern is sufficiently local, perhaps this could be mitigated by using large (2MB) pages. A bad idea for a random access pattern, of course.
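A sketch of what that looks like (assuming huge pages have been reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HUGE_2MB (2UL * 1024 * 1024)

    int main(void) {
        /* One 2 MB page covers what would otherwise take 512 TLB
           entries at 4 KB, so a local access pattern touches far
           fewer entries and suffers less from extra flushes. */
        void *p = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");  /* no huge pages reserved? */
            return 1;
        }
        memset(p, 0, HUGE_2MB);  /* touch the mapping */
        munmap(p, HUGE_2MB);
        return 0;
    }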
It would be interesting to know how a patched host interacts with a patched guest. As a simple example, if the host aggressively flushes the TLB, the cost to the guest of doing the same could be lower. On the other hand, depending on how the host was patched, the guest's performance loss could differ when certain features are used.
He chose to graph from high syscall rate to low. I was initially confused, as most people would have shown the ramp up, left to right. Doesn't matter much though, once you get it.
It would be great to see performance deltas for AMD CPUs too, especially since Meltdown only affects Intel and AMD's patches for Spectre Variant 2 are considered optional. It would also be nice to see a discussion of AMD's ASID and any differences it has from Intel's PCID wherever PCID is addressed.
Maybe a better-formulated question would be: do the Meltdown changes to the kernel impose a performance penalty on AMD processors as well (regardless of exploitability)?
Extremely correct. Database and frontend servers are hit pretty hard; nothing in the middle is hit. But neither end is actually running untrusted code, for the most part.
I predict a decline of the "hyperconvergence" server and a return of the usual "database server" + "app server" + "frontend server" combo.
Not a Netflix employee, but in my last role we disabled THP on every instance we had. We had issues with databases (MySQL/Cassandra/HBase), Hadoop, and Java applications.
I hear it's gotten a lot better since then, and the compactor doesn't freeze stuff like it used to.
THP can cause serious problems (like “freeze your application for 1s while I compact its pages without asking”). Use at your own risk. It’s much better to use explicit huge pages.
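For per-mapping control, rather than flipping the system-wide THP knob, something like this sketch works (madvise(2) with MADV_NOHUGEPAGE; only meaningful when THP is enabled at all):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 64UL * 1024 * 1024;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Ask the kernel not to back this range with transparent
           huge pages, avoiding surprise compaction stalls here. */
        if (madvise(p, len, MADV_NOHUGEPAGE) != 0)
            perror("madvise(MADV_NOHUGEPAGE)");

        munmap(p, len);
        return 0;
    }

Explicit huge pages, by contrast, go through mmap's MAP_HUGETLB flag and never involve the compactor.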
The OCA team was working on it, but I don't have a number. My guess is small, since it uses sendfile and the packets are then all handled kernel to kernel, so the syscall rate should be relatively low.
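To make the low-syscall-rate point concrete, here's a rough sketch of the sendfile pattern (my own example, not Netflix's code): one syscall can push an arbitrarily large chunk of a file toward the destination without the data ever entering user space.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* The kernel copies straight from the page cache to the
           destination fd; since 2.6.33 that fd need not be a socket,
           so this demo writes to stdout (redirect it to a file). */
        off_t off = 0;
        while (off < st.st_size) {
            ssize_t n = sendfile(STDOUT_FILENO, fd, &off, st.st_size - off);
            if (n <= 0) { perror("sendfile"); return 1; }
        }
        close(fd);
        return 0;
    }

Compare that to a read()/write() loop, which costs two syscalls per buffer-sized chunk.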
My team's RDS instances got hit hard with a 40% increase in CPU usage: https://imgur.com/a/khGxU