How about trying pypy? It has a different GC/memory allocator, and the 2.7-compatible branch is pretty much a drop-in replacement (unless you need something like PyQt).
>“Import manifest” phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history)."
I am not familiar with what a "manifest" is or what importing one means, at least not in the context of memory allocation. Can someone shed some light on this?
Also, right after that, the author states:
>"To put the theory at work, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas."
If I understand correctly, the author is trying to remove the possibility of entanglement, but doesn't the glibc malloc() library routine translate into an mmap() system call anyway? I am not sure how that would help disambiguate the memory allocators, then.
Author here. FWIW, git-cinnabar is written in python because it uses the mercurial code to talk to mercurial servers, and the mercurial code is in python.
Well, that was true when it was first released, but recent versions have gained "native" support for talking to mercurial servers (a helper program written in C does the talking[1]). Even in that case, though, the mercurial code is still used to read the bundle2[2] data that comes from the mercurial server (that is, only bundle1 is supported without using mercurial code).
I'm moving more and more things to the helper with the ultimate goal of not having any python code left.
An aggressive, copying garbage collector would have eliminated this problem from the start. People often argue that GCs have unavoidable overhead, but it's bounded overhead; they defragment your program's heap with every GC cycle.
GC does have unavoidable overhead; in particular, it makes the execution times of your binary extremely unstable and high-variance. Not to mention that it has no idea how your program uses memory, so it is bound to use a generic approach, while with manual management you can get real, high performance.
> it makes the execution times of your binary extremely unstable and high-variance.
Not necessarily, it depends on your allocation patterns. The same can be said for using malloc.
> Not to mention, it has no idea how your program uses memory, such that it is bound to use a generic approach
Ditto for malloc.
> with manual management, you can get real and high performance.
Using a GC does not mean you can't manually manage allocations. E.g. say you are allocating and deallocating a lot of Point objects. You keep a free list of Point objects that you can recycle. If you need a Point, you first see if there are any in the free list. If so, you pop the first one off the list and reuse it. Otherwise you allocate a new one. When you're done with a Point, you push it on the free list.
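A minimal sketch of that pattern in C (the names Point, pool_get and pool_put are made up for illustration; in a GC'd language the free list would simply hold references, so the collector never reclaims the recycled objects):

    #include <stdlib.h>

    typedef struct Point {
        double x, y;
        struct Point *next;   /* links free objects; unused while the Point is live */
    } Point;

    static Point *free_list = NULL;

    /* Reuse a recycled Point if one is available, otherwise fall back to malloc. */
    static Point *pool_get(double x, double y) {
        Point *p;
        if (free_list) {
            p = free_list;
            free_list = p->next;
        } else {
            p = malloc(sizeof *p);
            if (!p) return NULL;
        }
        p->x = x;
        p->y = y;
        return p;
    }

    /* Done with the Point: push it onto the free list instead of calling free(). */
    static void pool_put(Point *p) {
        p->next = free_list;
        free_list = p;
    }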
That's aggressively missing the point. If you design the program architecture around manual memory techniques, the runtimes are stable even in a GC. If you use malloc willy-nilly throughout core algorithms, your runtimes become unpredictable.
But my point was simply that you can't defend GC by contrasting it with manual management, if you do manual management.
> If you use malloc willy-nilly throughout core algorithms, your runtimes become unpredictable.
First, where did I say "use malloc willy-nilly"? Second, do you have experience with that? I haven't much, but I haven't heard your claim before.
Of course "willy-nilly" should not be done with real-time constraints. But I have written a substantial algorithmic program (no RT constraints) in that style when I didn't know better, and it was very well performing. (probably glibc's malloc was optimized for my allocation patterns).
The main problem I see with that style is that each malloc has memory overhead, and that it leads to unmaintainable code.
Actually, to address realtime constraints... In my job I work on systems with realtime constraints, and even in less critical parts (a latency of ~100 microseconds is still acceptable) we had to eliminate all malloc calls, mostly because malloc on our platform essentially acquires one big lock for memory operations, and if some other process has that lock, you get nasty delays or even priority inversions. Allocate a big chunk beforehand, preferably in the form of ring buffers, and then operate on those if you need precise timing and performance.
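A rough sketch of that style in C (function names are made up; single-producer/single-consumer, not thread-safe as written): all memory is grabbed once at startup, and the hot path only moves indices around, never touching malloc.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        unsigned char *buf;   /* one big allocation made at startup */
        size_t elem_size;
        size_t capacity;      /* number of slots */
        size_t head, tail;    /* producer / consumer indices */
    } ring_t;

    /* Called once during initialization, before any timing-critical work starts. */
    static int ring_init(ring_t *r, size_t elem_size, size_t capacity) {
        r->buf = malloc(elem_size * capacity);
        if (!r->buf) return -1;
        r->elem_size = elem_size;
        r->capacity = capacity;
        r->head = r->tail = 0;
        return 0;
    }

    /* Hot path: copy an element into the next free slot; no allocation, no locks. */
    static int ring_push(ring_t *r, const void *elem) {
        size_t next = (r->head + 1) % r->capacity;
        if (next == r->tail) return -1;   /* full: caller decides what to drop */
        memcpy(r->buf + r->head * r->elem_size, elem, r->elem_size);
        r->head = next;
        return 0;
    }

    /* Hot path: pop the oldest element, if any. */
    static int ring_pop(ring_t *r, void *out) {
        if (r->tail == r->head) return -1;   /* empty */
        memcpy(out, r->buf + r->tail * r->elem_size, r->elem_size);
        r->tail = (r->tail + 1) % r->capacity;
        return 0;
    }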
I personally believe one of the biggest reasons for the popularity of managed code is GC. Manual memory management is pretty hard to get right; even the C++ STL offers reference-counted pointers as a form of automatic memory management.
I think that is a rather common belief... "Managed" refers exactly to automatic memory management, meaning the defining feature of the category is literally GC.
Wikipedia disagrees. "Managed code is computer program source code that requires and will execute only under the management of a Common Language Runtime virtual machine, typically the .NET Framework, or Mono. The term was coined by Microsoft."
Huh. By that definition Java isn't managed, which seems to be missing the point and not nearly as useful as the broad category that everyone outside the Microsoft ecosystem uses. That said, even if "CLR" is sensibly replaced with "CLR, JVM or similar", it does seem like I have the etymology/definition wrong, thanks.
Did he really find a bug in glibc malloc? He didn't write that. This sounds like a bug in the python runtime and possibly in the offending program itself.
I don't fully understand the post, but I don't think the author found a bug anywhere. The main discovery the author made was that CPython uses its own allocator on top of memory allocated by mmap. This leads to a difference between the allocated memory reported by malloc and the memory actually used by the program. The author still hasn't discovered a root cause for the large amounts of memory allocated.
Author here. The last graph actually shows that the problem still happens with CPython's allocator disabled.
Also, to clarify, the amount of memory allocated is actually decreasing (in-use bytes), while the memory allocator requests more and more memory from the system (system bytes), presumably because of fragmentation.
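For reference, those two numbers are what glibc's malloc_stats() prints. A contrived C example (nothing to do with the scenario in the post) that makes them diverge by fragmenting the heap:

    #include <malloc.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 100000 };
        static char *blocks[N];

        /* Allocate a lot of small blocks, then free every other one: the freed
           memory still counts toward "system bytes" (it can't be returned to
           the OS) but no longer toward "in use bytes". */
        for (int i = 0; i < N; i++)
            blocks[i] = malloc(256);
        for (int i = 0; i < N; i += 2)
            free(blocks[i]);

        malloc_stats();   /* glibc: prints "system bytes" and "in use bytes" to stderr */
        return 0;
    }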
I see. I thought system bytes included the mmaped block of memory (the docs for malloc_stats explicitly say it does not) and I thought the last graph was a demonstration of why the Python memory allocator was necessary.
Just the other day I was reading N. Wirth objecting to using a resource lavishly just because it was cheap.
And just now, in the same front page of HN, some guy wants programmers to stop calling themselves engineers because they aren't.
Well, let's build a bridge using all the iron we can get our hands on (iron is very cheap you know); or even better, let's use all the RAM in the computer with this program of ours.
To be fair, the MM primitives we have today are the same ones we had in 1979, while the capabilities of the hardware have evolved, and the actual needs have evolved, too.
For example, caching. It's essentially impossible to do caching right at the application layer unless your cache is also persistent, with no durability or consistency needs whatsoever, and can be mmaped (and you don't mind total I/O thrashing). The persistence part also implies that you must be able to store the entire cache somewhere.
What an application would want here is a mechanism to allocate cache memory that the kernel can throw away, if need be, and notify the application of that, so that the app can adjust. That's not impossible to do, it's just tricky - and beyond POSIX.
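Linux has a partial version of this in madvise(MADV_FREE) (kernel 4.5+, so not portable and definitely beyond POSIX): the kernel may discard the marked pages lazily under memory pressure. There's still no notification; the usual trick is that a reclaimed anonymous page reads back as zeros, so the application can stamp each cache entry with a non-zero marker and check it before use. A rough sketch:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define CACHE_SIZE (64 * 1024 * 1024)

    int main(void) {
        /* Anonymous mapping used as a discardable cache. */
        unsigned char *cache = mmap(NULL, CACHE_SIZE, PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (cache == MAP_FAILED)
            return 1;

        /* Fill the cache; the first byte of each entry doubles as a non-zero
           "still here" marker. */
        memset(cache, 0xAB, CACHE_SIZE);

        /* Tell the kernel it may reclaim these pages lazily if memory gets tight.
           A reclaimed page reads back as zeros; a write cancels the pending free. */
        madvise(cache, CACHE_SIZE, MADV_FREE);

        /* ... later, before trusting a cache entry: */
        if (cache[0] == 0xAB)
            puts("entry still cached");
        else
            puts("kernel reclaimed it; recompute");

        return 0;
    }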
I'm pretty sure "real" engineers would be very much in favor of using a cheap resource lavishly, if the tradeoff is worth it. Adding a million dollars to the design cost in order to save $100,000 in resource costs isn't a good idea, and a good engineer will design accordingly.
The difference is that engineers have codified this and call it "safety factor." Your house is built to carry perhaps ten times the load it truly needs to, because it's cheaper to throw in extra materials than it is to build it to hold precisely what it needs. An airplane is built with much thinner margins, because saving materials is far more valuable, and it's well worth the extra time and attention to do so.
I suppose that works with RAM and CPU waste, but I notice the people building gigabyte-consuming IRC clients aren't paying for all their users' RAM, so the cost tradeoff is rather different.
Look up Hamming's course on "Learning to learn".
An optimal design is not robust. You add a safety factor and this addition is wasteful in some sense (thus not optimal) and necessary in another. Do you design a building so that the toilets are optimally used? Likewise for planes.
But just because you have enough of a cheap resource (RAM) it is not a given that using it in excess is either optimal or robust.
Also, most users aren't really aware of what's using up the resources, only that their computer gets slower over time. I actually wish Windows would start naming & shaming applications, e.g. by showing a balloon when the system slows down, saying that applications X and Y are consuming most of the resources.
As it is now, CPU, RAM and storage usage are perfect externalities for developers. The result of their abuse is user frustration and a reduction in the number of things people can do on their computers. That, and wasted user time. Unfortunately, there's no way to make developers bear the costs of wasting users' resources, so the only things that let us have lean software are engineering competence and human decency (both of which seem to be in short supply).
It's also a lot of fun when someone gets the bright idea of "oh, we'll allocate until we have 75% of RAM and then stop; that will keep us from swapping". Which isn't so terrible until somebody installs a second program using that strategy.
My favourite these days is Linux OOM killer. For some people it's just right to randomly kill a process instead of rejecting the current request. My mind gets a blue-screen every time I'm contemplating this issu.,....ds.fsd... :)
That's not quite it. But it does deserve a double-take.
When you request memory, the request generally succeeds. But the kernel doesn't actually commit the pages to you until you touch them. This leads to weird situations where memory is oversubscribed. When that happens, the OS casts one of your processes into the rift. Google for "oom killer".
To me this sounds insane - a basic breach in the contract of malloc. However, it's possible that this is one of those situations that involves a complex tradeoff that you won't grok until you've been the guy building it. It would be interesting to know what other OSs do.
Something that is annoying: as far as I know there is no signal that fires when this happens. Instead, you scrape logs to learn about it. Whereupon you will reboot the box, because you no longer trust its state. You can get a callback when this happens if you use systemtap.
In practice it's less of a problem than you'd think. If you care about the thing you're building, you will have made it host-independent. From the perspective of production software, there's no difference between a rifted process and a motherboard death. If you're using sockets as your IPC mechanism, it doesn't matter why the process died, just that it did.
To elaborate a bit, there are basically three ways to do this stuff.
One is to commit memory on request. If all of your RAM and swap is committed, then malloc will fail. This is probably how we expect it to work, but it can be really wasteful. Some programs will malloc a huge area and only use a small portion of it, and in this scheme, that memory is "in use." It's also worth noting that most programs don't tolerate malloc failure well at all, and will just crash anyway.
Another way is to overcommit, where malloc doesn't actually reserve memory initially but only on first use, and to pause processes when memory fills up. If a process needs more memory and there isn't any more available, that process blocks until more memory becomes available. This means nobody gets killed, but it's not uncommon for the system to just deadlock like this forever once memory fills up.
And the third way is as you describe, to overcommit and kill processes when you run out of memory.
Note that there's no right answer, just different tradeoffs. Linux lets you configure this to an extent as well, so if you don't like the default, you can change it!
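To make the default Linux behaviour concrete, here's a small C sketch (only run it on a machine you don't mind pushing into OOM): under the default heuristic overcommit each malloc() typically succeeds, even once the total far exceeds RAM + swap, because nothing is committed until the pages are written; with vm.overcommit_memory=2 the allocations start failing up front instead.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Reserve lots of address space in modest chunks without touching it. */
        enum { CHUNK = 1 << 30 /* 1 GiB */, CHUNKS = 256 };
        char *chunks[CHUNKS];
        int n = 0;

        for (; n < CHUNKS; n++) {
            chunks[n] = malloc(CHUNK);
            if (!chunks[n])
                break;   /* with vm.overcommit_memory=2 this happens early */
        }
        printf("reserved %d GiB of virtual memory\n", n);

        /* Writing to the pages is what actually commits them. On a box with less
           RAM + swap than was "allocated" above, the default policy means no
           malloc() ever fails; instead the OOM killer eventually kills some
           process (quite possibly this one) partway through this loop. */
        for (int i = 0; i < n; i++)
            for (size_t off = 0; off < CHUNK; off += 4096)
                chunks[i][off] = 1;

        return 0;
    }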
> Some programs will malloc a huge area and only use a small portion of it, and in this scheme, that memory is "in use." It's also worth noting that most programs don't tolerate malloc failure well at all, and will just crash anyway.
...and it would be absurd for a bad program to eat memory and crash? We must have other programs pay for its sins?
Malloc failure is at least synchronous with the out-of-memory condition. And if your program has reached a steady state, it may not even be calling malloc anymore.
It might handle the malloc failure by crashing. Some languages (e.g., Rust) and runtimes (e.g., glib) presume that small allocations always succeed; a failure in such an allocation immediately crashes the program (as the underlying API provides no means to report the failure).
Android handles this by having a total ordering of the importance of every application process, and building its APIs such that process death of something the user isn't interacting with doesn't lose state.
iOS is like that too. It keeps all programs running by default, and then when memory gets low it'll start killing them. It starts with backgrounded apps that aren't doing anything, and kills the current foreground app as a last resort.
Apps are supposed to be built to tolerate being killed while in the background and restoring their state when relaunched, so in theory this will mostly be invisible to the user other than apps taking longer to reload. In practice, lots of apps don't get this right.
After a fork, does the amount of committed memory increase by the size of the child's allocated virtual memory space?
I'm curious how committed memory interacts with COW fork semantics. After all, a child can just overwrite the COW copy of the memory it inherited from its parent without allocating anything else.
No, all of the child's memory mappings are shared with its parent. Upon write, though, that becomes a new committed mapping; I believe the entire page is copied, so you could end up committing another 4k with a one-byte write from the child.
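A tiny C illustration of that (the interesting numbers show up in /proc/meminfo's Committed_AS or the child's /proc/<pid>/smaps, not in the program's output):

    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SIZE (64 * 1024 * 1024)

    int main(void) {
        /* Parent touches 64 MiB so the pages really exist. */
        char *buf = malloc(SIZE);
        if (!buf)
            return 1;
        memset(buf, 0x42, SIZE);

        pid_t pid = fork();
        if (pid == 0) {
            /* Child: buf is shared copy-on-write with the parent, so the fork
               itself copied nothing. Writing one byte faults in a private copy
               of just that page, typically committing another 4 KiB. */
            buf[0] = 0x43;
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        free(buf);
        return 0;
    }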
COW is usually a kind of loophole. Most systems only count the pages once (otherwise you'd have to reject the fork, but that becomes inconvenient fast).
There's a switch you can toggle that will turn off the OOM killer (vm.overcommit_memory).
The downside is that memory allocations that would oversubscribe memory will immediately fail. That means processes won't be able to allocate tens of gigabytes of virtual memory anymore.
If I understand the limitation correctly, this also means that all processes' allocated virtual memory can no longer exceed the amount of physical + swap memory in the machine. I think this probably interacts badly with COW memory semantics for child processes: even though children and parent reuse address space, the kernel can't know whether the child will write to the memory it inherited from its parent's memory map. If the kernel is being careful, all "shared" memory would have to be accounted for separately, which would severely limit the number of processes you could have running.
(Is that really what happens? That seems awfully limiting. I hope I'm wrong.)
You can still allocate more virtual memory; it just has to be backed by a file, e.g. through mmap(). So, for example, what mongod does (mmap() its entire DB, which can be terabytes in size) would still work.
You can almost think of file-backed mmaps as swap, in fact.
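Roughly what that pattern looks like in C (a generic sketch, not mongod's actual code): the mapping can be far larger than RAM because the file itself is the backing store, so the kernel can always drop clean pages and write dirty ones back instead of charging the mapping against the commit limit.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <datafile>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the whole file. Shared file-backed mappings aren't accounted the
           way anonymous memory is, so this works even with strict overcommit. */
        char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching a byte makes the kernel fault the page in from the file. */
        printf("first byte: 0x%02x\n", (unsigned int)(unsigned char)data[0]);

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }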
I appreciate your reply (and replies to your reply), but I must be missing something about why this is a good idea.
Is it just that software is (badly) coded to request lots of memory it doesn't intend to use? The only sensible usecase I can think of for that is oversubscribing VMs, but even there... Doesn't this just crash a VM when there's contention for the memory that's oversubscribed?
I guess that would be okay for the VM case, but why would I want this in a normal usecase? It seems like it's just encouraging/supporting bad memory management in software.
Have I just been lucky that this has never killed a long-running job of mine when I use 90%+ of RAM to do a big job? Or shit, maybe it has and the specific error just was obscured. Ugh, I hate when the system lies by default.
It does help you as a user, since the OOM killer tries to be somewhat sensible in what it kills. Imagine you have 10 well-behaved programs and 1 program that just leaks 1GB/h. After a few hours your entire RAM has been committed, and the next process to request memory will fail. In this scenario, the process that crashes is random: it might be the leaky program, or it might be your desktop shell. The OOM killer, on the other hand, will definitely kill the leaky program once there is literally no more memory available.

It's the combination of programs not touching all the pages they mmap and there being no good ordering of which memory requests are more important that makes overcommitting and the OOM killer generally good ideas. In my experience that's exactly how it works out: the moment you start something that gobbles up memory due to a bug, it gets killed promptly, rather than some other random program crashing because it couldn't allocate memory.
> Something that is annoying: as far as I know there is no signal that fires when this happens. Instead, you scrape logs to learn about it. Whereupon you will reboot the box, because you no longer trust its state. You can get a callback when this happens if you use systemtap.
It's a SIGKILL, which isn't catchable/handleable by the process (how could it be?).
The machine should still be stable after an OOM kill, just minus a few processes. (In my production setups, a supervising daemon usually restarts the now-dead child, and the memory-leaking cycle begins anew.)
The real drawback of disabling memory overcommit is not that processes can no longer allocate huge amounts of unused memory, but that processes that use more memory than the remaining available RAM + swap can no longer call fork (even if the forked process, like most forked processes, would immediately discard all the shared memory by execing some small program).
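For what it's worth, the usual workaround for that particular fork-then-exec case is posix_spawn() (or vfork()), which recent glibc implements with vfork-like semantics: the child borrows the parent's address space until the exec, so the copy never has to be committed. A hedged sketch:

    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    int main(void) {
        /* A process with a huge committed heap can fail plain fork() under
           strict overcommit (vm.overcommit_memory=2), because the child's
           writable mappings must all be committed even if it execs right away.
           posix_spawn() avoids duplicating the address space in that window. */
        char *child_argv[] = { "echo", "hello from the child", NULL };
        pid_t pid;

        int err = posix_spawnp(&pid, "echo", NULL, NULL, child_argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
            return 1;
        }
        waitpid(pid, NULL, 0);
        return 0;
    }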