That is nice, although I think Heartbleed was due to a missing bounds check enabling the reading of adjacent memory, not due to reusing the same buffer...
If I remember correctly: yes, the root cause was a missing bounds check, but the vulnerability was much worse than it needed to be because OpenSSL tended to allocate small blocks of memory and aggressively reuse them, so the exploited buffer was very likely to sit close to sensitive information.
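A toy Python model can show how the two factors combine. Everything here is hypothetical and heavily simplified (`ToyHeap`, `heartbeat` and the fake key are made up; this is not OpenSSL's code). The heap is a single `bytearray`, freed blocks are reused most-recently-freed-first, and the handler echoes `claimed_len` bytes back without comparing it to the real payload length. Because the reused 4-byte block sits right before the secret, the over-read returns it.

```python
from typing import Dict, List


class ToyHeap:
    """One bytearray plus LIFO free lists keyed by block size."""

    def __init__(self, capacity: int = 256) -> None:
        self.mem = bytearray(capacity)
        self.top = 0                                  # bump pointer for never-used memory
        self.free_lists: Dict[int, List[int]] = {}

    def malloc(self, size: int) -> int:
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()                       # reuse the most recently freed block
        if self.top + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        addr = self.top
        self.top += size
        return addr

    def free(self, addr: int, size: int) -> None:
        self.free_lists.setdefault(size, []).append(addr)  # no zeroing, no poisoning


def heartbeat(heap: ToyHeap, payload: bytes, claimed_len: int, checked: bool) -> bytes:
    """Copy the payload into a heap buffer and echo back claimed_len bytes."""
    if checked and claimed_len > len(payload):
        return b""                                    # the bounds check that was missing
    buf = heap.malloc(len(payload))
    heap.mem[buf:buf + len(payload)] = payload
    reply = bytes(heap.mem[buf:buf + claimed_len])    # trusts the sender's length
    heap.free(buf, len(payload))
    return reply


heap = ToyHeap()
scratch = heap.malloc(4)                              # small block, freed below
secret = heap.malloc(16)                              # lands right after it
heap.mem[secret:secret + 16] = b"PRIVATE-KEY-0001"
heap.free(scratch, 4)                                 # back on the 4-byte free list

leaked = heartbeat(heap, b"ping", 20, checked=False)
safe = heartbeat(heap, b"ping", 20, checked=True)
print(leaked)                                         # b'pingPRIVATE-KEY-0001'
print(safe)                                           # b''
```

Passing `checked=True` adds the comparison the handler lacked, so the over-read never happens regardless of what sits next to the buffer.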
I don’t have time right now to research the full details, but the Wikipedia article gives a clue:
> Theo de Raadt, founder and leader of the OpenBSD and OpenSSH projects, has criticized the OpenSSL developers for writing their own memory management routines and thereby, he claims, circumventing OpenBSD C standard library exploit countermeasures, saying "OpenSSL is not developed by a responsible team." Following Heartbleed's disclosure, members of the OpenBSD project forked OpenSSL into LibreSSL.
Until very recently, memory allocators were more than happy to hand back the block you had just freed if you asked for another allocation of the same size. It makes sense, too: if you're calling malloc/free in a loop, which is common, this is about the best thing you can do for performance. Countless heap exploits later (mostly attacking heap metadata rather than stale data, to be honest), allocators have begun to recognize that predictable allocation patterns might not be the best idea, and they are starting to move away from this.
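The reuse pattern is easy to model. `SizeClassAllocator` below is a hypothetical toy that hands out integer "addresses" from per-size free lists. By default it returns the most recently freed block (LIFO); `randomize=True` enables a made-up hardened variant that picks a random free slot of the right size instead. Neither is meant to reproduce any specific real allocator.

```python
import random
from typing import Dict, List, Optional, Tuple


class SizeClassAllocator:
    """Toy allocator handing out integer 'addresses' from per-size free lists."""

    def __init__(self, randomize: bool = False, seed: Optional[int] = None) -> None:
        self.next_addr = 0x1000                 # arbitrary start of never-used memory
        self.free_lists: Dict[int, List[int]] = {}
        self.randomize = randomize
        self.rng = random.Random(seed)

    def malloc(self, size: int) -> int:
        bucket = self.free_lists.get(size)
        if bucket:
            if self.randomize:
                # Hardened variant: swap a random free slot to the end first,
                # so the result is no longer "the block you just freed".
                i = self.rng.randrange(len(bucket))
                bucket[i], bucket[-1] = bucket[-1], bucket[i]
            return bucket.pop()                 # default: most recently freed (LIFO)
        addr = self.next_addr
        self.next_addr += size
        return addr

    def free(self, addr: int, size: int) -> None:
        self.free_lists.setdefault(size, []).append(addr)


def first_reuse(alloc: SizeClassAllocator) -> Tuple[int, List[int]]:
    """Allocate and free four 32-byte blocks, then allocate one more."""
    ptrs = [alloc.malloc(32) for _ in range(4)]
    for p in ptrs:
        alloc.free(p, 32)
    return alloc.malloc(32), ptrs


# malloc/free in a loop: the same block comes back every time.
lifo = SizeClassAllocator()
addrs = []
for _ in range(3):
    p = lifo.malloc(32)
    addrs.append(p)
    lifo.free(p, 32)
print([hex(a) for a in addrs])                  # ['0x1000', '0x1000', '0x1000']

print(first_reuse(SizeClassAllocator()))        # (4192, [4096, 4128, 4160, 4192])
print(first_reuse(SizeClassAllocator(randomize=True, seed=1)))  # reuses one of the four
```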
True of the more common ones, but it's worth acknowledging that OpenBSD was already doing this kind of thing (along with many other hardening techniques) before Heartbleed. That was the main reason Theo de Raadt was so upset that OpenSSL circumvented the system allocator: otherwise, OpenBSD's allocator could have mitigated the impact.
Even higher-performance mallocs like jemalloc had heap debugging features (poisoning freed memory) before Heartbleed. If enabled, those would catch use-after-frees, as long as libraries and applications didn't circumvent malloc the way OpenSSL did (and, AFAIK, Python still does).
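A toy model of what free-memory poisoning buys you. `PoisoningHeap`, the `0x5A` fill byte and the reuse-time check are assumptions for illustration, not jemalloc's actual implementation or options. Freed blocks are overwritten with a poison byte, so a dangling read sees the pattern instead of stale data, and a write through a dangling pointer is noticed when the block is handed out again.

```python
from typing import Dict, List

POISON = 0x5A  # arbitrary fill byte chosen for this sketch ('Z' in ASCII)


class PoisoningHeap:
    """Toy heap that optionally fills freed blocks with a poison byte."""

    def __init__(self, capacity: int = 64, poison: bool = True) -> None:
        self.mem = bytearray(capacity)
        self.top = 0
        self.free_lists: Dict[int, List[int]] = {}
        self.poison = poison

    def malloc(self, size: int) -> int:
        bucket = self.free_lists.get(size)
        if bucket:
            addr = bucket.pop()
            # Hypothetical check: a poisoned free block should still be all poison.
            if self.poison and self.mem[addr:addr + size] != bytes([POISON]) * size:
                raise RuntimeError(f"write-after-free detected at {addr}")
            return addr
        if self.top + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        addr = self.top
        self.top += size
        return addr

    def free(self, addr: int, size: int) -> None:
        if self.poison:
            self.mem[addr:addr + size] = bytes([POISON]) * size
        self.free_lists.setdefault(size, []).append(addr)


def stale_read(heap: PoisoningHeap) -> bytes:
    """Store data, free it, then read through the dangling 'pointer'."""
    p = heap.malloc(8)
    heap.mem[p:p + 8] = b"session!"
    heap.free(p, 8)
    return bytes(heap.mem[p:p + 8])


print(stale_read(PoisoningHeap(poison=False)))  # b'session!'
print(stale_read(PoisoningHeap(poison=True)))   # b'ZZZZZZZZ'

h = PoisoningHeap()
p = h.malloc(8)
h.free(p, 8)
h.mem[p] = 0x00                                 # buggy write through a dangling pointer
try:
    h.malloc(8)
except RuntimeError as e:
    print(e)                                    # write-after-free detected at 0
```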
Don't you sort of have to do that if you're writing your own garbage collector, though? I guess for a simple collector you could maintain lists of allocated objects separately, but precisely controlling where the memory is allocated is important for any kind of performant implementation.
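As a rough illustration of why a collector wants to control placement, here is a toy bump-pointer region of the kind a moving, generational collector might use for its nursery. `Nursery` is hypothetical and is not meant to depict CPython's collector. Allocation is a pointer increment, and after the survivors are copied out the whole region is reclaimed by resetting one pointer, without visiting the dead objects.

```python
from typing import List, Optional, Tuple


class Nursery:
    """Toy bump-pointer region of the kind a moving collector might use."""

    def __init__(self, capacity: int) -> None:
        self.mem = bytearray(capacity)
        self.top = 0                      # everything below top is in use

    def alloc(self, data: bytes) -> Optional[int]:
        n = len(data)
        if self.top + n > len(self.mem):
            return None                   # full: a real collector would run a minor GC here
        addr = self.top
        self.mem[addr:addr + n] = data
        self.top += n                     # allocation is just a pointer bump
        return addr

    def evacuate(self, live: List[Tuple[int, int]], old_gen: List[bytes]) -> None:
        """Copy the (addr, size) survivors out, then reclaim the whole region."""
        for addr, size in live:
            old_gen.append(bytes(self.mem[addr:addr + size]))
        # Dead objects are never visited: resetting one pointer frees them all.
        # (Their bytes stay in self.mem until overwritten.)
        self.top = 0


nursery = Nursery(32)
tmp1 = nursery.alloc(b"temp-1")           # dies young
keep = nursery.alloc(b"keep-me")          # survives
tmp2 = nursery.alloc(b"temp-2")           # dies young
old_gen: List[bytes] = []
nursery.evacuate([(keep, 7)], old_gen)
print(old_gen, nursery.top)               # [b'keep-me'] 0
```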
Python does refcount-based memory management. It's not a GC design. You don't have to retain objects in an internal linked list when the refcount drops to zero, but CPython does, purely as a performance optimization.
Type-specific free lists (just a few examples; there are more):
And also just wrapping malloc in general. There's no refcounting reason for this; they just assume the system malloc is slow (which might be true for glibc) and wrap it in the default build configuration:
So many layers of malloc wrapping, just because system allocators were slow in 2000. It defeats free() poisoning and ASAN. obmalloc can be disabled by turning off PYMALLOC, but IIRC that doesn't disable the per-type free lists, and PYMALLOC is enabled by default.
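The layering problem can be shown with another toy model. `SystemMalloc` stands in for a system allocator with free() poisoning enabled, and `TypeFreeList` is a hypothetical per-type cache on top of it (loosely inspired by the idea of per-type free lists, not CPython's actual code). A block released to the cache never reaches `free()`, so it is never poisoned and its old contents stay readable. ASAN is affected the same way, since it only learns about frees that go through the real `free()`.

```python
from typing import Dict, List

POISON = 0x5A  # arbitrary poison byte for this sketch ('Z' in ASCII)


class SystemMalloc:
    """Stand-in for a system allocator with free() poisoning enabled."""

    def __init__(self, capacity: int = 64) -> None:
        self.mem = bytearray(capacity)
        self.top = 0
        self.free_lists: Dict[int, List[int]] = {}

    def malloc(self, size: int) -> int:
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()
        if self.top + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        addr = self.top
        self.top += size
        return addr

    def free(self, addr: int, size: int) -> None:
        self.mem[addr:addr + size] = bytes([POISON]) * size   # poison on free
        self.free_lists.setdefault(size, []).append(addr)


class TypeFreeList:
    """Hypothetical per-type cache of fixed-size blocks on top of SystemMalloc."""

    def __init__(self, base: SystemMalloc, size: int) -> None:
        self.base = base
        self.size = size
        self.cache: List[int] = []

    def alloc(self) -> int:
        # Reuse a cached block if there is one; otherwise ask the base allocator.
        return self.cache.pop() if self.cache else self.base.malloc(self.size)

    def release(self, addr: int) -> None:
        # Keeps the block for later and never calls base.free(): no poisoning.
        self.cache.append(addr)


def dangling_read(via_cache: bool) -> bytes:
    """Write data, release the block, then read it through the stale address."""
    sysmalloc = SystemMalloc()
    objs = TypeFreeList(sysmalloc, 8)
    p = objs.alloc() if via_cache else sysmalloc.malloc(8)
    sysmalloc.mem[p:p + 8] = b"secret!!"
    if via_cache:
        objs.release(p)
    else:
        sysmalloc.free(p, 8)
    return bytes(sysmalloc.mem[p:p + 8])


print(dangling_read(via_cache=False))   # b'ZZZZZZZZ'
print(dangling_read(via_cache=True))    # b'secret!!'
```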
Thanks for the links! I wasn't aware of the PyMem_ layer above; the justification for it does sound bad.
But Python runs a generational GC in addition to refcounting to catch cycles (https://docs.python.org/3/library/gc.html): isn't fine control over allocation necessary for that? E.g. to efficiently clear the nursery?
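On CPython, the point about cycles can be checked with the standard `gc` and `weakref` modules (`Node` and `make_cycle` are made up for the demo). Two objects that refer to each other keep each other's refcount above zero, so they outlive the last external reference until the cycle collector runs. The expected output assumes CPython; other implementations manage memory differently.

```python
import gc
import weakref


class Node:
    """Hypothetical object type used only to build a reference cycle."""

    def __init__(self) -> None:
        self.other = None


def make_cycle() -> "weakref.ref":
    a, b = Node(), Node()
    a.other, b.other = b, a     # a -> b -> a: refcounts never drop to zero
    return weakref.ref(a)       # weak reference: doesn't keep `a` alive


gc.disable()                    # keep automatic collections out of the demo
try:
    ref = make_cycle()
    alive_after_return = ref() is not None   # refcounting alone didn't free it
    gc.collect()                             # full collection finds the cycle
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_after_return, alive_after_collect)  # True False
```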
Ah, good point; at the very least, zeroing out buffers on deallocation would have helped. Yes, I was a fan of the commits showing up at opensslrampage.org. One of the highlights was when they found OpenSSL would use private keys as an entropy source: https://opensslrampage.org/post/83007010531/well-even-if-tim...
That's what happens with normal malloc/free anyway, no? Implementations of malloc have a strong performance incentive to hand out the most recently freed blocks, since those are still hot in cache.
Yes, all allocators (except perhaps OpenBSD's, from what I see in this thread) do this. It is also why `calloc` exists as a separate call: zero-initializing every single allocation would be really, really expensive, so plain `malloc` doesn't do it.
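Both points can be seen in one more toy model. `ToyAllocator` is hypothetical: `malloc` hands back the most recently freed block of the same size with its old contents intact, while `calloc` zeroes the block. It skips the clearing for never-used memory, which in this model (as with fresh pages from the OS, which some real allocators take advantage of) is already zero.

```python
from typing import Dict, List, Tuple


class ToyAllocator:
    """Toy allocator: LIFO reuse of freed blocks, no zeroing in malloc."""

    def __init__(self, capacity: int = 64) -> None:
        self.mem = bytearray(capacity)   # starts zeroed, like fresh OS pages
        self.top = 0
        self.free_lists: Dict[int, List[int]] = {}
        self.bytes_zeroed = 0            # clearing work done by calloc, for illustration

    def _take(self, size: int) -> Tuple[int, bool]:
        """Return (addr, recycled), preferring the most recently freed block."""
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop(), True
        if self.top + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        addr = self.top
        self.top += size
        return addr, False

    def malloc(self, size: int) -> int:
        addr, _ = self._take(size)
        return addr                      # contents: whatever was there before

    def calloc(self, count: int, size: int) -> int:
        total = count * size
        addr, recycled = self._take(total)
        if recycled:
            # Only recycled blocks need clearing; fresh memory is already zero.
            self.mem[addr:addr + total] = bytes(total)
            self.bytes_zeroed += total
        return addr

    def free(self, addr: int, size: int) -> None:
        self.free_lists.setdefault(size, []).append(addr)


heap = ToyAllocator()
p = heap.malloc(8)
heap.mem[p:p + 8] = b"password"
heap.free(p, 8)

q = heap.malloc(8)                       # same block comes straight back
print(q == p, bytes(heap.mem[q:q + 8]))  # True b'password'
heap.free(q, 8)

r = heap.calloc(2, 4)                    # 8 bytes again, but cleared
print(r == p, bytes(heap.mem[r:r + 8]))  # True b'\x00\x00\x00\x00\x00\x00\x00\x00'

s = heap.calloc(1, 16)                   # fresh memory: nothing to clear
print(heap.bytes_zeroed)                 # 8
```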