
People recommend tracing garbage collection because it performs better than atomic reference counting, even though reference counting is much simpler.

Now, suppose I were designing a language where data is always owned by at most one thread. I could skip making the ref counter an atomic variable and, with some help from the language, do reference counting at compile time: would I still benefit from the complexity of a tracing GC if my goal is performance and predictable latency?

I want to have automatic memory management for both the kernel and apps. Creating a concurrent tracing GC that walks the entire user and kernel space sounds vastly more complicated than just inc/dec'ing a counter and freeing when it reaches zero (again, assuming only one thread can access a given object).

I know that I would have to add language support for weak pointers or forbid cyclic references altogether, but I wonder if the dislike for reference counting only exists because it doesn't work well for data that's shared across threads.

EDIT: thanks all, you have brought excellent arguments and you have convinced me. In fact, in this actor-based OS I am working on, GC plus arena-based allocation per actor might be the best-performing approach. Collection can happen in the scheduler between messages (so it becomes virtually pauseless), and when an actor terminates, the entire arena is simply reclaimed.



Yes, you do still benefit; otherwise V8 would use refcounting for everything.

Problems with refcounting:

1. You leak cycles. Given that the point of a memory management scheme is to reliably not leak, this is a major weakness. Cycles are very common and natural things to find in heaps, and even if you avoid them, they can easily be introduced years later by people you will never meet who weren't aware of the entire heap topology. You say you'd just "forbid cyclic references": how will you do that? Require everything to be written in Rust? (A minimal example of a leaked cycle is sketched below.)

2. Every object gets bigger, as it must store a reference count. This increases memory usage and reduces locality. With a GC it's possible to squeeze the overhead down to a vtable pointer (required anyway in most cases) and then a few bits on top!

3. All the incs/decs bloat your code, creating a form of pervasive slowness that doesn't show up in profilers except in hot loops. This tempts you to micro-optimize the inc/decrefs, which increases the chance of mistakes.

4. Requires a very consistent protocol for how and when references are taken and released. The details here can get complicated. For example do containers add/decref for you, or do they expect you to do that?

5. You are exposed to use-after-free bugs, which are a major source of security holes.

There are probably more. And obviously the restriction to not use threads is catastrophic for any system that cares about performance.
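
To make point 1 concrete, here's a minimal Rust sketch of a leaked cycle (the Node type and names are just illustrative):

    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        // A node may optionally point at another refcounted node.
        next: RefCell<Option<Rc<Node>>>,
    }

    fn main() {
        let a = Rc::new(Node { next: RefCell::new(None) });
        let b = Rc::new(Node { next: RefCell::new(None) });
        // a -> b and b -> a: both strong counts are now 2.
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        *b.next.borrow_mut() = Some(Rc::clone(&a));
        // When a and b go out of scope, each count drops to 1,
        // never to 0, so neither node is ever freed.
    }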

The problem with having the ability to hold managed pointers from kernel<->userspace (whether using rc or gc) is that a process might be malicious or arbitrarily buggy, in the standard OS model. It means you have to be able to kill the process and release all its resources, which means the kernel must know where all the references to kernel-side objects are coming from so they can be removed on kill.

To allow object graphs to span the user/kernel space boundary therefore requires the kernel to mandate the use of a specific compiler that's a part of the TCB (i.e. a part of the kernel itself). That radically changes the design of the whole OS. In particular you cannot implement a UNIX that way.


The decrement also needs to be in a "finally" exception handler.

There's also a problem when taking a reference to the contents of an object - that reference can survive the release of the object.


As sibling comment mentioned, reference counting is very much non-predictable in latency.

If you want predictable latency, you want your GC to run in either a constant time slice or an upper-bounded one.

Deferred refcounting can become quite complex if you want to avoid "I have just released the last reference to something that contained a lot of other references" (there are some funny threads where Rust users discover that they sometimes need to spin up an extra thread to avoid an effective "stop the world" XD).
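
The trick those threads land on looks roughly like this (a sketch, not a recommendation; release_later is a made-up name):

    use std::thread;

    fn release_later<T: Send + 'static>(value: T) {
        // Dropping a deeply nested structure cascades into many frees;
        // hand it to another thread so this one isn't paused.
        thread::spawn(move || drop(value));
    }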

So instead the common approach is a tracing GC with some form of timesliced collector, guaranteeing that the time taken from the mutator is upper-bounded. For example, instead of stopping the world, a concurrent GC will usually only require a synchronization point. Sometimes it also means the collector is given an explicit timeslice to ensure a certain amount of work gets done - it's all tradeoffs based on how much throughput and latency you want.

You can speed up tracing collectors through things like escape analysis and one-bit reference counting (for example, you set a bit if a new reference is passed outside the current scope; every object without the bit set can be assumed to have all its references contained in the given scope, etc).


> People recommend tracing garbage collection because it performs better than atomic reference counting, even though reference counting is much simpler.

Reference counting is a lot more complicated! It turns every read into a write.

> [...] if my goal is performance and predictable latency?

Reference counters don't have predictable latency: a single reference decrement can lead to an arbitrary amount of follow up work, like more decrements and deallocations.

Performance of reference counting isn't generally better than that of garbage collectors. (Especially since you still need a backup solution to deal with cycles.)

Figuring out a lot of your memory management at compile time is definitely a win. But it doesn't really have much to do with reference counting.


> Reference counting is a lot more complicated! It turns every read into a write.

Why? Maybe it's crucial to the story, but I am exploring Rust-like single-ownership semantics.

You would have to "write" only when you create a new reference to something, or when that reference is destroyed. But it wouldn't turn any read into a write, no?
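
Roughly what I mean, in Rust terms (just a sketch using std's Rc):

    use std::rc::Rc;

    fn main() {
        let a = Rc::new(42);
        assert_eq!(Rc::strong_count(&a), 1);
        let b = Rc::clone(&a); // new reference created: count goes up
        assert_eq!(Rc::strong_count(&a), 2);
        drop(b);               // reference destroyed: count goes down
        assert_eq!(Rc::strong_count(&a), 1);
        let _n = *a;           // a plain read touches no count
    }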


Rust ownership works because it runs entirely at compile time. Rust has infrastructure for runtime reference counting, but you need to actively opt into it, and it's complicated.

Also, every time you read a pointer you are creating a new reference. That's what turns "every read into a write". You are correct that it's not literally every read; it's just that every small block of code that reads the value will also write over the count.



You don't do this in Rust. You can take a normal &T into something held by an Rc<T>. The count will not be touched.

It’s only when you need an additional owning reference that you call clone() and increment the count, or when one of the owners goes out of scope, and Drop decrements the count.

But for non-owning cases, which are many of them in most code, the count stays the same.
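
A minimal sketch of that (the function name is made up, the mechanism is not):

    use std::rc::Rc;

    fn read_only(s: &str) -> usize {
        // A plain borrow: no refcount traffic on this path.
        s.len()
    }

    fn main() {
        let shared = Rc::new(String::from("hello"));
        let n = read_only(&shared); // deref-coerces Rc<String> to &str
        assert_eq!(n, 5);
        assert_eq!(Rc::strong_count(&shared), 1); // count never moved
    }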


"Up u" should be "You", I don't know why my phone autocorrected this way. Too late to edit now!


That has a cost both to the compiler and to the developer.

It's not a free win.


It’s part of the standard library, not the language, so it’s not really a cost to the compiler.

And this just uses all the same rules as normal Rust code, so like, sure, to use features you have to know them, but I find it a bit silly to frame that as a “cost.” If that’s a cost, so is literally everything.

Anyway, my main point is just that you have less refcount traffic than in other implementations.


If we're going down the "calling things silly" route, I'd rather categorize the community's handwaving of Rust's additional complexities as silly.

If "everything has a cost" anyway we might as well code is assembly for optimal performance. But no, because nuances and differences of cognitive load during development can be significant. And so does compilation speed.

It's arguably much easier to code with a GC and language adoption reflects that.


> I'd rather categorize the community's handwaving of Rust's additional complexities as silly.

I believe what steveklabnik meant was that all this extra complexity (the borrow checker) already exists in the language for other reasons. It's not an additional cost to the compiler or the developer when using the reference-counted types (Rc or Arc).


That's correct. I thought we were talking specifically about refcount traffic, not about broader issues.


> It turns every read into a write.

Not exactly; it turns every passing of a reference, even when it's going to be used only for reading (or not used at all, or just passed on further), into a write of the reference count. But if you can read without passing the reference somewhere else (or if the compiler can prove that the reference the caller holds outlives the one it would have passed), it can omit the write.

And tracing garbage collection is not completely immune to this either: generational collectors turn pointer writes into extra writes (into a card marking table), a copying GC periodically rewrites unmodified objects (when moving them during collection), and the tracing itself might need to read cold objects which might even be located on disk (swapped out).

> Reference counters don't have predictable latency: a single reference decrement can lead to an arbitrary amount of follow up work, like more decrements and deallocations.

However, they have a predictable place for that latency: it can only happen when there's a reference decrement (and the amount of extra work at that point is bounded by the size of the graph which can be reached from that reference). It also will not affect other unrelated threads. With a stop-the-world GC, a thread which allocates too much can cause pauses in threads which are not even allocating or releasing memory at that moment.

> Performance of reference counting isn't generally better than that of garbage collectors. (Especially since you still need a backup solution to deal with cycles.)

However, the memory use of reference counting is generally better than that of tracing garbage collectors, since it releases memory as soon as it can be released. That is, tracing garbage collectors trade extra memory use for their speed; giving them more memory (up to a point) means they can do less work, and giving them too little extra memory can make them much slower.

> Figuring out a lot of your memory management at compile time is definitely a win. But doesn't really have much to do with reference counting.

It does, because it can help reduce the cost of reference counting (by eliding most of the reference count manipulation when it's not necessary).


> With a stop-the-world GC, [...]

Yes, if you compare an implementation of reference counting that doesn't stop the world with an implementation of GC that does explicitly stop the world, you will find that GC stops the world more.


> Reference counters don't have predictable latency: a single reference decrement can lead to an arbitrary amount of follow up work, like more decrements and deallocations.

You can push the cleanup work onto a queue and do it asynchronously, to bound the amount of work per decrement / to even the work out.
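
A rough sketch of that idea in Rust, using a channel as the free queue (names and structure are illustrative):

    use std::any::Any;
    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // Objects whose last reference died are sent here instead of
        // being freed inline, bounding the work done at the deref site.
        let (free_tx, free_rx) = mpsc::channel::<Box<dyn Any + Send>>();

        let reclaimer = thread::spawn(move || {
            for garbage in free_rx {
                drop(garbage); // the actual deallocation happens here
            }
        });

        free_tx.send(Box::new(vec![0u8; 1 << 20])).unwrap();
        drop(free_tx); // close the queue
        reclaimer.join().unwrap();
    }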


Yes, definitely, you don't have to use the simplest version of reference counting, you can soup it up.

But you can play the same games with GC. So this is not specific to reference counting. There are incremental and real time versions of GC.


The dislike started earlier than the rise of pervasive threading. As another poster said, every read turns into a write, meaning every time you pass a reference into a function you have to increment the counter, and decrement it when the function exits. It turns out the vast majority of inc/dec ops are pointless ones like this, which is considerable overhead for literally no benefit.
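
Rust makes the pattern easy to see (a contrived sketch; naive is a made-up name):

    use std::rc::Rc;

    // Naive refcounted convention: the callee takes ownership,
    // so every call site must clone (increment)...
    fn naive(v: Rc<Vec<i32>>) -> usize {
        v.len()
    } // ...and the count is decremented again right here.

    fn main() {
        let data = Rc::new(vec![1, 2, 3]);
        for _ in 0..1000 {
            naive(Rc::clone(&data)); // one pointless inc/dec pair per call
        }
    }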

You can elide some of them using a sort of borrow analysis. But borrowed refs can only be used in certain ways, and so this analysis isn't perfect and can't eliminate all of the unnecessary overhead. This was all hashed out in the 80s and 90s IIRC, before multithreading even became a big issue.

The complex analyses needed to make RC perform well end up destroying the simplicity argument, and even with them RC still isn't competitive with tracing.


Rust has this. Data can be placed in atomic or non-atomic refcounted containers, and the type system disallows sharing non-thread-safe types across threads.

Additionally, data can be borrowed from inside the refcounted wrapper, and used freely within a scope without touching the reference count.
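
Concretely (a sketch; the compile error in the middle is the point):

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        let local = Rc::new(5);   // non-atomic count
        let shared = Arc::new(5); // atomic count

        // thread::spawn(move || println!("{local}"));
        // ^ rejected at compile time: `Rc<i32>` is not `Send`, so the
        //   non-atomic count can never be touched from two threads.

        thread::spawn(move || println!("{shared}")).join().unwrap();
        drop(local);
    }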


Lack of multi-threading simplifies a lot, but is at odds with performance.


You can still have multithreading; you just have to forbid shared data. You can move data between threads at thread creation and termination, and for everything else the language can provide channels/queues to move data between threads. Erlang operates under that model and is great at multithreading.
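
In Rust the same shape looks like this (a sketch; ownership of the Vec is transferred, never shared):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();
        let worker = thread::spawn(move || {
            let data: Vec<i32> = rx.recv().unwrap(); // moved in, not shared
            data.iter().sum::<i32>()
        });
        tx.send(vec![1, 2, 3]).unwrap();
        assert_eq!(worker.join().unwrap(), 6);
    }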


It might be great, just not performant.

CSP often locks you into a single-consumer style of pattern, which is highly problematic and scales poorly for anything that isn't statically partitionable. Once you go to MPMC, you start sharing data, which immediately forces you to have proper ARC with the high synchronization cost that entails (even without atomics, RC forces cacheline eviction between cores, meaning it acts as a brutal bottleneck on many-core systems, per Amdahl's law).


That is indeed the goal I am working towards, but a bit more relaxed than Erlang: mutability is possible only within a single actor/thread, but anything sent in a message to another actor is deep-copied. So no memory is ever shared, yet the language allows single-threaded mutation for performance.


"just have to forbid shared data"

That's too severe a constraint to be qualified as "just".



