> Reference counting does not provide hard performance guarantees. If you disagree consider what happens when a large tree is freed at the root.
Sure it does. One hard guarantee in that case is that freeing M objects (out of N total that are alive) takes O(M) time. (Another is that you can access all the other objects in parallel without any impact. Etc.)
There's no hard guarantee that freeing an object runs in constant time. Reference counting tends to provide more predictable performance, but I don't think the phrase "hard guarantee" is appropriate.
You're conflating a bunch of things, but the biggest one here is that "hard performance guarantee" is not a synonym for "constant-time performance". Nevertheless, feel free to substitute your own preferred vocabulary. I'm just trying to get the underlying point across about how different GC and RC fundamentally are.
With optimized RC (which defers deallocation) you have no "constant-time performance guarantee", because you don't know when the deferred deallocation will happen. And all modern RC implementations are heavily optimized, especially in Apple's ecosystem.
Yeah I think generally what people mean here is, "if I'm aware that freeing objects takes time, I expect to spend that time when I free the objects, not at some undetermined point in the future". That's not the case if you do any kind of deferring, at which point you're pretty close to a "classic" GC system.
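For readers wondering what "deferring" looks like concretely, here's a minimal made-up sketch (the names `DeferredRC` and `drain` are illustrative, not how Apple's runtime or any real implementation is structured): the decref itself stays cheap, but the freeing cost moves to whenever the runtime decides to drain the queue.

    #include <cstddef>
    #include <vector>

    struct Object {
        std::size_t refcount = 1;
        // ... payload ...
    };

    class DeferredRC {
        std::vector<Object*> pending_;   // objects whose count has hit zero
    public:
        void release(Object* o) {
            if (--o->refcount == 0)
                pending_.push_back(o);   // O(1) now; the real work is postponed
        }
        // Runs at some later, runtime-chosen point -- the "undetermined point
        // in the future" where the deallocation cost actually lands.
        void drain() {
            for (Object* o : pending_) delete o;
            pending_.clear();
        }
    };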
Semispace GC takes O(N) for a live set of size N. You're also ignoring the work to actually free, which is generally slightly superlinear. I'm also not sure what "hard" means in that case.
> Semispace GC takes O(N) for a live set of size N
Is this a rebuttal? How? You have a million live objects, you just want to free 1 object, and this awesome GC accesses O(1 million) objects, and that's somehow comparable to refcounting's O(1)?
> You're also ignoring the work to actually free which is generally slightly super linear
That's like arguing that everyone who claims binary search is O(log n) is "ignoring" the fact that page-table traversals and/or bit-complexity "actually" make it O((log n)^2).
It's "ignored" because the performance of the underlying system is entirely beside the point being made about the comparison of 2 algorithms on top of it. You analyze an algorithm based on the number of calls to the APIs provided to it, which in this case would be mmap/munmap (or VirtualAlloc/VirtualFree, or whatever), and you assume each of those operations takes constant time, unless specified otherwise. It doesn't make sense to dig into the underlying system, especially when nobody has specified what the underlying system is to begin with.
So, yes, if you have a million objects and need to release all of them except 1, then Cheney's algorithm does exactly 1 operation compared to the 999'999 operations reference counting does. Ergo, GC is faster than reference counting. It can even be faster than stack allocation [0]!
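To make the operation counting concrete, here's a back-of-the-envelope sketch using the numbers from this thread (it only counts the abstract operations being argued about; it's not a benchmark of any real collector or allocator):

    #include <cstdio>

    int main() {
        const long total = 1'000'000;   // objects allocated
        const long live  = 1;           // objects still reachable

        // Semispace (Cheney): cost is proportional to the live set -- each
        // survivor is copied to the other space once; dead objects are never
        // touched.
        const long cheney_ops = live;

        // Reference counting: every dead object takes a decref-to-zero plus a free.
        const long rc_ops = total - live;

        std::printf("Cheney touches %ld object(s); RC touches %ld.\n",
                    cheney_ops, rc_ops);
    }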
It's so frustrating and time consuming replying to non-rebuttals like these. I'm going to have to cut it off after this one.
> if you have a million objects and need to release all of them except 1
That literally only happens at most 1 time for each allocation during its lifetime. But the GC can still waste its time traversing that object a billion times before then, despite nobody adding or removing references to it. Whereas RC would only touch each allocation when you're actually doing something to the object.
It's like arguing for wasting several decades of your life, just to make sure your funeral is quick...
I am still not sold that e.g. using an arena as a scratch space for real-time frame rendering of the scene is worse than refcounting all of the temporary data, especially if the workload fits comfortably into the available memory.
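For concreteness, a minimal sketch of that per-frame scratch arena (names are illustrative, not from any particular engine): every temporary allocated during a frame is released by a single pointer reset, with no per-object refcount traffic.

    #include <cstddef>
    #include <cstdlib>

    class FrameArena {
        char*       buf_;
        std::size_t cap_;
        std::size_t used_ = 0;
    public:
        explicit FrameArena(std::size_t cap)
            : buf_(static_cast<char*>(std::malloc(cap))), cap_(cap) {}
        ~FrameArena() { std::free(buf_); }

        // Bump allocation: O(1), no headers, no refcounts.
        void* alloc(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
            std::size_t p = (used_ + align - 1) & ~(align - 1);
            if (p + n > cap_) return nullptr;   // scratch space exhausted
            used_ = p + n;
            return buf_ + p;
        }

        // Called once per frame: "freeing" every temporary is a single store.
        void reset() { used_ = 0; }
    };

Anything that has to outlive the frame still needs some other scheme (RC, GC, explicit ownership); the arena only covers the scratch-data case where the workload fits comfortably in memory, as described above.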
Also, generations are a thing, so no need to "waste several decades of your life". And why is that metaphor applicable anyhow? Why not something like "stop constantly opening and closing the warehouse door during the work, and reinventorize it once a year instead of making a note every time you take something out of it or put something back in it" instead?
I was thinking of a lot of cases where RC is used as a general solution for lifetime management with automatic refcounting, but most objects have a well-known lifetime, with their refcount sitting at 1 for 95% of the time. And another 4% have their refcount bump up to 2 briefly when they're temporarily shared, or transferred to a new owner (think returning a shared_ptr<>).
But, yeah, the "genuinely shared" case - either the 1% in the scenario above, or when RC is only being used for objects with complex lifetimes - is definitely a significant use-case I overlooked.
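A quick illustration of that distribution, with shared_ptr as the stand-in RC mechanism (the functions here are made up for illustration):

    #include <memory>

    struct Texture { /* ... */ };

    // The ~95% case: the count stays at 1 for the whole lifetime; the only RC
    // traffic is the final decref when the sole owner goes away.
    void sole_owner() {
        std::shared_ptr<Texture> t = std::make_shared<Texture>();  // count == 1
        // ... use t ...
    }   // count -> 0, freed here

    // The ~4% case: transfer to a new owner, e.g. returning a shared_ptr<>.
    // The count bumps to 2 only briefly (and with a move it may never even hit 2).
    std::shared_ptr<Texture> load_texture() {
        std::shared_ptr<Texture> t = std::make_shared<Texture>();  // count == 1
        return t;
    }

    // The ~1% genuinely-shared case: multiple owners, last one standing frees.
    void genuinely_shared(std::shared_ptr<Texture> a) {
        std::shared_ptr<Texture> b = a;   // count == 2 while both exist
        // ...
    }   // b dropped, count back to 1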
Nitpick on the usage of the term “hard guarantee”, in the form of a rephrasing for GC: “One hard guarantee is that a GC pass takes O(N) time (N = number of live objects)”.
“O(M) guarantee” and “you can access all the other objects in parallel” are valid points, but they come with caveats: to utilize the first, one must know how many objects will be freed, and to utilize the second, one must know that two objects are not reachable from each other, which is not always the case, e.g. objs=find(scene,criteria).
(Obviously you are aware of that. This comment is meant to inform other readers and prevent further idealisation of RC)
> to utilize the “O(M) guarantee”, one must know how many objects will be freed
No -- unless I'm misunderstanding you(?), this is incorrect. The following pseudocode (I'll just use Python syntax for simplicity) has no idea how many objects are being deleted:
    while node is not None:
        print("[Progress] Deleting: " + node.name)
        prev = node
        node = node.next   # walk to the next node in the list
        del prev           # drop the last reference to it; assume this happens M times
Assume each node owns O(1) blocks of memory.
With GC, the user can see random arbitrarily-long stutters, depending on the shape of the rest of the object graph, and the phase of the moon.
With RC, you can rely on the user seeing your progress indicators consistently... without knowing or caring what M was.
> to utilize “you can access all the other objects in parallel”, one must know that two objects are not reachable from each other, which is not always the case e.g. objs=find(scene,criteria)
Again, this is false, and (subtly) misses the point.
The claim wasn't "you can access all objects in parallel". That's not true in any system, be it GC or RC.
The claim is "you can still access some objects in parallel" with RC. The crucial difference here is that, under a GC, all threads are at risk of getting throttled because of each other arbitrarily. They simply cannot do any work (at least, nothing that allocates/frees memory) without getting throttled or interrupted at the whim of the GC.
> With RC, you can rely on the user seeing your progress indicators consistently... without knowing or caring what M was.
What does that mean? How is it helpful for the user to know you've freed 56,789 objects so far if they don't know whether there are 60k in total or 600k?
> The crucial difference here is that, under a GC, all threads are at risk of getting throttled because of each other arbitrarily. They simply cannot do any work (at least, nothing that allocates/frees memory) without getting throttled or interrupted at the whim of the GC.
And that's also the case for all threads which share the same RC'd object. They will all be throttled when the memory is freed.
The biggest benefit of RC is how seamlessly it interacts with native code; that's why it's a great fit for languages like Obj-C, Swift, or Rust (opt-in). But in terms of performance, be it latency or throughput, it's not a particularly good option: the trade-off it makes is comparable to a copying GC's, but with higher max latency and even lower throughput, and without fast allocation and memory compaction as side benefits.