
You make some really good criticisms of OOP language design. I take issue with the part about garbage collection, as I don't think your points apply well to tracing garbage collectors. In practice, the only way to have "memory leaks" is to keep strong references to objects that aren't being used, which is a form of logic bug that can happen in just about any language. Also, good API design can largely alleviate handling of external resources with clear lifetimes (files, DB connections, etc.), and almost any decent OOP language will have a concept of finalizers to ensure that resources aren't leaked.


Finalizers are crap. Having some sort of do-X-with-resource construct is much better, but now you're back to caring about resource ownership, so it's reasonable to ask what it was that garbage collection bought you.
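
Roughly the shape I mean, sketched in Rust with a hypothetical `with_file` helper (not a std API); the resource lives exactly as long as the closure call:

    use std::fs::File;
    use std::io::{self, Read};

    // Do-X-with-resource: open, hand the resource to the closure, and
    // deterministically close it when the closure returns, success or error.
    fn with_file<T>(path: &str, f: impl FnOnce(&mut File) -> io::Result<T>) -> io::Result<T> {
        let mut file = File::open(path)?;
        let result = f(&mut file);
        // `file` is dropped (and the OS handle closed) here.
        result
    }

    fn main() -> io::Result<()> {
        let contents = with_file("notes.txt", |file| {
            let mut s = String::new();
            file.read_to_string(&mut s)?;
            Ok(s)
        })?;
        println!("{} bytes read", contents.len());
        Ok(())
    }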

I agree with your parent that these sorts of accidental leaks are more likely with, though of course not uniquely associated with, having a GC that lets you forget who owns objects.

Suppose we're developing a "store" web site with an OO language - every Shopper has a reference to Previous Purchases which helps you guide them towards products that complement things they bought before or are logical replacements if they're consumables. Now of course those Previous Purchases should refer into a Catalog of products which were for sale at the time; you may not sell incandescent bulbs today but you did in 2003 when this shopper bought one. And of course the Catalog needs Photographs of each product as well as other details.

So now - without ever explicitly meaning this to happen - when a customer named Sarah just logged in, that brought 18GB of JPEGs into memory because Sarah bought a USB mouse from the store in spring 2008, and at the time your catalog included 18GB of photographs. No code displays these long forgotten photographs, so they won't actually be kept in cache or anything, but in practical terms you've got a huge leak.

I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.


Using a website is a strange example, because that's not at all how web services are architected. The only pieces that would have images in memory are a file-serving layer on the server side and the user's browser.


> I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.

I simply don't see how these mistakes would be any less obvious than in an equally complex C codebase, for instance.


Agree on finalizers being crap. Do-X-with-resource is shit too, though, because I don't want indentation hell. Well, I guess you could solve it by introducing a do-X-with variant that doesn't open a new block but rather attaches to the surrounding one.
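
Something like this sketch, say, with a hypothetical `Defer` guard (not a std type): cleanup hangs off the surrounding scope, so no new block or extra indentation:

    // A guard that runs its closure when the *surrounding* scope ends.
    struct Defer<F: FnMut()>(F);

    impl<F: FnMut()> Drop for Defer<F> {
        fn drop(&mut self) {
            (self.0)();
        }
    }

    fn main() {
        let _cleanup = Defer(|| println!("resource released at end of scope"));
        // ... keep working at the same indentation level ...
        println!("doing work");
    }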


> Finalizers are crap. Having some sort of do-X-with-resource is much better but now you're back to caring about resource ownership and so it's reasonable to ask what it was that garbage collection bought you.

What garbage collection brings you here, and what it has always brought you, is freedom from having to think about objects' memory lifetimes, which are different from other possible resource concerns, like whether a mutex is locked or whatnot.

In fact, I'd claim that conflating the two, as is done for example with C++'s RAII or Rust's Drop trait, is extremely crap, since now memory allocation and resource acquisition are explicitly linked even though they don't need to be. This also explains why, as you say, finalizers are crap.
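
To make the conflation concrete, a small sketch (my illustration): dropping a Rust `File` releases its storage and closes the OS handle at the same point, so the two lifetimes are fused:

    use std::fs::File;
    use std::io::Write;

    fn main() -> std::io::Result<()> {
        let mut file = File::create("out.txt")?;
        file.write_all(b"hello")?;
        drop(file); // fd closed and the handle's memory reclaimed together, here
        // With a with-block / try-with-resources model, closing the handle
        // and reclaiming the object's memory are two separate events.
        Ok(())
    }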

Things like Python's context managers (and with-blocks), C#'s IDisposable and using-blocks, and Java's AutoCloseables with try-with-resources handle this in a more principled manner.

> I agree with your parent that these sorts of accidental leaks are more likely with, though of course not uniquely associated with, having a GC that lets you forget who owns objects.

> Suppose we're developing a "store" web site with an OO language - every Shopper has a reference to Previous Purchases which helps you guide them towards products that complement things they bought before or are logical replacements if they're consumables. Now of course those Previous Purchases should refer into a Catalog of products which were for sale at the time; you may not sell incandescent bulbs today but you did in 2003 when this shopper bought one. And of course the Catalog needs Photographs of each product as well as other details.

Why are things like the catalogues of previous purchases necessary to keep around? And why load them up-front instead of loading them lazily if you actually do need a catalogue of incandescent light bulbs from 2003 for whatever reason?

> So now - without ever explicitly meaning this to happen - when a customer named Sarah just logged in, that brought 18GB of JPEGs into memory because Sarah bought a USB mouse from the store in spring 2008, and at the time your catalog included 18GB of photographs. No code displays these long forgotten photographs, so they won't actually be kept in cache or anything, but in practical terms you've got a huge leak.

I'm sorry, what‽ Who is this incompetent engineer you're talking about? First of all, why are we loading Sarah's product history when we're just showing whatever landing page we show whenever a customer logs in? Why are we loading the whole thing instead of, say, just the last month's purchases? Oh, and the elephant in the room:

WHY THE HELL ARE WE LOADING 18 GB OF JPEGS INTO OUR OBJECT GRAPH WHEN SHOWING A GOD-DAMN LOGIN LANDING PAGE‽ Instead of, you know, AT MOST JUST LOADING THE LINKS TO THE JPEGS INSTEAD OF THE ACTUAL IMAGE CONTENTS!
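
For illustration, the shape I'd expect instead, with hypothetical types (the names are mine): the object graph pins links, never pixel data:

    // Keep a reference to the image, not the decoded bytes.
    struct Photograph {
        url: String, // object-store key or CDN link: tens of bytes
    }

    struct Product {
        name: String,
        photos: Vec<Photograph>,
    }

    // The anti-pattern from the scenario: any reference into the catalog
    // pins megabytes of image data per product.
    struct PhotographInlined {
        jpeg_bytes: Vec<u8>,
    }

    fn main() {
        let mouse = Product {
            name: "USB mouse".into(),
            photos: vec![Photograph { url: "img/usb-mouse-2008.jpg".into() }],
        };
        println!("{} -> {}", mouse.name, mouse.photos[0].url);
        let _ = PhotographInlined { jpeg_bytes: Vec::new() }; // don't do this
    }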

Nothing about this has anything to do with whether a language implementation has GC or not; it has everything to do with whether the hypothetical engineer who wrote this damn thing knows how to do their job correctly or not.

> I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.

I don't know; if the production memory profiler started showing an occasional 18 GB spike taken up by a bunch of Photograph objects, that would certainly raise some eyebrows. Especially since in this case they were made by an insane person who thought that storing the JPEG data in the objects themselves was a sane design.

---

As you said above, this is very much not unique to language implementations with GC. Similar mistakes can be made, for example, in Rust. And you may say that no competent Rust developer would make this kind of mistake. That's almost certainly true, since the scenario is insane. But looking at the scenario, the person making the shopping site was clearly not competent, because they did, well, that.


Why do you treat memory allocations as a special resource that should have different reasoning about lifetime than something like a file, a DB handle, etc.? Sure, if you don't care about how much memory you're using, it solves a small niche of a problem at the cost of a lot of overhead (in both CPU and memory footprint), but Rust does a really good job of making this easy (and you rarely, if ever, have to implement Drop yourself).

Context managers and the like are a crutch, and an admission that the tracing GC model is flawed for non-memory use cases.


Memory is freed by the operating system as soon as the program exits. If you have an open connection to a database, the server might expect you to talk to it before you close the handle (for example, to differentiate between a crash and a normal shutdown). Any resource shared between programs might have a protocol that needs to be followed, which the operating system won't do for you.
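
A sketch of the kind of protocol I mean (placeholder bytes, not any real wire format): an explicit close() that says goodbye before the socket drops, which the OS will never do on your behalf:

    use std::io::Write;

    struct DbConn<W: Write> {
        stream: W,
    }

    impl<W: Write> DbConn<W> {
        // A consuming close: send the protocol's quit message (placeholder
        // bytes here) so the server sees a graceful shutdown, not a crash.
        fn close(mut self) -> std::io::Result<()> {
            let quit_frame = [0x01u8, 0x00, 0x00, 0x00, 0x01];
            self.stream.write_all(&quit_frame)?;
            self.stream.flush()
            // `stream` is dropped here, after the goodbye.
        }
    }

    fn main() -> std::io::Result<()> {
        // Stand-in transport so the sketch runs without a real server.
        let conn = DbConn { stream: Vec::new() };
        conn.close()
    }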

The GC model only cares about memory, so I don't really understand what you mean by "flawed for non-memory use cases". It was never designed for anything other than managing memory for you. You wouldn't expect a car to be able to fly, would you?

I personally like the distinction between memory and other resources. If you look hard enough I'm sure you'll find ways to break a model that conflates the two. Similar to this, the "everything is a file" model breaks down for some applications. Sure, a program designed to only work with files might be able to read from a socket/stream, but the behavior and assumptions you can make vary. For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.
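
For instance, a bounded read in Rust (max_bytes is an assumed app-specific limit) never trusts the handle to be finite:

    use std::io::{self, Read};

    // Cap the reader so an unbounded source (stdin, /dev/camera) can't
    // exhaust memory the way a naive read_all would.
    fn read_bounded(reader: impl Read, max_bytes: u64) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        reader.take(max_bytes).read_to_end(&mut buf)?;
        Ok(buf)
    }

    fn main() -> io::Result<()> {
        let data = read_bounded(io::stdin(), 1 << 20)?; // at most 1 MiB
        eprintln!("read {} bytes", data.len());
        Ok(())
    }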

In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.


> to differentiate between a crash and a normal shutdown

> Any resource shared between programs might have a protocol that needs to be followed, which the operating system won't do for you.

Run far and quickly from any software that attempts to do this. It's a surefire indication of fragile code. At best such things should be restricted to performance optimizations, and only if they don't risk correctness. It's better to just not assume that reliable delivery indicates graceful shutdown, or that lack of a signal indicates a crash.

> In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.

C++ and Rust do provide compelling reasons why memory is no different from other resources. I think the counter is the better position: you have to prove that a memory resource really is different, for the purposes of resource management, from anything else. You can easily demonstrate why network and files should probably have different APIs (drastically different latencies, different properties to configure, etc.). That's why network file systems generally have such poor performance - the as-if pretension sacrifices a lot of performance.

The main argument tracing GC makes is that it's OK to be greedy and wasteful with RAM, because there's so much of it that retaining extra memory is a better tradeoff. Similarly, it argues that the cycles taken by GC, and the random variable-length pauses it often generates, don't matter most of the time. The counter is that while it probably doesn't matter in the P90 case, it does matter when everyone takes this attitude and P95+ latencies suffer (depending on how many services are between you and the user, you'd be surprised how many 9s of good latency your services have to achieve for the eyeball user to observe an overall good P90 score).

> For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.

Right, which is one of many reasons why you shouldn't ever assume the files you're given are of finite length. You could be passed stdin too. Of course you can make simplifications in specific cases, but that requires domain-specific knowledge, something tracing GCs do not have because they're agnostic to the use case.


Because 99.9% of the time the memory allocation I'm doing is unimportant. I just want some memory to be allocated somehow and cleaned up at some later time, and it just does not matter how any of that happens.

Reasoning about lifetimes and such would therefore be severe overkill and would occupy my mental faculties for next to zero benefit.

Cleaning up other resources, like file handles, tends to be more important.


Then stick it in a Box, Rc, or Arc and forget about it. You only need lifetimes for the cases where you want to avoid heap allocations.
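
A minimal sketch of what that looks like; no lifetime annotations anywhere:

    use std::rc::Rc;
    use std::sync::Arc;

    fn main() {
        // One owner; freed automatically when it goes out of scope.
        let boxed = Box::new(vec![1, 2, 3]);

        // Shared within a thread; cloning is just a refcount bump.
        let shared = Rc::new(String::from("catalog"));
        let also_shared = Rc::clone(&shared);

        // Shared across threads.
        let across = Arc::new(42);
        let worker = std::thread::spawn({
            let across = Arc::clone(&across);
            move || println!("worker sees {}", across)
        });
        worker.join().unwrap();

        println!("{:?} {} {}", boxed, also_shared, across);
    }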


Simple reference counting fails as soon as you have a graph that might have loops in it.
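
A minimal example of the failure, plus the usual Weak workaround:

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        other: RefCell<Option<Rc<Node>>>,
    }

    fn main() {
        // Two nodes pointing at each other: refcounts never reach zero,
        // so neither is ever freed. A tracing GC would reclaim this.
        let a = Rc::new(Node { other: RefCell::new(None) });
        let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
        *a.other.borrow_mut() = Some(Rc::clone(&b));
        println!("strong count of a: {}", Rc::strong_count(&a)); // 2

        // The usual fix: make one direction a Weak edge so the cycle
        // doesn't keep itself alive.
        let back_edge: Weak<Node> = Rc::downgrade(&a);
        assert!(back_edge.upgrade().is_some());
    }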


Then use “tracing GC as a library”, like [1] or [2]. I'm not saying there's no use for tracing GC ever. I'm saying it shouldn't be a language-level feature, and that it's perfectly fine as an opt-in library.

[1] https://github.com/Manishearth/rust-gc

[2] https://github.com/oilpan-gc/cppgc
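
A rough sketch of what opting in with [1] might look like; I'm going from memory on the `gc` crate's API (Gc, GcCell, and the Trace/Finalize derives), so check its README:

    // Cargo.toml (assumed): gc = { version = "*", features = ["derive"] }
    use gc::{Finalize, Gc, GcCell, Trace};

    #[derive(Trace, Finalize)]
    struct Node {
        next: GcCell<Option<Gc<Node>>>,
    }

    fn main() {
        // The same cycle that leaks under plain Rc is reclaimed here,
        // because the library traces reachability instead of counting refs.
        let a = Gc::new(Node { next: GcCell::new(None) });
        let b = Gc::new(Node { next: GcCell::new(Some(a.clone())) });
        *a.next.borrow_mut() = Some(b.clone());
    }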


Bolt-on GCs necessarily have to be conservative about their assumptions, which significantly hinders performance. And if the language semantics don't account for it, the GC can't properly do things like compaction (or if it can, it requires a lot of manual scaffolding from the user).

It should be a language-level feature in a high-level language for the simple reason that in the vast majority of high-level code, heap allocations are very common, yet most of them are not tied to any resource that requires manual lifetime management. A good example of that is strings: if you have a tracing GC, the simplest way to handle strings is as a heap-allocated immutable array of bytes, and there's no reason for any code working with strings to ever be concerned about manual memory management. Yet strings are a fairly basic data type in most languages.
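
A rough Rust analogue of that string model, for illustration (Arc<str> standing in for a GC'd immutable string):

    use std::sync::Arc;

    fn main() {
        // Immutable, heap-allocated, freely shareable: no code touching it
        // needs to think about who frees it.
        let s: Arc<str> = Arc::from("incandescent bulb, 60W");
        let t = Arc::clone(&s); // shared, not copied
        println!("{} / {}", s, t);
        // Freed when the last reference drops; a tracing GC additionally
        // handles cycles, which plain refcounting would not.
    }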


That's my point, though. Either you have graph structures with loops, where the "performance" of the tracing GC is probably irrelevant to the work you're doing, or you have graph ownership without loops. "Without loops" is actually significantly more common, and the "loops" case has solutions even without going all the way to tracing GC. Also, "performance" has many nuances here. When you say "bolt-on GC significantly hinders performance", are you talking about how precisely it can reclaim memory and how quickly after it's freed? Or are you talking about the pauses the collector has to make, or the atomics it injects throughout to do so in a thread-safe manner?

I suspect the benefits of compaction are wildly overstated because, AFAIK, compaction isn't cache-aware and thus thrashes the CPU cache. By comparison, a language like Rust lets you naturally lay things out in a way that the CPU likes.

> if you have a tracing GC, the simplest way to handle strings is as a heap-allocated immutable array of bytes

But what if I want to mutate the string? Now I have to do a new heap allocation and can't do things in place. Memory can be cheap to move, but it can also add up substantially versus an in-place solution.
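
The difference in sketch form: a mutable String grows in place (amortized), while immutable-style strings copy everything on each edit:

    use std::fmt::Write;

    fn main() {
        // In place: bytes are appended into one growing buffer.
        let mut log = String::with_capacity(1024);
        for i in 0..10 {
            write!(log, "line {i}\n").unwrap();
        }

        // Immutable style: every step copies the whole string anew.
        let mut imm = String::new();
        for i in 0..10 {
            imm = format!("{imm}line {i}\n");
        }
        assert_eq!(log, imm);
    }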



