Generational GC designs do not "deallocate" objects; in fact, most GCs don't. The belief that they do is an understandable but unfortunate misconception that sometimes causes developers to write more GC-unfriendly code than necessary.
When a collection occurs in a generational GC, the corresponding generation's heap is scanned for live objects, which are then usually relocated to an older generation. In the most common scenario predicted by the generational hypothesis, only a few objects survive and most die in the young/nursery/ephemeral generation (.NET, the JVM and other ecosystems have different names for the same thing).
This means that once the relocation finishes, the memory region/segment/heap that was previously in use can immediately be made available to subsequent allocations. Sometimes it is also zeroed as part of the process, but the cost of doing so on modern hardware is minuscule.
As a result, the more accurate intuition for most generational GC implementations is that pause time and CPU cost scale with the live object count and inter-generational traffic; there is no per-object cost for "deallocation". This makes the process vastly more efficient than reference counting. The concern about overly high allocation traffic remains, which is why allocation and collection throughput are defining characteristics of GC implementations, alongside average pause duration, pause frequency, costs imposed elsewhere by specific design choices, etc.
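To make that intuition concrete, here's a minimal sketch in plain C of a bump-pointer nursery (names like nursery_alloc and minor_collect are made up for illustration, not any real runtime's API): allocation is just a pointer bump, a minor collection copies out only the survivors, and the whole region is then reused in one step, so dead objects are never touched individually.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A toy bump-pointer nursery. Real runtimes add object headers, write
   barriers, card tables, etc.; this only illustrates the cost model. */
typedef struct {
    uint8_t *start;
    uint8_t *top;    /* next free byte */
    uint8_t *end;
} Nursery;

/* Allocation is just a pointer bump: O(1), no free lists. */
void *nursery_alloc(Nursery *n, size_t size) {
    if (n->top + size > n->end)
        return NULL;              /* nursery full: time for a minor collection */
    void *obj = n->top;
    n->top += size;
    return obj;
}

/* A minor collection copies only the objects that are still reachable
   into the older generation, then resets the bump pointer. Dead objects
   are never visited, so their "deallocation" costs nothing. */
void minor_collect(Nursery *n,
                   void **live_objects, size_t *live_sizes, size_t live_count,
                   uint8_t *old_gen, size_t *old_top) {
    for (size_t i = 0; i < live_count; i++) {
        memcpy(old_gen + *old_top, live_objects[i], live_sizes[i]);
        *old_top += live_sizes[i];   /* work scales with live data only */
    }
    n->top = n->start;               /* whole region becomes reusable at once */
}
```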
Allocating and deallocating a complex graph of reference-counted objects costs more than 10x as much as doing so with a modern GC implementation. I don't know which collector Cocoa used back in the day, but I bet it was nowhere near as advanced as what you'd see today in JVMs or .NET.
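For contrast, here's an equally minimal sketch of reference-counted teardown (again plain C with made-up names, nothing to do with Cocoa's actual retain/release machinery): dropping the last reference to the root of a graph walks and frees every node one by one, so a cost is paid per object, which is exactly the work a tracing collector never does for dead objects.

```c
#include <stdlib.h>

/* A toy reference-counted node. Real implementations use atomic counters,
   which adds further cost to every retain/release. */
typedef struct Node {
    int           refcount;
    struct Node **children;
    size_t        child_count;
} Node;

Node *node_retain(Node *n) {
    n->refcount++;
    return n;
}

void node_release(Node *n) {
    if (--n->refcount == 0) {
        /* Dropping the last reference cascades through the whole graph:
           every single dead object is visited and freed individually. */
        for (size_t i = 0; i < n->child_count; i++)
            node_release(n->children[i]);
        free(n->children);
        free(n);
    }
}
```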
A few entry points into the documentation describe the many ifs and buts of making use of Objective-C GC without having things go wrong.
All of this contributed to Objective-C GC not being a sound implementation, with plenty of Radar issues and forum discussions: as you might expect, making existing projects, or any random Objective-C or C library, work under Objective-C GC semantics and its required changes wasn't easy.
Naturally, having the compiler automate retain/release calls, similar to what VC++ does for COM with _com_ptr_t (which was eventually superseded by other mechanisms), was a much better solution, and one that didn't require a "rewrite the world" approach.
Automate a pattern developers were already expected to apply manually, and leave everything else as it is, with no ifs and buts about coding best practices, programming patterns, RC/GC interoperability issues with C semantics, and so on.
The existing retain/release calls would no longer be written by hand; everything else stays the same.
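As an illustration of the kind of boilerplate the compiler took over, here's a hypothetical property setter written in the classic manual retain/release style (plain C with a toy refcounted type, not code from any Apple sample): under ARC the programmer just writes the assignment, and the compiler emits the equivalent retain/release pairs.

```c
#include <stdlib.h>

/* Hypothetical reference-counted type, standing in for an Objective-C object. */
typedef struct Widget {
    int refcount;
} Widget;

Widget *widget_retain(Widget *w)  { if (w) w->refcount++; return w; }
void    widget_release(Widget *w) { if (w && --w->refcount == 0) free(w); }

typedef struct {
    Widget *widget;
} Controller;

/* The classic manual setter pattern: retain the new value, release the old
   one, store the new one. ARC's contribution was having the compiler emit
   exactly this kind of code so nobody has to write (or forget) it by hand. */
void controller_set_widget(Controller *c, Widget *new_widget) {
    if (c->widget == new_widget)
        return;
    widget_retain(new_widget);
    widget_release(c->widget);
    c->widget = new_widget;
}
```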
Naturally, Apple being Apple, they had to sell this at WWDC as some great achievement proving that RC is much better than GC, which in a sense is correct, but only from the point of view of the underlying C semantics and of the mess Objective-C GC turned out to be, not of tracing GC algorithms in general.