
Except for the fact that one leads to undefined behavior and the other doesn't. This is a massive difference.


This might not seem obvious, so allow me to expand a bit.

The difference is that a dangling raw pointer to the heap will point to anything that can be modified at any time.

But indexing a dynamic array guarantees that every element is always well-formed and safe to use.
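A minimal Rust sketch of the contrast (the `Node` type and handle are illustrative):

```rust
// Sketch: an index into a Vec stays memory-safe even when it goes stale,
// unlike a raw pointer kept across the same mutation.
#[derive(Debug, PartialEq)]
struct Node {
    value: i32,
}

fn main() {
    let mut nodes = vec![Node { value: 1 }, Node { value: 2 }];
    let handle = 1usize; // an "index handle" to the second node

    nodes.remove(1); // the handle is now stale...

    // ...but the lookup is checked: we get None, never a wild read.
    assert_eq!(nodes.get(handle), None);

    // A raw pointer held across the removal would be dangling, and
    // dereferencing it would be undefined behavior.
}
```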


It's true that the array approach cannot suffer from type confusion.


Non-UB data structure corruption and other incorrect behavior isn't like, super obviously better than UB corruption and other incorrect behavior.


The obvious upside is that it's so much easier to debug when there's no UB. Debugging UB is never enjoyable.


Not sure I agree. UB has the benefit that there are a lot of tools, like Valgrind's Memcheck, the sanitizers, and various compiler features, to detect it both statically and at runtime. And when you hit UB it is certainly an error.

Non-UB errors have no such powerful tools, only what application specific tests or assertions you create.

I've seen plenty of cases where people went and turned signed types into unsigned because signed overflow is undefined, but it had the real effect of turning easily detected and diagnosed issues into much harder to find bugs.


Also like, it's not universally true that UB bugs (broad class of problems) are all harder to debug than non-UB bugs (another broad class of problems). Null pointer dereferences are UB, and among the easiest class of problem to debug.


The practical ramifications are often similar, even if UB can in theory do anything.


It’s pretty common to implement graphs in terms of arrays, not because of indices but because of cache locality.
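For instance, an index-based graph layout might look like this (a minimal sketch; names are illustrative):

```rust
// Sketch of an index-based graph: nodes live contiguously and edges are
// indices rather than pointers, which keeps traversal cache-friendly.
struct Graph {
    // neighbors[i] lists the indices of nodes adjacent to node i
    neighbors: Vec<Vec<usize>>,
}

impl Graph {
    fn new(n: usize) -> Self {
        Graph { neighbors: vec![Vec::new(); n] }
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.neighbors[from].push(to);
    }

    fn degree(&self, node: usize) -> Option<usize> {
        // Checked lookup: an out-of-range index yields None, not UB.
        self.neighbors.get(node).map(|n| n.len())
    }
}

fn main() {
    let mut g = Graph::new(3);
    g.add_edge(0, 1);
    g.add_edge(0, 2);
    assert_eq!(g.degree(0), Some(2));
    assert_eq!(g.degree(9), None);
}
```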

So your “UB” and “non-UB” code would look effectively identical to the CPU and would take the same amount of debugging.

The reality is that whether an index was tombstoned and then referenced, or "deallocated" and then referenced, it is still a programmer fault: a bug the compiler could not catch.


> So your “UB” and “non-UB” code would look effectively identical to the CPU and would take the same amount of debugging.

Said the person who has never had to debug serious UB...


Ummm. Yeah it is. I'm sure you can come up with some cases where the impact of the bugs is roughly equivalent, but generally speaking, UB is a terrible class of bug.

For example, if the aho-corasick crate accidentally tries to look up a dangling state node, you'll get a panic. But if this were UB instead, that could easily lead to a security problem or just pretty much anything... because behavior is undefined.

So generally speaking, yes, this is absolutely a major difference and it matters in practice. You even have other people in this thread saying this technique is used in C for this exact reason. Leaving out this very obvious difference and pretending like these are the same thing is extremely misleading.


Author here. In scientific computing, there isn't much of a security implication of UB, and the most dangerous kind of error is when your program silently computes the wrong thing. And that is especially likely when you use indices manually.

You could say that UB enables all behaviour, including silently wrong answers, and you'd be right. But it's more likely to crash your program and therefore be caught.

Most importantly, the comparison to a raw pointer is not relevant. My blog post states that integers come with zero safety (as in: preventing bugs, not risk of UB) and zero language support and that is true. My blog post compares Rust with GC languages, not with raw pointer arithmetic. And it's clear that, when you compare to using GC references, manual indices are horribly unsafe.


You're engaging in a motte-and-bailey fallacy. Your motte is your much more narrow claim about Rust's advantages in a context where you claim not to care about undefined behavior as a class of bugs more severe than logic errors. And you specifically make this claim in an additional context where a GC language is appropriate. That's an overall pretty easy position to defend, and if that were clearly the extent of it, I probably wouldn't have responded at all.

But, that context has been dropped in this thread, and instead folks are making very general claims outside of that very restricted context. Moreover, your bailey is that you don't do a good job outlining that context either. Your blog's opening paragraphs mention nothing about that more constrained context and instead seem to imply a very general context. This is much harder to defend. You go on to mention scientific computing, but instead of it being a centerpiece of the context of your claim, it's just mentioned as aside. Instead, your blog appears to be making very broad claims. But you've jumped in here to narrow them significantly, to the point that it materially changes your point IMO. Let's just look at what you said here:

> The first time someone gave me this advice, I had to do a double take. The Rust community's whole thing is commitment to compiler-enforced correctness, and they built the borrowchecker on the premise that humans can't be trusted to handle references manually. When the same borrowchecker makes references unworkable, their solution is to... recommend that I manually manage them, with zero safety and zero language support?!? The irony is unreal. Asking people to manually manage references is so hilariously unsafe and unergonomic, the suggestion would be funny if it wasn't mostly sad.

There's no circumspection about the context. You're just generally and broadly dismissing this entirely as if it weren't a valid thing ever. But it absolutely is a valid technique and it has real practical differences with an approach that uses raw pointers. If the comparison is with a GC and that context is made clear, then yes, absolutely, the comparison point changes entirely! If you can abide a GC, then a whole bunch of things get easier... at some cost. For example, I don't think it's possible to write a tool like ripgrep with its performance profile in a GC language. At least, I've never seen it done.


> I don't think it's possible to write a tool like ripgrep with its performance profile in a GC language.

I think it is possible to make a language that has both a GC and a borrow checker, treating the GC types as a third level next to the stack and the heap, where complex referential cycles can be promoted to the GC, but the defaults push you towards fast execution patterns. Don't know if such a language would be successful in finding its niche. The only way I could see that is if a non-GC mode could be enforced, so that libraries can be written in the more restrictive, faster-by-default mode, while being consumed by application developers with less stringent restrictions. This is no different in concept than Python libraries implemented in native languages. Making the mode part of the same language could help with prototyping pains: write with the GC, and then refactor once at the end, after the general design is mostly found.


My standard of evidence here is existence. There are many greps written in a "GC language," and none of them have a broad performance profile in the same category as ripgrep (or other greps written in C and C++).

> This is no different in concept than Python libraries implemented in native languages.

If that's true, then why isn't there a fast grep written in Python? IMO, it's for the same reason that tools like csvkit are slow. Once you get to the boundary between GC and non-GC, some kind of cost gets paid. Now maybe Python is paying higher costs here than your hypothetical language, but who knows. Without actual existence, I find this sort of hypothetical to be very uncompelling. It's too broad of a stroke.


> I think it is possible to make a language that has both a GC and a borrow checker, treating the GC types as a third level next to the stack and the heap, where complex referencial cycles can be 'promoted to the GC'

It's doable but there are a few issues with that whole idea. You need the ability to safely promote objects to GC-roots whenever they're being referenced by GC-unaware code, and demote them again afterwards. And if any GC object happens to control the lifecycle of any heap-allocated objects, it must have a finalizer that cleans them up RAII style to avoid resource leaks.


> And that is especially likely when you use indices manually.

But that is never the recommendation Rust practitioners would give for graph data structures. One would instead recommend using a generational arena, where each index holds its "generation". The arena can be growable or not. When an element is removed from the arena it gets tombstoned, marked as no longer valid. If a new value reuses a tombstoned position, its generation changes. This shifts the cost of verifying that the handle is correct to the read point: if the handle corresponds to an index with a tombstone sentinel or has a different generation, the result of the read operation is None, meaning the handle is no longer valid. This is much better than the behavior of pointers.
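A minimal sketch of the idea (illustrative, not a real crate's API; crates like `slotmap` and `generational-arena` provide production versions):

```rust
// Generational arena sketch: a handle carries both a slot index and the
// generation it was issued under, so stale handles are detected at read time.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>, // None = tombstoned
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        // Reuse the first tombstoned slot, bumping its generation.
        for (index, slot) in self.slots.iter_mut().enumerate() {
            if slot.value.is_none() {
                slot.generation += 1;
                slot.value = Some(value);
                return Handle { index, generation: slot.generation };
            }
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation {
            return None;
        }
        slot.value.take() // tombstone the slot
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation != h.generation {
            return None; // stale handle: detected, not UB
        }
        slot.value.as_ref()
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses the slot with a new generation
    assert_eq!(arena.get(a), None); // stale handle caught
    assert_eq!(arena.get(b), Some(&"second"));
    assert_eq!(a.index, b.index); // same slot, different generation
}
```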


For the record, I completely agree that using GC is appropriate for problems that deal with possibly-cyclic general graphs, and maybe even wrt. DAG structures whenever one wishes to maximize throughput and the use of simpler refcounting would involve heavy overhead. (This is a well-known pitfall wrt. languages like Swift, that enforce pervasive use of atomic reference counting.)

Since the use of "pluggable" GC in C-like or Rust-like languages is still uncommon, this generally means resorting to a language which happens to inherently rely on GC, such as Golang.



