ON UPDATE CASCADE is not the nightmare that you are making it out to be. (My impression from the article is that this is a single SQL database being discussed.)
Sure, if all data is in a single DB. But in the real world you’ve generally got some/all of:
- 1 or more data warehouses
- Other services storing said data (e.g. the user id will live in many databases in a service oriented architecture)
- External API integrators who have their own data stores totally out of your control that also have copies of parts of your data
- Job queues. It’s common to have jobs scheduled for the future on some other system (Redis, etc.) that say “do X for user with id Y in Z days”. If the id changes, those jobs fail
- Caches. Say I cache data by user email instead of a surrogate key, the user changes their email, and another user signs up with the old email. Cache hits for the wrong user! (Sketched at the end of this comment.)
- etc.
Changing the primary key becomes an absolute nightmare project once data lives in multiple places like this, and in my experience it mostly does.
Having truly stable identity for your entities just solves so many future problems, and it’s SO easy to do. In almost all cases, natural PKs are really all downside, virtually zero upside, except slightly less storage.
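To make the cache bullet concrete, here’s a minimal sketch of that failure mode, using a plain SQL table to stand in for whatever cache you actually run (Redis, Memcached, etc.); the table and column names are made up:

```sql
-- Stand-in for an external cache keyed by email instead of a stable surrogate id.
CREATE TABLE profile_cache (
    email   text PRIMARY KEY,
    payload text NOT NULL
);

INSERT INTO profile_cache VALUES ('alice@example.com', '{"name": "Alice", "plan": "pro"}');

-- Alice later changes her address; nothing invalidates the old cache entry.
-- A brand-new user then signs up with alice@example.com, and this lookup
-- serves Alice's cached profile to the wrong person:
SELECT payload FROM profile_cache WHERE email = 'alice@example.com';
```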
It's never a single database in the real world. As soon as you integrate with something or expose an API to something, the keys are out there. Unless you add a translation layer, but then you could just as well use surrogate keys directly.
> Using a natural PK -inside- your own database can still be a lot more pleasant to work with
Until you need to do anything like what's described above. The advantage of artificial keys is that they have no semantic content. Anything with semantic content carries the risk that the role that content plays in your system changes and causes problems; a non-semantic identifier protects you from that.
This is not to say that you should never use a semantic identifier as a key. However, you should always have a non-semantic artificial key as the identifier and use the semantic identifiers only when necessary.
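A minimal sketch of that pattern (Postgres syntax, hypothetical tables): the PK is a meaningless surrogate, and the semantic identifiers are kept as UNIQUE constraints so they stay queryable but are free to change:

```sql
CREATE TABLE users (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- stable, no meaning
    email text NOT NULL UNIQUE                             -- semantic, free to change
);

CREATE TABLE orders (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES users (id)  -- points at the surrogate, not the email
);
```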
I said "can still be" because, yes, it's a trade-off.
I guess if you're trying to give a rule of thumb to juniors, "always have an auto-inc key in case you've misjudged whether the natural key is a good idea" is probably safest (though if you do tell them that, keep an eye out for them trying to add an auto-inc to things like a many-to-many join table, which should've been PKed on the pair of FKs; see the sketch below).
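For that join-table caveat, the shape to point them at looks something like this (again Postgres syntax, hypothetical names, and assuming the users table from the sketch above):

```sql
CREATE TABLE roles (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL UNIQUE
);

-- The natural key here is the pair of FKs; an extra auto-inc column adds nothing.
CREATE TABLE user_roles (
    user_id bigint NOT NULL REFERENCES users (id),
    role_id bigint NOT NULL REFERENCES roles (id),
    PRIMARY KEY (user_id, role_id)
);
```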
But every database design decision can potentially result in problems if the meaning or usage of part of the data changes, and while you should absolutely take that into account when selecting an initial design, adding complexity in case it's needed later is, itself, also a trade-off.
It used to be that "always use the natural key if there is one, because adding a surrogate key is denormalisation and should be done only when necessary" was a common rule, and that was also overly absolutist.
Basically, my rule of thumb is something like "use a natural key if you're confident that you can DBA your way through any required changes easily enough to make the advantages the rest of the time a net win overall" and I find that has better results than 'always' or 'never' would.
(I'd also note that "the role that semantic content plays" changing also applies to e.g. cardinality of relationships and when -those- change it's Interesting Times no matter what you used for PKs and FKs, so you have to have a plan for things like that anyway ... as ever, it's trade-offs all the way down)
YAGNI for 99% of projects and databases. When you get to globally sharded NoSQL etc., you need UUIDs anyway, and incrementing IDs fall over too.
You’ll have to worry about performance tanking instead. If you’re using UUIDv7 then less so, but it’s still (at best) 16 bytes, which is double that of even a BIGINT.
Anyone who says UUIDs aren’t a problem hasn’t dealt with them at scale (or doesn’t know what they’re looking at, and just upsizes the hardware).
Most databases with a native UUID type store them as 128-bit integers, in binary just like a BIGINT is stored. It's not like 378562875682765 is stored as its decimal digits in a BIGINT either.
And if you're not using UUIDv7 or some other kind of cluster-friendly id, you'd best be using a hash index, and if you're doing neither, you probably don't care about their size or performance anyway. You don't pick UUIDs blindly, but on balance, they solve a lot more problems than they cause.
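For what it's worth, one way to read the hash-index advice in Postgres terms (hypothetical table; note that Postgres hash indexes can't enforce uniqueness, so they suit secondary lookup columns rather than the PK itself):

```sql
CREATE TABLE events (
    id         uuid PRIMARY KEY,   -- PK/UNIQUE indexes are still btree
    request_id uuid NOT NULL       -- random v4-style ids, equality lookups only
);

-- Hash index: fine for equality lookups, avoids btree ordering on random keys.
CREATE INDEX events_request_id_hash ON events USING hash (request_id);
```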
Postgres’ UUID type is 16 bytes. MySQL can store them as BINARY(16) once encoded. By comparison, a BIGINT in either is 8 bytes. Not sure about SQL Server or Oracle.
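The Postgres numbers are easy to sanity-check:

```sql
-- pg_column_size() reports the bytes a value occupies.
SELECT
    pg_column_size('123e4567-e89b-12d3-a456-426614174000'::uuid) AS uuid_bytes,   -- 16
    pg_column_size(1234567890123456789::bigint)                  AS bigint_bytes; -- 8
```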
> You don't pick UUIDs blindly, but on balance, they solve a lot more problems than they cause.
IME, this is precisely the problem – devs choose them blindly, because then you don’t have to think about proper modeling, you can arbitrarily create a key in your app and be nearly guaranteed of its uniqueness, etc.
128 bits is 16 bytes. BIGINT is a 64-bit int (long or long long).
So there is at least additional storage required, and probably some CPU cost as well. And even UUIDv7 isn't guaranteed to produce sequential IDs, but it's probably good enough to not seriously fragment your table storage.