Won't you still need indexes on those UUIDs anyway? And possibly have to do more...

dspillett · 2024-09-10T09:55:41 1725962141

> Won't you still need indexes on those UUIDs anyway?

Depends on how index pages are structured in the DB. With MySQL (assuming the InnoDB engine) the primary key is pretty much always a clustered index, in MS SQL Server this is the case more often than not too. This means that any page expansion due to splits from the randomness of the data being inserted affects the whole table not just the key index, and rebuilding to claim back the wasted space later will take a lot longer as you are moving all your based data around. With the UUID as a supplementary key, likely indexed on its own, all that gets enlarged by excess page splitting is those 128-bit values not the entire rows and an index rebuild to fix that up after the fact moves a lot less data around.

So yes, you have an extra index on top of your primary key and other additional ones needed by your apps and reports, but not having a random UUID as your clustering key can be a significant benefit. Using more ordered UUIDs minimises this difference considerably though.

Even ignoring the random-key-causes-page-splits issue, if you've mitigated it with more ordered UUIDs, a wider primary key increases the size of all supplementary indexes assuming MySQL arranges things similarly to SQL Server: rather than having a hidden page/row identifier (as SQL Server does without any clustered key defined, it calls such arrangements heap tables), each supplementary index includes the clustering key value with every row. So while having a 32-bit primary key for internals and the 128-bit UUID as an extra key adds 4 bytes per row (for the extra INT32) to the base data, it saves 12 bytes from each row in each non-clustered index (as the INT32 is included, not the UUID).

> And possibly have to do more joins to resolve them?

Usually not. Usually when you have both the INT32 (or sometimes IN64) surrogate key is for all internal use and a UUID only for external references, so the UUIDs are only important as the initial filter and not likely not taking part in a JOIN at all. After the initial filter to find the item you want in the main table of the query, the JOINs to collect data from other tables will all be by the surrogate (INT) keys. The UUID is almost never used as a foreign key reference in this arrangement.

benwilber0 · 2024-09-10T00:19:42 1725927582

You only need 1 index on the UUID. Instead of everywhere the UUID is referenced from other tables

paperplatter · 2024-09-10T18:41:13 1725993673

Yes, if a client gives a UUID and you want to look up relations to that table, you'll have to join with that table. And the index on the UUID has its own costs. It's still better this way. If for some rare reason this join is too expensive, there are other options like using a cache or denormalizing the tables slightly. The alternative of a UUID pkey will seriously slow down every join.

One question is whether you do random or k-sorted UUIDs for a secondary key. K-sorted is likely faster, but in many cases the difference is small enough that you'd rather take the easy random route which is also guaranteed not to leak any info.