> A non-key column gets a zone map / skip index laid out on top, which is cheap ...

setr · 2025-10-02T04:31:58

Zonemap / skip indexes don’t require sorting, still make searches significantly faster than full table scans, and are typically applied to every column by default. Sorting is even better, just at the cost of a second copy of the dataset.
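For anyone who hasn't seen one: a zone map just records the min and max of each fixed-size block of a column, so a range filter can skip any block whose [min, max] can't overlap the predicate, with no sorting required. A minimal Python sketch, assuming fixed-size blocks and an inclusive range predicate; `ZoneMap`, `build_zone_map`, and `scan_range` are hypothetical names, not any engine's API:

```python
from dataclasses import dataclass


@dataclass
class ZoneMap:
    """Per-block min/max summary of one column (hypothetical layout)."""
    block_size: int
    mins: list
    maxs: list


def build_zone_map(column, block_size):
    # One (min, max) pair per fixed-size block; the column itself stays unsorted.
    mins, maxs = [], []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        mins.append(min(block))
        maxs.append(max(block))
    return ZoneMap(block_size, mins, maxs)


def scan_range(column, zone_map, lo, hi):
    """Return (matching values, number of blocks actually read) for lo <= v <= hi."""
    matches, blocks_read = [], 0
    for b, (bmin, bmax) in enumerate(zip(zone_map.mins, zone_map.maxs)):
        # Skip the whole block if its [min, max] range cannot overlap [lo, hi].
        if bmax < lo or bmin > hi:
            continue
        blocks_read += 1
        start = b * zone_map.block_size
        block = column[start:start + zone_map.block_size]
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


column = [5, 3, 8, 1, 40, 42, 41, 45, 90, 95, 91, 99]
zm = build_zone_map(column, block_size=4)
print(scan_range(column, zm, 41, 44))  # ([42, 41], 1): only the middle block is read
```

How much this prunes depends on how clustered the values are within each block.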

In a row-based RDBMS, any indexing whatsoever is a copy of the column data, so you might as well store it sorted every time. There, the index isn’t inherent to the storage format.
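By contrast, a row-store secondary index has to pull the indexed values out of every row into its own structure, and since it's a copy anyway it might as well be sorted. A rough Python sketch, using a sorted list of `(key, row_id)` pairs as a stand-in for the B-tree a real RDBMS would typically use; the table and function names are made up for illustration:

```python
import bisect

# Hypothetical row-oriented table: each row is (id, name, price), stored together.
rows = [
    (0, "widget", 30),
    (1, "gadget", 10),
    (2, "doohickey", 20),
    (3, "gizmo", 10),
]


def build_index(rows, col):
    """Copy one column out of every row, pair it with the row id, and keep it sorted."""
    return sorted((row[col], rid) for rid, row in enumerate(rows))


def lookup_range(index, lo, hi):
    """Row ids whose key lies in [lo, hi], found by binary search on the sorted copy."""
    i = bisect.bisect_left(index, (lo,))  # (lo,) sorts before any (lo, rid) pair
    out = []
    while i < len(index) and index[i][0] <= hi:
        out.append(index[i][1])
        i += 1
    return out


price_index = build_index(rows, col=2)
print(price_index)                        # [(10, 1), (10, 3), (20, 2), (30, 0)]
print(lookup_range(price_index, 10, 20))  # [1, 3, 2]
```

Without `price_index`, the same filter would have to walk every row.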

SkiFire13 · 2025-10-04T10:20:32

> Zonemap / skip indexes don’t require sorting

That's still a separate index though, no? It's not intrinsic to the column storage itself, although I guess it works best with it if you end up having to do a full scan of the column section anyway.

> Sorting is even better, just at the cost of a second copy of the dataset.
>
> ...
>
> In a row-based rdbms, any indexing whatsoever is a copy of the column-data

So the same thing, no?

setr · 2025-10-04T18:07:24

I’m not saying columnar databases don’t have indexes. I’m saying that they get indexes cheaply (and, importantly, without maintaining a separate copy of the data being indexed), to the point that every column is indexed by definition. It’s a separate data structure, but not a separate DB object exposed to the user; it’s just part of the definition.

> So the same thing, no?

Think of it like this: for a given filtered query, a row-based store does a table scan if no index exists. There is no middle ground. Say 0% value or 100%.

A columnar database’s baseline is a decent index, and if there’s a sorted index then even better. Say 60% value vs 100%.
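To make the "no middle ground" point a bit more concrete, here's a Python sketch that counts blocks read for one range filter under three strategies: a full scan (row store, no index), zone-map pruning, and a separate copy sorted on the filtered column. The column is hypothetical, values roughly in insertion order with bounded jitter (think event timestamps arriving slightly out of order), and the 100-row block size and function names are assumptions:

```python
import bisect

BLOCK = 100  # rows per block (an assumption; real engines pick their own sizes)


def blocks_full_scan(column):
    """No index at all: every block has to be read."""
    return -(-len(column) // BLOCK)  # ceiling division


def blocks_zone_map(column, lo, hi):
    """Read only the blocks whose [min, max] overlaps [lo, hi]."""
    read = 0
    for start in range(0, len(column), BLOCK):
        block = column[start:start + BLOCK]
        if max(block) >= lo and min(block) <= hi:
            read += 1
    return read


def blocks_sorted_copy(column, lo, hi):
    """A second copy sorted on the column: matches are contiguous, located by binary search."""
    s = sorted(column)
    first = bisect.bisect_left(s, lo)
    last = bisect.bisect_right(s, hi) - 1
    if first > last:
        return 0
    return last // BLOCK - first // BLOCK + 1


# Hypothetical column: roughly insertion-ordered values with jitter in 0..149.
column = [i + (i * 37) % 150 for i in range(1000)]
lo, hi = 500, 520
print(blocks_full_scan(column), blocks_zone_map(column, lo, hi), blocks_sorted_copy(column, lo, hi))
```

With these inputs the full scan reads all 10 blocks and the zone map reads 3. The 22 matching values sit next to each other in the sorted copy, so it reads one or two. If the column had no relation to insertion order, each block's min/max would span most of the value range and the zone map would prune far less.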

The relative importance of having a separate, explicit, sorted index is much lower in a columnar database, because the baseline is different. (Although maintaining extra sorted indexes in a columnar database is much more expensive: you basically have to keep a second copy of the entire table sorted on the new key(s).)
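A rough sketch of why an extra sort order is costly in a column store: columns line up only by position, so sorting on a new key means permuting every column, i.e. storing the whole table again, whereas a row-store secondary index only copies the key and a row id. Python, with a hypothetical dict-of-lists table and a made-up `sorted_projection` helper:

```python
# Hypothetical column-store table: one list per column, all in the same row order.
table = {
    "id": [0, 1, 2, 3],
    "region": ["eu", "us", "eu", "ap"],
    "amount": [30, 10, 20, 10],
}


def sorted_projection(table, key_cols):
    """Return a second copy of the whole table, with every column reordered by key_cols.

    Position i has to mean the same row in every column, so a new sort key
    forces a permuted copy of all columns, not just the key column.
    """
    n = len(next(iter(table.values())))
    order = sorted(range(n), key=lambda r: tuple(table[c][r] for c in key_cols))
    return {name: [col[r] for r in order] for name, col in table.items()}


by_amount = sorted_projection(table, ["amount", "id"])
print(by_amount)  # every column reordered: id -> [1, 3, 2, 0], amount -> [10, 10, 20, 30]
```

Every insert then has to land in both copies, in the right position in each.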