> A non-key column gets a zone map / skip index laid out on top, which is cheap to maintain… because it’s already a column-wise slice of the data.
I don't see how that's different from storing a traditional index. You can't just lay it on top of the column, because the column is stored in a different order than what the index wants.
Zone map / skip indexes don’t require sorting, still search significantly faster than a full table scan, and are typically applied to every column by default. Sorting is even better, just at the cost of a second copy of the dataset.
In a row-based rdbms, any indexing whatsoever is a copy of the column-data, so you might as well store it sorted every time. It’s not inherent to the definition.
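To make the zone map idea concrete, here is a minimal sketch in Python. It assumes a column stored as fixed-size blocks in arrival order (unsorted), with a min/max pair kept per block. The names `Zone`, `build_zone_map`, and `scan_equal` are made up for illustration and are not any particular engine's API.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    start: int  # index of the first row in the block
    end: int    # one past the last row in the block
    lo: int     # minimum value seen in the block
    hi: int     # maximum value seen in the block


def build_zone_map(column, block_size):
    """Record min/max per block; the column itself is left in its original order."""
    zones = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        zones.append(Zone(start, start + len(block), min(block), max(block)))
    return zones


def scan_equal(column, zones, target):
    """Return matching row ids and how many blocks actually had to be read."""
    matches = []
    blocks_read = 0
    for z in zones:
        if target < z.lo or target > z.hi:
            continue  # the block cannot contain the target, so skip it unread
        blocks_read += 1
        for i in range(z.start, z.end):
            if column[i] == target:
                matches.append(i)
    return matches, blocks_read


# Unsorted column, but values happen to be loosely clustered per block.
column = [3, 7, 5, 4, 42, 40, 45, 41, 9, 8, 6, 2, 90, 95, 91, 99]
zones = build_zone_map(column, block_size=4)
rows, blocks_read = scan_equal(column, zones, 41)
```

How much gets skipped depends entirely on how clustered the values are within each block. On randomly distributed data the min/max ranges overlap and most blocks still get read.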
That's still a separate index though, no? It's not intrinsic in the column storage itself, although I guess it works best with it if you end up having to do a full-scan of the column section anyway.
> Sorting is even better, just at the cost of a second copy of the dataset.
> ...
> In a row-based rdbms, any indexing whatsoever is a copy of the column-data
I’m not saying columnar databases don’t have indexes, I’m saying that they get to have indexes for cheap (and importantly: without maintaining a separate copy of the data being indexed), to the point that every column is indexed by definition. It’s a separate data structure, but it’s not a separate db object exposed to the user. It’s just part of the definition.
> So the same thing, no?
Think of it like this: for a given filtered query, row-based storage does a full table scan if no index exists. There is no middle ground: you get, say, 0% of the benefit or 100%.
A columnar database’s baseline is a decent index, and if there’s a sorted index then even better. Say 60% of the benefit vs 100%.
The relative importance of having a separate, explicit, sorted index is much lower in a columnar database, because the baseline is different. (Although maintaining extra sorted indexes in a columnar database is much more expensive: you basically have to keep a second copy of the entire table sorted on the new key(s).)
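A rough sketch of why the extra sorted index costs a full second copy, assuming a toy table held as a dict of Python lists. `sorted_projection` and `lookup` are hypothetical helpers, not a real engine's API. To keep rows aligned, every column has to be rewritten in the new key's order, not just the key column.

```python
import bisect

# Toy columnar table, stored in arrival order (effectively sorted on "ts").
table = {
    "ts":   [1, 2, 3, 4, 5, 6],
    "user": [30, 10, 20, 10, 30, 20],
    "amt":  [5, 7, 1, 9, 2, 4],
}


def sorted_projection(table, key):
    """Build a second copy of the whole table, ordered by `key`."""
    order = sorted(range(len(table[key])), key=lambda i: table[key][i])
    # Every column is permuted, so the copy is as large as the original table.
    return {name: [col[i] for i in order] for name, col in table.items()}


def lookup(projection, key, target, want):
    """Binary-search the sorted key column, then slice the wanted column."""
    keys = projection[key]
    lo = bisect.bisect_left(keys, target)
    hi = bisect.bisect_right(keys, target)
    return projection[want][lo:hi]


by_user = sorted_projection(table, "user")
amounts = lookup(by_user, "user", 10, "amt")
```

The lookup touches only the matching rows, but `by_user` holds as many values as `table` does, and every insert has to be applied to both copies.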