Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You should also consider how unique that field is. If half your widgets are blue and half your widgets are red, indexing the color is probably not helpful (ie, the query planner will ignore the widget_color index and do a full table scan). A rule of thumb number I vaguely recall is 10%


My recollection in postgres anyway is that low cardinality indexes aren't useful, because it doesn't take into account which side of the 1/99% you're on when determining to use the index.

What is useful is to do a partial index on values where foo=small % ,value because then it will use that index when it matches the query, but not when it's in the majority case.


> My recollection in postgres anyway is that low cardinality indexes aren't useful, because it doesn't take into account which side of the 1/99% you're on when determining to use the index.

It does take that into account. Demo:

    =# CREATE TABLE low_cardinality AS SELECT generate_series(-10, 1000000) < 0 AS col;
    =# CREATE INDEX ON low_cardinality (col);
    =# ANALYZE low_cardinality;
    =# EXPLAIN SELECT * FROM low_cardinality WHERE col = false;
    ┌─────────────────────────────────────────────────────────────────────────┐
    │                               QUERY PLAN                                │
    ├─────────────────────────────────────────────────────────────────────────┤
    │ Seq Scan on low_cardinality  (cost=0.00..14425.11 rows=1000011 width=1) │
    │   Filter: (NOT col)                                                     │
    └─────────────────────────────────────────────────────────────────────────┘
    (2 rows)

    =# EXPLAIN SELECT * FROM low_cardinality WHERE col = true;
    ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │                                             QUERY PLAN                                             │
    ├────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ Index Only Scan using low_cardinality_col_idx on low_cardinality  (cost=0.42..4.44 rows=1 width=1) │
    │   Index Cond: (col = true)                                                                         │
    └────────────────────────────────────────────────────────────────────────────────────────────────────┘
    (2 rows)


Guess they've improved -- That used to be a thing. Looks like it was probably in the PG13 changes to the statistics and btree indexes that did that, though it's hard to tell exactly.


It worked like this for much longer. The oldest version I have around is 9.2 and it behaves the same. You probably are thinking of a somewhat more complex scenario...


Forgot to say: Obviously a partial index is going to be considerably better, performance and size wise.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: