What I’ve found works well, in keeping with the author’s “everything is a log” idea, is append-only tables in PG with date ranges on them.
So you have a pet table with an ID, birth date, type, name, whatever, and ‘valid_range’.
That last column is a range type (daterange or tstzrange in Postgres). Combined with the ID, it serves as a unique key for the table. Records are inserted with a range from now() to infinity.
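A minimal sketch of that shape (names are illustrative, not a real schema; the EXCLUDE constraint is a stronger form of the unique key that also forbids overlapping versions of the same row):

CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets EXCLUDE mix = with &&

CREATE TABLE pet (
    id          BIGINT    NOT NULL,  -- stable business key, not unique on its own
    name        TEXT      NOT NULL,
    birth_date  DATE,
    pet_type    TEXT,
    -- [now, infinity) for the current version; unbounded upper bound = still current
    valid_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL, '[)'),
    PRIMARY KEY (id, valid_range),
    -- no two versions of the same pet may overlap in time
    EXCLUDE USING gist (id WITH =, valid_range WITH &&)
);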
To update a record, you call a stored procedure. It inserts the new version with that same now()-to-infinity range, and updates the old record to be valid up to (but not including) now(). The SP ensures the process is done correctly.
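A sketch of what that procedure can look like, against the same hypothetical schema. Within one transaction now() is frozen, so the close and the open line up exactly, with no gap and no overlap:

CREATE OR REPLACE FUNCTION update_pet_name(p_id BIGINT, p_new_name TEXT)
RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    prev pet%ROWTYPE;
BEGIN
    -- close the open version: valid up to, but not including, now()
    UPDATE pet
       SET valid_range = tstzrange(lower(valid_range), now(), '[)')
     WHERE id = p_id
       AND upper_inf(valid_range)   -- only the currently-open row
    RETURNING * INTO prev;

    IF prev.id IS NULL THEN
        RAISE EXCEPTION 'pet % has no current version', p_id;
    END IF;

    -- open the new version from now() to infinity, carrying unchanged columns over
    INSERT INTO pet (id, name, birth_date, pet_type, valid_range)
    VALUES (p_id, p_new_name, prev.birth_date, prev.pet_type,
            tstzrange(now(), NULL, '[)'));
END;
$$;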
You can use the same date range in join tables for the same reason.
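For example (still with made-up names), an ownership join table gets its own valid_range, and a point-in-time join just requires both ranges to cover the same instant:

CREATE TABLE pet_owner (
    pet_id      BIGINT    NOT NULL,
    owner_id    BIGINT    NOT NULL,
    valid_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL, '[)'),
    PRIMARY KEY (pet_id, owner_id, valid_range)
);

-- who owned pet 42 at that moment, and what was the pet called then?
SELECT po.owner_id, p.name
FROM pet_owner po
JOIN pet p
  ON p.id = po.pet_id
 AND p.valid_range @> '2024-01-01 00:00+00'::timestamptz
WHERE po.pet_id = 42
  AND po.valid_range @> '2024-01-01 00:00+00'::timestamptz;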
This makes it possible to see the full state of any record kept this way at any point in time, and to see when it was created or last changed. An audit table records who changed it by holding the ID and timestamp of the change. There is no real deletion; you’d do soft deletion by setting a status.
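Created and last-changed fall straight out of the range bounds (same sketch table):

-- when was pet 42 created, and when did it last change?
SELECT min(lower(valid_range)) AS created_at,
       max(lower(valid_range)) AS last_changed_at
FROM pet
WHERE id = 42;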
I suspect this wouldn’t work well for very high-volume tables without sharding or something. But for CRUD tables that don’t change a lot it’s fantastic.
The only thing that’s not smooth is later schema changes. If you need a new non-null column, it ends up added to all records, historical ones included. So you can either set a default and just accept that it’s now set on all old records, leave it nullable and enforce non-null in code, or enforce it only on insert in a trigger or the SP I described (sketched below).
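The third option can look like this (favorite_food is a hypothetical column; history keeps its NULLs, new versions must supply a value):

ALTER TABLE pet ADD COLUMN favorite_food TEXT;  -- nullable, so old rows are untouched

CREATE FUNCTION pet_check_favorite_food() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- reject only NEW rows that omit the value; existing rows are never revisited
    IF NEW.favorite_food IS NULL THEN
        RAISE EXCEPTION 'favorite_food is required on new pet versions';
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER pet_favorite_food_required
BEFORE INSERT ON pet
FOR EACH ROW EXECUTE FUNCTION pet_check_favorite_food();

The update SP then needs to carry the new column forward along with the others, of course.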
I’ve found it much easier to use than some sort of ‘updates’ table storing JSON changes or EAV style updates or whatever.
This is a really good description of more or less exactly how our current approach works! Below is a daily-granularity variant we’re testing at the moment, to eliminate the flip-flops that occur over the course of a business day. The v1 implementation was down to the second; this one is daily.
Here is the core of it:
CREATE TABLE time_travel_daily (
    domain TEXT NOT NULL,
    valid_range tstzrange NOT NULL,
    -- scalar copies of the range bounds, handy for plain comparisons and btree indexes
    valid_from timestamptz GENERATED ALWAYS AS (lower(valid_range)) STORED,
    valid_to timestamptz GENERATED ALWAYS AS (upper(valid_range)) STORED,
    tld TEXT,
    owned BOOLEAN,
    acquired_at timestamptz,
    released_at timestamptz,
    registrar TEXT,
    updated_at timestamptz,
    accounting_uuid TEXT,
    offer_received_date timestamptz,
    payment_received_date timestamptz,
    sold_at timestamptz,
    sold_channel TEXT,
    last_renewed_at timestamptz,
    expires_at timestamptz,
    transfer_started_at timestamptz,
    transfer_completed_at timestamptz,
    transfer_eligible_at timestamptz,
    snapshot_json JSONB NOT NULL,
    inserted_at timestamptz DEFAULT NOW() NOT NULL,
    source_data_change_id INT,
    PRIMARY KEY (domain, valid_range)
);
CREATE INDEX ttd_domain_idx ON time_travel_daily(domain);
CREATE INDEX ttd_gist_valid_range_idx ON time_travel_daily USING gist(valid_range);
CREATE INDEX ttd_owned_valid_range_idx ON time_travel_daily USING gist(valid_range) WHERE owned = TRUE;
CREATE INDEX ttd_registrar_idx ON time_travel_daily(registrar) WHERE registrar IS NOT NULL;
CREATE INDEX ttd_source_data_change_id_idx ON time_travel_daily(source_data_change_id) WHERE source_data_change_id IS NOT NULL;
And then here is a piece of our update trigger, which "closes" previous entries and opens a new one:
UPDATE time_travel_daily
SET valid_range = tstzrange(lower(valid_range), target_date::timestamptz, '[)')
WHERE domain IN (
        SELECT DISTINCT dc.domain
        FROM data_changes dc
        WHERE dc.invalidated IS NULL
          AND dc.after IS NOT NULL
          AND dc.modified_at::date = target_date
      )
  AND upper(valid_range) IS NULL                      -- only close open ranges
  AND lower(valid_range) < target_date::timestamptz;  -- don't close ranges that started today
A trigger to ‘close’ the old record is a great idea. My stored procedure also does some additional validation (validate at every layer = fewer bugs), so what I’ve got works well enough for me.
I'm not entirely sure what the valid_range is doing. Besides updating it, do you use this index for anything else? I agree the performance doesn't seem like it would be great.
I do something like 4,000 inserts a second but maybe only a few queries a minute, so I use an "invalidated_by" column which (eventually) points to the newer record, and I update it on query instead of on insert (when the multiple NULLs are discovered and become relevant).
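Roughly, the query-time fix-up looks like this (a simplified sketch; "events", "id", "domain", and "inserted_at" are stand-in names, not the real schema):

UPDATE events older
SET invalidated_by = newer.id
FROM events newer
WHERE older.invalidated_by IS NULL
  AND newer.domain = older.domain
  AND newer.inserted_at > older.inserted_at
  -- "newer" must be the *immediate* successor
  -- (assumes inserted_at is unique per domain)
  AND NOT EXISTS (
        SELECT 1
        FROM events mid
        WHERE mid.domain = older.domain
          AND mid.inserted_at > older.inserted_at
          AND mid.inserted_at < newer.inserted_at
      );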
SELECT COUNT(DISTINCT domain)
FROM time_travel
WHERE (CURRENT_DATE - INTERVAL '90 days')::timestamptz <@ valid_range
AND owned;
This is asking, "how many domains did we own 90 days ago?"
Instead of finding records where the start is less than and the end is greater than the timestamp, you can just ask for rows whose range covers that point in time. The GiST index on valid_range does the heavy lifting.
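To make that concrete, these two queries ask the same question, but only the containment form lets the GiST index do the work (the planner flips elem <@ range into range @> elem, which the index supports):

-- containment form: answered from the GiST index on valid_range
SELECT COUNT(DISTINCT domain)
FROM time_travel
WHERE (CURRENT_DATE - INTERVAL '90 days')::timestamptz <@ valid_range;

-- the equivalent hand-rolled predicate for a '[)' range with an open upper bound
SELECT COUNT(DISTINCT domain)
FROM time_travel
WHERE lower(valid_range) <= (CURRENT_DATE - INTERVAL '90 days')::timestamptz
  AND (upper(valid_range) > (CURRENT_DATE - INTERVAL '90 days')::timestamptz
       OR upper(valid_range) IS NULL);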