This is a very narrow definition of a time series DB, really more of a pure metrics store that requires individual time series to be kept separately. I find it odd, as that isn't how I'd define a time series DB in general. Rather, a time series DB is usually simply some sort of DB that supports a time column as the main index. There are tons that fall in that category (including Postgres, which you should absolutely use unless you have a compelling reason not to) that can easily handle many of the situations the article claims won't work. Druid can handle high cardinality, for instance.
I couldn't agree more, and I've talked to the folks at TimescaleDB about exactly this issue; it can be hard for folks familiar with the narrow definition to understand how many more use cases a tool like Timescale can fit.
More broadly I think this is an issue with a narrow definition of "time series", aside from the DB angle. When I was doing more forecasting and predictive modeling, I was constantly stymied by "time series" resources only considering univariate time series, where my problems were always extremely multivariate (and also rarely uniformly sampled...).
When I've asked around for other vocabulary here, the options are slim. Panel data can work, but that has more to do with there being both a time and spatial dimension (e.g. group, cohort, user, etc.) than there being multiple metrics observed over time. It's also an unfamiliar term for data scientist folks without a traditional stats background. "Multivariate time series" might be technically correct, but that works much better in the modelling domain than the database domain.
The main challenge here is that extending relational algebra ("key tuple maps to row") into time ("key tuple maps to function from time to row") essentially cross-products two different dimensions, creating a much bigger domain to process and reason about.
Once you have the theory down, you can express some pretty handy relationships.
For you functional folks, you can consider the TRA as the regular RA lifted into the time monad. Except the composition can also go cross-wise; TRA can also be considered as the narrow timeline lifted into the relational monad. Fun times!
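A rough type sketch (in Python, with entirely made-up names) of that "key tuple maps to a function from time to row" idea, just to make the lifting concrete:

```python
from typing import Callable, Dict, Tuple

# Illustrative types only -- not any particular TRA implementation.
Key = Tuple[str, ...]          # e.g. (device_id, metric_name)
Row = Dict[str, float]         # attribute name -> value
Time = float                   # a timestamp

# Plain relational algebra: a relation maps a key tuple to a row.
Relation = Dict[Key, Row]

# Temporal relational algebra: the same key now maps to a function
# from time to row, i.e. every key owns an entire timeline of rows.
TemporalRelation = Dict[Key, Callable[[Time], Row]]

def snapshot(tr: TemporalRelation, t: Time) -> Relation:
    """Evaluate every timeline at instant t, collapsing the temporal
    relation back into an ordinary relation."""
    return {key: timeline(t) for key, timeline in tr.items()}
```

The cross-product point above shows up as soon as you try to join two of these: you have to line up both the key dimension and the time dimension.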
Can you talk more about why you should use Postgres unless you have a compelling reason not to? I'm currently investigating options for storing high-frequency data. Postgres is looking like a good option.
Postgres has a lot of engineering work put into it to handle all kinds of use cases, plus massive effort by the community building tutorials, videos, wrappers, libraries, etc. If you have a problem with Postgres config, or are trying to get it to do something odd, there is undoubtedly a bunch of Stack Overflow discussions about that thing. For most other databases, the selection of all of those is much thinner.
Additionally, we often believe our application needs feature X, and there is some database tech that purports to excel at X, but the fact is Postgres can probably do X unless you get to some extreme velocity or volume. Furthermore, by going with another database you are almost always giving up features W, Y, and Z that you don't realize you need, which Postgres supports and the "exotic" database you are thinking of doesn't.
In short, Postgres has amazing breadth and depth in features and support and tooling. Be sure you are ok giving that up!
If you like Postgres, you may want to try TimescaleDB, which is a time-series database built on Postgres (packaged as a Postgres extension). Postgres database + time-series database all in one.
This btw is one of the reasons I love Postgres - its extensibility.
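For anyone who hasn't seen it, the setup is pretty minimal. A hedged sketch (connection details, table, and column names are made up; assumes the extension is installed on the server):

```python
import psycopg2

# Illustrative only: local Postgres with the TimescaleDB extension available.
conn = psycopg2.connect("dbname=metrics user=postgres")
cur = conn.cursor()

# Enable the extension, then define an ordinary Postgres table...
cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS conditions (
        time        TIMESTAMPTZ      NOT NULL,
        device_id   TEXT             NOT NULL,
        temperature DOUBLE PRECISION
    );
""")

# ...and turn it into a hypertable partitioned on the time column.
# Joins, indexes, and the rest of SQL keep working as plain Postgres.
cur.execute("SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE);")
conn.commit()
```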
Did you guys used to call it just ScaleDB? I remember talking to some people about something called ScaleDB that was built on top of Postgres back when I was looking for database solutions for another product, this was in 2015. We ended up going with Druid for that.
What do you mean by high-frequency data? 100 Hz, 1 kHz, 100 kHz? For those kinds of use cases many time-series DBs break apart. We have customers storing multiple millions of high-frequency measurements per second in arrays.
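One way to read "in arrays" is the common trick of batching many samples into a single row via an array column instead of one row per sample; a rough Postgres sketch with made-up names:

```python
import psycopg2

# Illustrative schema: one row per device per second, with that second's
# samples packed into a DOUBLE PRECISION[] column. At 1 kHz this turns
# 1,000 inserts per device per second into a single row.
conn = psycopg2.connect("dbname=metrics user=postgres")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS samples (
        bucket    TIMESTAMPTZ        NOT NULL,  -- start of the 1-second window
        device_id TEXT               NOT NULL,
        readings  DOUBLE PRECISION[] NOT NULL   -- raw samples for that window
    );
""")

batch = [0.1 * i for i in range(1000)]  # stand-in for samples off a sensor
cur.execute(
    "INSERT INTO samples (bucket, device_id, readings) VALUES (now(), %s, %s);",
    ("device-42", batch),
)
conn.commit()
```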
I would say Postgres is not too storage-efficient in itself for large amounts of data, especially if you need any sort of index. Timescale basically mitigates that by automatically creating new tables in the background ("chunks") and keeping the individual tables small.
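You can poke at that directly; assuming the hypothetical conditions hypertable from above, something like:

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics user=postgres")
cur = conn.cursor()

# Keep each background chunk table covering one day of data, then list the
# chunk tables TimescaleDB has created so far.
cur.execute("SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');")
cur.execute("SELECT show_chunks('conditions');")
for (chunk,) in cur.fetchall():
    print(chunk)   # e.g. _timescaledb_internal._hyper_1_1_chunk
conn.commit()
```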
TimescaleDB also implements compression. From the docs:
When compression is enabled, TimescaleDB converts data stored in many rows into an array. This means that instead of using lots of rows to store the data, it stores the same data in a single row. Because a single row takes up less disk space than many rows, it decreases the amount of disk space required, and can also speed up some queries.
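Enabling it looks roughly like this (again using the hypothetical conditions hypertable from earlier; the segment-by/order-by choices depend entirely on your query patterns):

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics user=postgres")
cur = conn.cursor()

# Turn on columnar compression: compressed batches are grouped by device
# and ordered by time within each batch...
cur.execute("""
    ALTER TABLE conditions SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'time DESC'
    );
""")

# ...and compress chunks automatically once they are older than a week.
cur.execute("SELECT add_compression_policy('conditions', INTERVAL '7 days');")
conn.commit()
```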
We tried to use Postgres with the TimescaleDB plugin for high-frequency data several TB in size. It was unusable. We switched to ClickHouse, which was roughly 50-100x faster on the same hardware and used about 10 times less disk space. They use very different storage engines with different functionality, so check the docs to see what fits your use case.
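For comparison, the ClickHouse side of this is a column-oriented MergeTree table; a minimal sketch with made-up names, using the clickhouse-driver package:

```python
from datetime import datetime
from clickhouse_driver import Client  # pip install clickhouse-driver

# Illustrative only: a MergeTree table sorted by (device, ts), which is where
# ClickHouse's scan speed and compression on this kind of workload come from.
client = Client("localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS measurements (
        ts     DateTime64(3),
        device String,
        value  Float64
    )
    ENGINE = MergeTree()
    ORDER BY (device, ts)
""")

client.execute(
    "INSERT INTO measurements (ts, device, value) VALUES",
    [(datetime(2024, 1, 1), "device-42", 0.5)],
)
```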