Would you recommend ClickHouse over DuckDB? And why?

nasretdinov · on Oct 22, 2024

IMO the only reason not to use ClickHouse is when you have either a "small" amount of data or "small" servers (<100 GB of data, servers with <64 GB of RAM). Otherwise ClickHouse is the better solution, since it's a standalone DB that supports replication and in general has very robust cluster support, easily scaling to hundreds of nodes.

Typically you discover the need for an OLAP DB when you reach that scale, so to be completely honest I'm personally not sure what the real use case for DuckDB is.
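
For illustration only, the rule of thumb in this comment can be written down as a tiny check. This is a minimal sketch: the `Workload` type and the `clickhouse_suggested` function are hypothetical names, and the thresholds are simply the commenter's own numbers, not guidance from either project.

```python
from dataclasses import dataclass

# Thresholds copied from the comment above. They are one commenter's
# rule of thumb, not official guidance from ClickHouse or DuckDB.
SMALL_DATA_GB = 100
SMALL_RAM_GB = 64


@dataclass
class Workload:
    """Hypothetical description of a deployment, for illustration."""
    data_gb: float        # total amount of data to be queried
    server_ram_gb: float  # RAM available on each server


def clickhouse_suggested(w: Workload) -> bool:
    """Return True when neither the data nor the servers count as "small"."""
    is_small = w.data_gb < SMALL_DATA_GB or w.server_ram_gb < SMALL_RAM_GB
    return not is_small


if __name__ == "__main__":
    print(clickhouse_suggested(Workload(data_gb=20, server_ram_gb=16)))
    print(clickhouse_suggested(Workload(data_gb=5000, server_ram_gb=256)))
```

Under these assumed thresholds, a 20 GB dataset on a 16 GB machine falls on the "small" side, while 5 TB on 256 GB servers does not. Other commenters in the thread weigh the trade-off differently.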

justCHurious · on Oct 23, 2024

There is another place where you should not use CH: a system with shared resources. CH loves, and has earned the right, to hog resources in spikes. They even allude to this in the Keeper setup: if you put the nodes for the two systems on the same machine, CH will inevitably push Keeper off the bed and the two will come to a disagreement. You should not run it in a k8s Pod for that reason, for example. But then again, you shouldn't have ANY storage of that capacity in a k8s pod anyway.

geysersam · on Oct 22, 2024

DuckDB probably performs better per core than ClickHouse does for most queries. So as long as your workload fits on a single machine (and it likely does), it's often the most performant option.

Besides, it's so simple, just a single executable.

Of course if you're at a scale where you need a cluster it's not an option anymore.

zX41ZdbW · on Oct 22, 2024

The good parts of DuckDB that you've mentioned, including the fact that it is a single executable, are modeled after ClickHouse.

RyanHamilton · on Oct 22, 2024

Can you provide a reference for that belief? To me that's not true. They started from solving very different problems.

geysersam · on Oct 23, 2024

I didn't express myself well. What I meant to say was that DuckDB runs as a single process. That simplifies things.

ClickHouse typically runs several processes (server, clients) that interact, and that already makes things more complicated (and more powerful!).

That's not to say one is good and the other bad, they're just quite different tools.

PeterCorless · on Oct 22, 2024

Note that every use case is different and YMMV.

https://www.vantage.sh/blog/clickhouse-local-vs-duckdb

hn1986 · on Oct 22, 2024

Great link. Curious how it compares now that DuckDB is 1.0+.

theLiminator · on Oct 22, 2024

Not to mention Polars, DataFusion, etc. The single-node OLAP space is really heating up.

fiddlerwoaroof · on Oct 22, 2024

ClickHouse scales from a local tool like DuckDB to a database cluster that can back your reporting applications and other OLAP applications.