We also hit this scalability issue with the default memory allocator in musl. Switching back to glibc improved production performance by 5x on a machine with 96 CPU cores [1].
Lock-free data structures and algorithms access shared memory via atomic operations such as compare-and-swap and atomic arithmetic. The throughput of these operations does not scale with the number of CPU cores. On the contrary, throughput usually drops as the number of CPU cores grows, because the cores need more time to synchronize their local per-CPU caches with main memory. So lock-free data structures and algorithms do not scale on systems with a large number of CPU cores. It is preferable to use "shared nothing" data structures and algorithms instead, where every CPU core processes its own portion of state, which isn't shared with other CPU cores. In this case the local state can be processed from local per-CPU caches at a speed that exceeds main memory read/write bandwidth, with lower access latency.
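A minimal Go sketch of the contention-avoidance idea (the sharding scheme, the cache-line padding size and all names here are mine, not from any particular library): instead of every writer bumping one shared atomic counter, each writer updates its own padded slot, and readers sum the slots only when needed.

    package main

    import (
    	"fmt"
    	"runtime"
    	"sync/atomic"
    )

    // counterShard holds one slot padded to a 64-byte cache line,
    // so concurrent writers don't invalidate each other's cache lines.
    type counterShard struct {
    	n atomic.Int64
    	_ [56]byte
    }

    // shardedCounter is a "shared nothing"-style counter: writers touch
    // only their own shard; readers sum all shards on demand.
    type shardedCounter struct {
    	shards []counterShard
    }

    func newShardedCounter() *shardedCounter {
    	return &shardedCounter{shards: make([]counterShard, runtime.GOMAXPROCS(0))}
    }

    // Inc updates the shard chosen by a caller-supplied hint (e.g. a worker
    // index), avoiding a single contended atomic variable.
    func (c *shardedCounter) Inc(hint int) {
    	c.shards[hint%len(c.shards)].n.Add(1)
    }

    // Sum is O(number of shards), but it is typically called rarely
    // (e.g. once per metrics scrape), so its cost doesn't matter.
    func (c *shardedCounter) Sum() int64 {
    	var total int64
    	for i := range c.shards {
    		total += c.shards[i].n.Load()
    	}
    	return total
    }

    func main() {
    	c := newShardedCounter()
    	for i := 0; i < 1000; i++ {
    		c.Inc(i)
    	}
    	fmt.Println(c.Sum()) // 1000
    }

The write path touches only per-shard memory, so it keeps scaling as cores are added; the rarely-used read path pays the cross-core synchronization cost instead.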
The performance overhead of exposing application metrics in Prometheus format is close to zero. These metrics are usually counters, which are updated atomically in a few nanoseconds. Prometheus scrapes them once per 10-30 seconds, so generating the /metrics response in the Prometheus text exposition format doesn't need much CPU, especially when using a library optimized for simplicity and speed such as https://github.com/VictoriaMetrics/metrics/
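Roughly how this looks with that library (the handler path and metric name below are made up for the example):

    package main

    import (
    	"net/http"

    	"github.com/VictoriaMetrics/metrics"
    )

    // requestsTotal is updated with a single atomic increment per request,
    // so the hot path costs a few nanoseconds.
    var requestsTotal = metrics.NewCounter(`http_requests_total{path="/api"}`)

    func main() {
    	http.HandleFunc("/api", func(w http.ResponseWriter, r *http.Request) {
    		requestsTotal.Inc()
    		w.Write([]byte("ok"))
    	})

    	// Prometheus scrapes this endpoint every 10-30 seconds; rendering the
    	// text exposition format per scrape is cheap compared to serving traffic.
    	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
    		metrics.WritePrometheus(w, true)
    	})

    	http.ListenAndServe(":8080", nil)
    }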
These tricks are essential for every database optimized for metrics / logs / traces. For example, you can read about how VictoriaMetrics compresses production metrics to less than a byte per sample (every sample includes the metric name, key=value labels, a numeric value and a timestamp with millisecond precision). https://faun.pub/victoriametrics-achieving-better-compressio...
This is called "progress". Humans always generate as much data as their tools can store and process. The more data a tool can process within a given budget, the more data will be generated and stored.
There are two tricks used by ClickHouse and similar databases:
- Smart placement of the data on disk, which allows skipping the majority of the data and reading only the needed chunks (these chunks are stored in compressed form to reduce disk read IO even further). This includes column-oriented storage and LSM-like trees.
- Brute-force optimizations all over the place, which allow processing the found data at maximum speed by employing all the compute resources (CPU, RAM, disk IO, network bandwidth) in the most efficient way. For example, ClickHouse can process more than a billion rows per second per CPU core, and the scan speed scales linearly with the number of available CPU cores. A toy illustration of the memory layout behind this follows below.
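This is not ClickHouse code, just a Go sketch of why a column-oriented layout plus a brute-force scan is fast (the table fields and query are invented for the demo): a query touching two columns reads only those two contiguous arrays in a tight loop and never loads the rest.

    package main

    import "fmt"

    // A toy column-oriented "table": each field is stored as its own slice,
    // so a query that touches only two columns never reads the others.
    type table struct {
    	timestamps []int64
    	status     []uint16
    	durationMs []float64
    }

    // slowRequests counts rows with status >= 500 and sums their durations.
    // The scan is a tight loop over contiguous memory, which is what lets
    // analytical databases process billions of rows per second per core.
    func slowRequests(t *table) (count int, totalMs float64) {
    	for i, s := range t.status {
    		if s >= 500 {
    			count++
    			totalMs += t.durationMs[i]
    		}
    	}
    	return count, totalMs
    }

    func main() {
    	t := &table{
    		timestamps: []int64{1, 2, 3, 4},
    		status:     []uint16{200, 502, 200, 503},
    		durationMs: []float64{1.2, 350.0, 0.8, 420.5},
    	}
    	fmt.Println(slowRequests(t)) // 2 770.5
    }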
Just don't use ClickHouse for OLTP tasks. ClickHouse is an analytical database and isn't optimized for transactional workloads. Keep calm and use PostgreSQL for OLTP, and ClickHouse for OLAP.
Properly implemented wide events usually reduce storage costs compared to the typical chaotic logging of everything. The expectation is that a single external request produces exactly one wide event containing all the information about that request which may be needed for further debugging and analytics. See https://jeremymorrell.dev/blog/a-practitioners-guide-to-wide... .
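A rough Go sketch of the pattern (the event fields and handler below are illustrative, not taken from the linked guide): the handler keeps enriching one event object while processing the request and emits it exactly once at the end, instead of emitting a separate log line at every step.

    package main

    import (
    	"encoding/json"
    	"os"
    	"time"
    )

    // wideEvent accumulates everything known about one external request and
    // is emitted exactly once, when the request finishes.
    type wideEvent map[string]any

    func handleRequest(userID string) {
    	start := time.Now()
    	ev := wideEvent{
    		"event":   "http_request",
    		"user_id": userID,
    		"path":    "/checkout",
    	}
    	defer func() {
    		ev["duration_ms"] = time.Since(start).Milliseconds()
    		json.NewEncoder(os.Stdout).Encode(ev) // one log line per request
    	}()

    	// Business logic keeps enriching the same event instead of logging
    	// separate lines at every step.
    	ev["cart_items"] = 3
    	ev["payment_provider"] = "stripe"
    	ev["status"] = 200
    }

    func main() { handleRequest("u-123") }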
How would you add an outgoing request to an external system into the wide event?
For example, I receive a request, and while handling it I make an HTTP call to http://example.com. In tracing that would be a separate span, but how do you represent that in a single wide event?
> Why would I use ClickHouse instead of storing log data as json file for historical log data?
There are multiple reasons:
1. Databases optimized for logs (such as ClickHouse or VictoriaLogs) store logs in compressed form, where the values of every log field are grouped and compressed individually (aka column-oriented storage). This results in smaller storage space compared to plain files with JSON logs, even when those files are compressed. A toy illustration of this effect follows after this list.
2. Databases optimized for logs perform typical queries much faster than grep over JSON files. Performance gains may be 1000x and more, because these databases skip reading unneeded data. See https://chronicles.mad-scientist.club/tales/grepping-logs-re...
3. How are you going to grep 100 petabytes of JSON files? Databases optimized for logs allow querying such amounts of logs because they can scale horizontally by adding more storage nodes and storage space.
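A rough Go experiment illustrating point 1 (the field names and synthetic log data are invented for the demo, and real log databases use per-column codecs far smarter than gzip): grouping the values of each field together compresses much better than one JSON object per line, because repetitive values end up adjacent and the field names aren't repeated per row.

    package main

    import (
    	"bytes"
    	"compress/gzip"
    	"encoding/json"
    	"fmt"
    	"strings"
    )

    // gzipSize returns the compressed size of data; used only to compare layouts.
    func gzipSize(data []byte) int {
    	var buf bytes.Buffer
    	zw := gzip.NewWriter(&buf)
    	zw.Write(data)
    	zw.Close()
    	return buf.Len()
    }

    func main() {
    	type logEntry struct {
    		Level string `json:"level"`
    		App   string `json:"app"`
    		Msg   string `json:"msg"`
    	}

    	// Synthetic logs with highly repetitive field values, as in real life.
    	var rows []logEntry
    	for i := 0; i < 10000; i++ {
    		rows = append(rows, logEntry{
    			Level: []string{"info", "warn", "error"}[i%3],
    			App:   "checkout-service",
    			Msg:   fmt.Sprintf("request %d processed", i),
    		})
    	}

    	// Row-oriented layout: one JSON object per line (a typical log file).
    	var rowOriented bytes.Buffer
    	enc := json.NewEncoder(&rowOriented)
    	for _, r := range rows {
    		enc.Encode(r)
    	}

    	// Column-oriented layout: the values of each field grouped together.
    	var levels, apps, msgs []string
    	for _, r := range rows {
    		levels = append(levels, r.Level)
    		apps = append(apps, r.App)
    		msgs = append(msgs, r.Msg)
    	}
    	columnar := strings.Join(levels, "\n") + strings.Join(apps, "\n") + strings.Join(msgs, "\n")

    	fmt.Println("row-oriented gzip bytes:   ", gzipSize(rowOriented.Bytes()))
    	fmt.Println("column-oriented gzip bytes:", gzipSize([]byte(columnar)))
    }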
[1] https://github.com/VictoriaMetrics/VictoriaLogs/issues/517