If you hit some issues with VictoriaLogs or if you have ideas on how VictoriaLogs usability could be improved, then please file issues at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/ . We appreciate users' input and always try making our products easier to use.
What if the machine is unavailable? It is better to store logs from multiple hosts into a centralized database, so the logs could be investigated even if the original host is no longer available for any reason.
I create a `logs` table in my postgres database where I store important events (user upgraded, downgraded, signed up, etc). I use the filesystem based logs more or less for debugging or tracing specific things.
If the server is unavailable, then my entire product is down, since everything, including the database, runs on that one server.
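For reference, here is a minimal sketch of the kind of `logs` events table described above, using psycopg2; the schema, column names and connection string are made up for illustration:

    import psycopg2
    from psycopg2.extras import Json

    # Hypothetical connection string - adjust for your own database.
    conn = psycopg2.connect("dbname=myapp user=myapp")
    cur = conn.cursor()

    # One row per important business event (signed up, upgraded, downgraded, ...).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS logs (
            id         bigserial PRIMARY KEY,
            created_at timestamptz NOT NULL DEFAULT now(),
            user_id    bigint,
            event      text NOT NULL,
            details    jsonb
        )
    """)

    cur.execute(
        "INSERT INTO logs (user_id, event, details) VALUES (%s, %s, %s)",
        (42, "upgraded", Json({"plan": "pro"})),
    )
    conn.commit()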
200GB of metrics and 1TB of logs can be efficiently processed by a single-node observability solution optimized for metrics/logs (for example, VictoriaMetrics/VictoriaLogs), which runs on a low-end computer such as a Raspberry Pi. There is zero sense in complicating the system with clustering, micro-service architecture and object storage for such a small-scale workload :)
VictoriaLogs developer here. I agree with you - it is important for a log database to support multiple popular data ingestion protocols, so users can keep using their existing protocol instead of adding an intermediate proxy to convert from one protocol to another.
That's why VictoriaLogs supports multiple data ingestion protocols [1], including the JSON line protocol, which can be used for sending JSON logs into it without any conversions [2].
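As an illustration, here is a minimal Python sketch of pushing newline-delimited JSON logs into a single-node VictoriaLogs instance over HTTP; the endpoint path, default port and query parameter names are reproduced from memory, so check the data ingestion docs for the authoritative details:

    import json
    import requests

    # Two log entries in JSON line format: one JSON object per line.
    entries = [
        {"timestamp": "2024-01-15T10:00:00Z", "host": "web-1", "app": "checkout",
         "message": "payment accepted", "user_id": "u-123"},
        {"timestamp": "2024-01-15T10:00:01Z", "host": "web-1", "app": "checkout",
         "message": "payment declined", "user_id": "u-456"},
    ]
    payload = "\n".join(json.dumps(e) for e in entries)

    # Assumes a single-node VictoriaLogs listening on the default port 9428.
    resp = requests.post(
        "http://localhost:9428/insert/jsonline",
        params={
            "_stream_fields": "host,app",   # fields identifying the log stream
            "_msg_field": "message",        # field holding the log message
            "_time_field": "timestamp",     # field holding the timestamp
        },
        data=payload.encode("utf-8"),
    )
    resp.raise_for_status()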
Definitely possible. HTTP direct send is a common pattern, so I'm working on an HTTP-specific transport that can be used for these use cases (vs making a unique impl for each one).
> For example, every damn log message has hundreds of fields, most of which never change. Why not push this information once, on service startup, and not with every log message?
If a log field doesn't change with every log entry, then good databases for logs (such as VictoriaLogs) compress such a field by 1000x or more, so its storage space usage is negligible and it doesn't affect query performance in any way.
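To illustrate the point (this is not how VictoriaLogs stores data internally, just a rough demonstration of the idea), compressing a column that holds the same value for every log entry:

    import zlib

    # A field that never changes, stored as its own column for 100,000 log
    # entries (e.g. env="production" attached to every log message).
    column = ("production\n" * 100_000).encode("utf-8")

    compressed = zlib.compress(column, level=6)
    print(f"{len(column)} -> {len(compressed)} bytes "
          f"(~{len(column) // len(compressed)}x compression)")
    # The ratio lands in the hundreds-to-thousands range, which is why constant
    # fields add almost nothing to storage in a column-oriented log database.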
Storing many fields per log entry simplifies further analysis of these logs, since you can get all the needed information from a single log entry instead of jumping across a big number of interconnected log entries. It also improves analysis of logs at scale, since you can filter and group the logs by any subset of these numerous fields. Logs with a big number of fields are known as "wide events". See the following excellent article about this type of logs - https://jeremymorrell.dev/blog/a-practitioners-guide-to-wide... .
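As a rough illustration (all field names invented), a single wide event for one HTTP request might carry the full request, user and infrastructure context on one log entry:

    # One self-contained wide event: request, user, infrastructure and
    # timing context all on the same log entry.
    wide_event = {
        "_time": "2024-01-15T10:00:00Z",
        "_msg": "request completed",
        "service": "checkout",
        "env": "production",
        "region": "eu-west-1",
        "host": "web-1",
        "http.method": "POST",
        "http.path": "/api/v1/orders",
        "http.status_code": 200,
        "duration_ms": 182,
        "user_id": "u-123",
        "plan": "pro",
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "00f067aa0ba902b7",
    }
    # With all of this on one entry, filtering/grouping by any subset of fields
    # (e.g. status_code by region, or duration by plan) needs no joins.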
Loki doesn't work well with structured logs and wide events because it has weak support for log fields with many unique values, such as trace_id, span_id, user_id, etc. (aka high-cardinality fields). The recommended way to store structured logs with such fields in Loki is to put them into a big JSON object and store it as the log message. This JSON must then be parsed at query time in order to apply filters and aggregations on the log fields. Such an approach doesn't scale well, since Loki needs to read all the log messages, with all the log fields encoded inside them, during query execution. This requires a lot of additional read IO and CPU for reading, unpacking and parsing the log messages. It also worsens data compression on disk, which slows down query execution even more.
The much better approach is to store the data for every log field in column-oriented storage. This significantly improves query performance, since only the data for the requested columns must be read from the storage, and the per-column data usually has a much better compression ratio, so it occupies less storage space.
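Here is a toy sketch of the difference (it has nothing to do with the actual on-disk formats of Loki or VictoriaLogs); it just shows why a query over one field is cheaper when the data is laid out per column:

    import json

    # 10,000 synthetic log entries with a handful of structured fields.
    events = [
        {"level": "error" if i % 50 == 0 else "info",
         "duration_ms": i % 300,
         "user_id": f"u-{i % 1000}"}
        for i in range(10_000)
    ]

    # Row-oriented: each entry is one JSON message; answering a query over a
    # single field means reading and parsing every full message.
    rows = [json.dumps(e) for e in events]
    errors_rows = sum(1 for r in rows if json.loads(r)["level"] == "error")
    bytes_rows = sum(len(r) for r in rows)

    # Column-oriented: one sequence per field; the same query touches only the
    # 'level' column and leaves the other columns unread.
    columns = {k: [e[k] for e in events]
               for k in ("level", "duration_ms", "user_id")}
    errors_cols = sum(1 for v in columns["level"] if v == "error")
    bytes_cols = sum(len(v) for v in columns["level"])

    assert errors_rows == errors_cols
    print(f"row-oriented scan:    {bytes_rows} bytes")
    print(f"column-oriented scan: {bytes_cols} bytes")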
HDD-based persistent disks usually have much lower IO latency compared to S3 (milliseconds vs hundreds of milliseconds). This may improve query performance a lot.
sc1 HDD-based volumes are cheaper than S3, while st1-based volumes are only 2x more expensive than S3 ( https://aws.amazon.com/ebs/pricing/ ). So there is little economic sense in using S3 over HDD-based persistent volumes.