Object storage for Kafka? Wouldn't this 10x the latency and cost?
I feel like Kafka is a victim of its own success: it's excellent for what it was designed for, but because the design is simple and elegant, people have been using it for all sorts of things it was never designed for. And of course it's not perfect for those use cases.
It can increase latency (which can be somewhat mitigated by a write buffer, e.g. on EBS volumes), but it substantially _reduces_ cost: all cross-AZ traffic (which is $$$) is handled by the object storage layer, where it isn't charged. This architecture has become tremendously popular recently, championed by WarpStream and also offered by Confluent (Freight clusters), AutoMQ, Bufstream, etc. The KIP mentioned in the post aims to bring this capability into the upstream open-source Kafka project.
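Conceptually the write path looks something like the sketch below. This is a minimal illustration, not any vendor's actual implementation; the bucket name, batching thresholds, and key layout are made up. The point is that broker-to-broker cross-AZ replication is replaced by a same-region PUT of a batched segment to object storage, and the latency hit comes from how long you buffer before flushing.

```python
import io
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-diskless-kafka-tier"   # hypothetical bucket name

class SegmentWriter:
    """Buffer produced records locally, then flush them to object storage
    as an immutable segment object. Cross-AZ replication between brokers
    is replaced by a single same-region PUT to S3."""

    def __init__(self, topic, flush_bytes=4 * 1024 * 1024, flush_secs=0.25):
        self.topic = topic
        self.flush_bytes = flush_bytes   # flush when the buffer gets this big...
        self.flush_secs = flush_secs     # ...or this old (latency vs. request-cost trade-off)
        self.buf = io.BytesIO()
        self.last_flush = time.monotonic()
        self.segment_seq = 0

    def append(self, record: bytes):
        self.buf.write(record)
        too_big = self.buf.tell() >= self.flush_bytes
        too_old = time.monotonic() - self.last_flush >= self.flush_secs
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buf.tell() == 0:
            return
        key = f"{self.topic}/segment-{self.segment_seq:012d}"
        s3.put_object(Bucket=BUCKET, Key=key, Body=self.buf.getvalue())
        self.segment_seq += 1
        self.buf = io.BytesIO()
        self.last_flush = time.monotonic()
```

The bigger the batch, the fewer PUT requests you pay for, but the longer a producer waits for an acknowledgement, which is exactly the latency/cost knob these systems expose.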
Even if this were to change, object storage still brings a lot of operational simplicity compared to managing a bunch of disks. You can easily scale to zero or quickly scale up to handle bursts in traffic.
An architecture like this also makes it possible to achieve a truly active-active multi-region Kafka cluster that has real SLAs.
> people have been using it for all sorts of things for which it was not designed
Kafka gets misused for some weird stuff. I've seen it used as a user database, which makes absolutely no sense. I've also seen it used as a "key/value" store, which I can't imagine being efficient since you'd have to scan the entire log.
Part of it seems to stem from "We need somewhere to store X. We already have Kafka, and requesting a database or key/value store is just a bit too much work, so let's stuff it into Kafka".
I had a client ask for a Kafka cluster; when we asked what they'd need it for, the answer was "We don't know yet". Well, that's going to make it a bit hard to dimension and tune correctly. Everyone else used Kafka, so they wanted to use it too.
The weird thing driving this thinking is that cross-AZ network data transfer between EC2 instances on AWS is more expensive than shuffling the same data through S3 (which has free data transfer to/from EC2). It’s just stupid, but that’s how it is.
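A rough back-of-the-envelope makes the gap concrete. The prices below are approximate us-east-1 list prices from memory (cross-AZ ~$0.01/GB each direction, S3 PUT ~$0.005 per 1,000 requests, GET ~$0.0004 per 1,000, same-region S3↔EC2 transfer free), and the segment size and read pattern are assumptions, so check your own region and bill before believing the exact numbers:

```python
# Back-of-the-envelope: moving 1 TB/day via cross-AZ broker replication vs. via S3.
# Prices are approximate us-east-1 list prices; verify against your own bill.

GB_PER_DAY = 1024

# Cross-AZ transfer: ~$0.01/GB out + ~$0.01/GB in = ~$0.02/GB per AZ hop.
# With replication factor 3 and rack-aware placement, roughly 2 of 3 replicas
# sit in a different AZ from the leader.
cross_az_per_gb = 0.02
cross_az_daily = GB_PER_DAY * 2 * cross_az_per_gb

# S3: same-region transfer to/from EC2 is free; you pay per request (and for
# storage, which is ignored here) instead.
put_price_per_1k = 0.005       # S3 Standard PUT
get_price_per_1k = 0.0004      # S3 Standard GET
segment_mb = 8                 # assumed object size per flush
puts_per_day = GB_PER_DAY * 1024 / segment_mb
gets_per_day = puts_per_day    # assume each segment is read back once

s3_daily = (puts_per_day / 1000) * put_price_per_1k \
         + (gets_per_day / 1000) * get_price_per_1k

print(f"cross-AZ replication: ~${cross_az_daily:.2f}/day")  # ~ $41/day
print(f"S3 request costs:     ~${s3_daily:.2f}/day")         # ~ $0.71/day
```

Under those assumptions the replication traffic costs an order of magnitude (or two) more than the S3 requests carrying the same bytes, which is why the economics push people toward this architecture even though it feels backwards.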
The core design for producer/broker/consumer certainly is. All the logic is on the ends, broker just makes sure your stream of bytes is available to the consumers. Reliable, scales well, can be used for pretty much any data.