
I've regarded RabbitMQ as a secret weapon in plain sight for years. The main reason people don't use it more is that it "doesn't scale" to truly enormous sizes, but for anyone with fewer than a million users it's great.

Too many people end up with their own half-rolled pub/sub via things like gRPC, and they'd be far better off using this, particularly in the early stages of development.



This is how I feel about NATS: https://nats.io/

It's an infinitely friendlier take on Kafka, pub/sub, etc., and it's extremely lightweight. Yet every environment trends towards Kafka because it was "chosen" by the big corps.


I've used both and NATS is definitely what I'd pick starting out. RabbitMQ is great too, but it's heavier and harder to configure. Federation is the killer feature of RabbitMQ that NATS doesn't really have (afaik?).


I've never used RabbitMQ, but NATS supports clustering, super clustering, and leaf node connections. I'm guessing the latter is the closest to what would be considered "federation" in this context.

Edit: spelling


None of them are close. RabbitMQ has traffic shaping, MQ->MQ routing (federation), and all sorts of stuff that is super important if you use it as a generalized routing system (vs plain IP). NATS doesn't have that and it's almost certainly firmly out of scope.


When you say "MQ->MQ routing (federation)" I have a hard time understanding how that is not close to leaf nodes. Leaf nodes allow independently operated / managed NATS servers to share data back and forth, which to me is what federation is about. Do you have any resources you recommend to help me grasp why this is different?



Oh cool, I didn't know it could! Regardless, advanced routing is still something that RabbitMQ has that isn't in NATS. It's not useful for every system, but it is a differentiator. I usually reach for NATS + JetStream these days; with its object and KV stores it's a really great "all-in-one" for basic stuff.


Nats also has some basic routing options. Which of RabbitMQ's options would give it the edge when deciding between the two?


NATS JetStream has functionality equivalent to federation out of the box: stream sourcing, combined with super-clustering and leaf nodes. You can have streams that source and combine messages (with filters and subject transformation) from other streams, and that sourcing happens in a guaranteed store-and-forward manner, even if the stream being sourced lives in another cluster in a different region or provider, or on an edge device (running a leaf node) that has only partial network connectivity (e.g. on a vehicle) and a completely different security setup.

(For more detail, check out this and the other YouTube videos on the Synadia channel: https://youtu.be/WH55czo1BNk)
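
A rough sketch of what sourcing looks like with the Go client (nats.go); the stream names, subjects, and server URL are made up, and error handling is trimmed:

    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        // Connect to the cluster that will own the aggregate stream.
        nc, err := nats.Connect("nats://hub.example.com:4222")
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Drain()

        js, err := nc.JetStream()
        if err != nil {
            log.Fatal(err)
        }

        // ORDERS_AGG sources messages, store-and-forward style, from a
        // stream owned elsewhere (another cluster or a leaf node), with a
        // subject filter applied on the way in.
        _, err = js.AddStream(&nats.StreamConfig{
            Name: "ORDERS_AGG",
            Sources: []*nats.StreamSource{{
                Name:          "ORDERS_EDGE",
                FilterSubject: "ORDERS.us.>",
            }},
        })
        if err != nil {
            log.Fatal(err)
        }
    }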


Big corps choose these bloated and complex tools because it lets them justify charging their customers a premium, and because their customers won't be able to service such software on their own.

It has nothing to do with these apps being superior in any way; often it's the opposite: routine faults and glitches add more hours to bill the customers for "fixing".


From these docs, NATS looks more like a friendlier version of RabbitMQ or ZeroMQ than of Kafka: https://docs.nats.io/reference/reference-protocols/nats-prot...


So how about NATS compared to RabbitMQ? If building from scratch, what would drive a design or team towards NATS?


Two different models. The metaphor I like to use is that RabbitMQ is a postal system, while NATS is a switchboard.

RabbitMQ is a "classical" message broker. It routes messages between queues. Messages are treated like little letters that fly everywhere. They're filed in different places, and consumers come by and pick them up.

Core NATS isn't really a message broker, but more of a network transport. There are no queues as such, but rather topologies of routes where messages are matched between producers and consumers through a "subject". You don't "create a queue"; you announce interest in a subject (which is a kind of path that can contain wildcards, e.g. "ORDERS.us.nike"), and NATS routes stuff according to those interests. So there's nothing on disk, and if a consumer isn't there to receive a message, the message is gone. You can send messages back and forth, both point-to-point and one-to-many. NATS itself isn't reliable, but you can build reliable systems on top of it.
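
Roughly what that looks like with the Go client (nats.go); the subject names are made up, error handling is trimmed, and the later snippets in this comment reuse the same connection:

    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        nc, err := nats.Connect(nats.DefaultURL) // nats://127.0.0.1:4222
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Drain()

        // Announce interest in a wildcard subject. Nothing is created
        // server-side beyond the routing entry, and nothing touches disk.
        sub, _ := nc.Subscribe("ORDERS.us.*", func(m *nats.Msg) {
            log.Printf("got %s: %s", m.Subject, m.Data)
        })
        defer sub.Unsubscribe()

        // Fans out to every currently interested subscriber; if nobody is
        // listening right now, the message is simply gone.
        nc.Publish("ORDERS.us.nike", []byte(`{"sku":"AF1"}`))
        nc.Flush()
    }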

A common example of the lightweight, ephemeral nature of NATS is the request-reply pattern. You send out a message and you tag it with a unique reply address, the "inbox subject". The subject is just a random string (it may be called "INBOX.8pi87kjwi"). The recipient replies by sending its reply to that inbox. The inbox isn't something that exists; it's just a subject temporarily being routed on. So the sender sends a message and waits for the reply. NATS encourages you to use these ephemeral subjects as much as possible, and there can be millions of them. You can do RPC between apps, and that's a popular use of NATS.
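
Continuing with the same `nc` (and a `time` import), a sketch of request-reply; the service subject is invented:

    // Responder: reply to whatever ephemeral inbox the requester chose.
    nc.Subscribe("svc.time", func(m *nats.Msg) {
        m.Respond([]byte(time.Now().Format(time.RFC3339)))
    })

    // Requester: Request() makes the throwaway inbox subject, publishes
    // with the reply address set, and waits for the first response.
    msg, err := nc.Request("svc.time", nil, 2*time.Second)
    if err != nil {
        log.Fatal(err) // times out (or reports "no responders") if nothing replies
    }
    log.Printf("reply: %s", msg.Data)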

JetStream is a subsystem built on core NATS, and is what you get when the designer of NATS thinks he can outsmart the designers of Kafka. JetStream is basically a database. Each stream is a persistent sequential array of messages, similar to Kafka topics or a RabbitMQ queue. A stream can be replicated as well as mirrored; one stream can route into another, so you can have networks of streams feeding into bigger rivers. Unlike core NATS, but similar to RabbitMQ, streams and their consumers have to be created and destroyed, as they are persistent, replicated objects that survive restarts.

Similar to Kafka, streams are just indexed arrays; you can use them for ephemeral events, or you can store long histories of stuff. Consumers can go back in time and "seek" through the stream. Streams are indexed by subject, so you can mix lots of types of data in a single stream (as opposed to multiple streams) and simply filter by subject; NATS is very efficient at using the index to filter. Like RabbitMQ but unlike Kafka, streams don't need to be consumed in order; you can nack (!) messages, or set an ack timeout, causing redelivery if acks aren't sent in time. In other words, JetStream can work like Kafka (where you always read by position) or like RabbitMQ (where messages are skipped once acked, but retried once nacked). JetStream has deduplication and idempotency, which allows you to build "exactly once" delivery, which is awesome.
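
Again a sketch, reusing `nc` from above; the stream, consumer name, and the `handle` function are all invented, and error handling is trimmed:

    // JetStream: a replicated, persistent stream plus an explicit-ack consumer.
    js, _ := nc.JetStream()

    js.AddStream(&nats.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"ORDERS.>"},
        Replicas: 3,
    })

    // Durable pull consumer, filtered to one subject within the stream.
    pull, _ := js.PullSubscribe("ORDERS.us.nike", "billing")
    msgs, _ := pull.Fetch(10, nats.MaxWait(5*time.Second))
    for _, m := range msgs {
        if err := handle(m.Data); err != nil { // handle() is your own code
            m.Nak() // negative-ack: ask for redelivery, RabbitMQ-style
            continue
        }
        m.Ack()
    }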

Similar to how someone built a database on top of Kafka (KSQL), the NATS team has built a key-value store on JetStream, as well as a blob store. They work the same way, through message ID deduplication. A stream is basically a bunch of database rows, with the message ID acting as primary key. So the stream acts as a primitive to build new, novel things on top of.
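
For example (still a sketch, with an invented bucket and key, reusing the `js` handle from above):

    // The KV bucket is just a stream underneath; a key is a subject and the
    // "current value" is simply the newest message on that subject.
    kv, _ := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "settings"})

    kv.Put("feature.dark_mode", []byte("on"))
    entry, _ := kv.Get("feature.dark_mode")
    log.Printf("%s @ revision %d", entry.Value(), entry.Revision())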

I think it's fair to say that RabbitMQ gives you opinionated tools to do certain things, whereas NATS and JetStream are a hybrid "multi model" system that can be used for more purposes. For example, you can embed NATS in your app and use it as a really lightweight RPC mechanism. You can use JetStream as a classic "work queue" where each worker gets a single copy of each message and has to ack/nack so the queue moves forward. You can use JetStream as a log of all actions taken in a system, with retention going back years. (NATS/JS is actually awesome for logging.) And so on.

We use NATS for different use cases at my company. In one use case, clients connect to an API to follow live, low-latency change events. For each such client connection, we register a NATS subject; then tons of processes will see this subject (and its settings, such as filters) and will all start sending changes to that one subject. There's no single "controller"; it's all based on point-to-point and one-to-many communication.

(Full disclosure: I'm not familiar with newer RabbitMQ versions or the streaming stuff they've added, so it's possible that RabbitMQ has caught up here in some ways.)


Thank you, kind internet stranger. That was an awesome dive into the matter!


I'm late replying to this comment but NATS sounds interesting. Given this comment:

>We use NATS for different use cases at my company. In one use case, clients connect to an API to follow live, low-latency change events.

Do you require any form of guaranteed delivery for these events? And what do you consider low latency?


Guaranteed delivery, yes. Latency is in the low millisecond range. Very happy with performance and stability.


Thanks


Sorry, I'm not familiar with this tech, so I looked up JetStream and it seems to be archived: https://github.com/nats-io/jetstream. Not sure it would be a good endorsement to use something that is no longer maintained, or am I looking at the wrong one?


> JetStream went Generally Available in NATS 2.2.0; the documentation is now in the core NATS documentation.

https://docs.nats.io/jetstream


JetStream was originally designed/built separately, but is now built into NATS. So that repo is obsolete. NATS with JetStream is here: https://github.com/nats-io/nats-server.


Great, thank you


RabbitMQ shines when you need complex queue routing based on keys or headers. Instead of baking the logic into the app, you can offload the routing logic to RabbitMQ. The same is true for NATS and Kafka.
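
For example, a sketch with the Go client (github.com/rabbitmq/amqp091-go); the exchange/queue names are invented and error handling is trimmed. The filtering lives entirely in the broker:

    package main

    import (
        "log"

        amqp "github.com/rabbitmq/amqp091-go"
    )

    func main() {
        conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        ch, _ := conn.Channel()
        defer ch.Close()

        // Routing lives in the broker: a topic exchange plus pattern bindings.
        ch.ExchangeDeclare("orders", "topic", true, false, false, false, nil)
        q, _ := ch.QueueDeclare("orders.us.audit", true, false, false, false, nil)

        // This queue only ever sees US order events; the consumer never filters.
        ch.QueueBind(q.Name, "orders.us.#", "orders", false, nil)

        // Publishers just tag messages with a routing key and move on.
        ch.Publish("orders", "orders.us.nike.created", false, false, amqp.Publishing{
            ContentType: "application/json",
            Body:        []byte(`{"sku":"AF1"}`),
        })
    }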

I must say that RMQ in k8s is possible but hard to admin. It's not a "toy", but it has great documentation. It will take several iterations to key in the right configuration for your use case.

NATS and Kafka can handle higher volumes on the same resources, but IMO the use cases are different, or you have to write lots of code app-side to implement with those tools what RMQ gives you out of the box.


Have used both extensively:

NATS, especially in its most elementary form, is braindead simple, stateless, and ootb functional. If I only want a message broker/pubsub, I pick that. If I know on day 1 that I need queues or persistence, I would probably pick Rabbit over NATS' offering (JetStream).


Would you use NATS over Redis PubSub or Postgres Notify/Listen? Postgres's option I'm wary of for reasons I've discussed here before but Redis PubSub seems fairly simple to use and likewise not as uber-scalable as Kafka and other more heavyweight message broker systems.


Not the parent, but I would. We've been using NATS for about five years at my company, and we recently adopted JetStream, and have been really impressed with it.

NATS, especially with JetStream now, is a Swiss Army knife of messaging. It can do RPC, Kafka-type batch streaming, low-latency realtime notifications, large-scale network transfer, offline sync, work queues, mirroring, weighted routing, partitioning... it's incredibly flexible. The simple but powerful primitives you have around subject routing/mapping/transforms, stream placement, replication, offline leaf nodes, etc. are just fantastic. It's super reliable. It scales from tiny apps to huge ones. The CLI and API tooling around streams and consumers is also fantastic.

Everything just feels well-designed and sensible. The new key/value store and blob store (both built on top of JetStream as client-side abstractions) are also super neat, and we've started using this functionality.


Do you run jetstream on k8s? I’m curious if blob/kv is more reliable than redis sentinel on single node deployments. Lots of quorum issues with redis


Yes, and no issues at all. We run superclusters with a bunch of nodes, and the Raft-based leader election system has worked flawlessly so far (knock on wood!).

Keep in mind that NATS does not yet support value operations other than setting the whole value. Optimistic locking is supported, but NATS does not have inc/decrement, append, set members, etc. I believe such support is on the horizon, however.
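
A tiny sketch of the compare-and-set flavor (Go client, invented bucket/key, assuming a bound JetStream context `js` and the usual imports including strconv):

    // There is no server-side increment; you read, modify, and write back
    // with the revision you read, and the write fails if you lost a race.
    kv, _ := js.KeyValue("counters")

    entry, _ := kv.Get("page.views")
    n, _ := strconv.Atoi(string(entry.Value()))

    _, err := kv.Update("page.views", []byte(strconv.Itoa(n+1)), entry.Revision())
    if err != nil {
        // someone else wrote the key since our Get(); re-read and retry
    }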


Thanks for this, I'll have a look. I use redis largely for memoization but have had a lot of issues running it within k8s for whatever reason


I would probably use NATS just because I’m more familiar. I love redis but I’m fairly skeptical when my database starts wanting to be my message broker


What does Jetstream lack wrt queues/persistence?


It doesn’t lack anything in particular, it’s just heavier weight compared to Rabbit. I would start with amqp and iterate towards NATS with scale


I agree. NATS is much simpler to use and deploy, and easy to run locally for development. JetStream offers useful features like persistent streams and KV and object stores.


https://nsq.io/ is also very reliable, stable, lightweight, and easy to use.


Maybe because in typical developer fashion their website is absolutely dog-crap at explaining what it does, instead only throwing words here and there

"Key value", "pub-sub", "microservices can live anywhere" And?

Why do I care? What does "microservices can live anywhere" even mean and what does it have to do with NATS?

I find RabbitMQ sometimes inscrutable but I think even their website is better


Modern RabbitMQ scales well. RabbitMQ Streams provide a throughput of millions of messages per second. With native support for MQTT, more than a million users can connect to RabbitMQ: https://www.rabbitmq.com/blog/2023/03/21/native-mqtt

The new Khepri metadata store in RabbitMQ 4.0 allows for even better scalability.


Is there a big advantage over Redis? Just coming from the Ruby world I know Sidekiq is extremely popular as a Redis backed queue. I know there's a RabbitMQ backed queue called Sneakers that's gained a lot of popularity as well though.

Just wondering what the biggest selling points are for somebody making a decision?


We switched from a Redis-backed queue (BullMQ) to Rabbit at our company (https://github.com/laudspeaker/laudspeaker) when we hit speed issues. We found Bull had much lower throughput, even after a lot of tuning, compared to Rabbit out of the box.

We needed features like job priorities and complex routing logic, and on top of that our software needs to send millions of messages a minute. What we miss is a nice UI to see the state of jobs and to monitor for problems more easily, like: https://taskforce.sh/


At my current place, using Redis with Celery is becoming a bottleneck for the number of concurrent connections it can hold. We are using 1000 workers and are starting to see issues (the ceiling is 10k connections in Redis); apparently Celery creates a huge number of connections. We are considering moving to RabbitMQ for that reason.


Celery is an over-engineered pile. Rather, move lower down the stack, with your first step being Kombu, which powers Celery under the hood. It's oddly "configurable", so if you need to optimize and adjust connections this is where you should go, and it's pretty interchangeable between Redis and AMQP. Really, it's almost a drop-in replacement that should just be a config change.


Been a while since I last used it, but IIRC Rabbit is much more featureful than Redis, and has built in much of what you'd get from Redis + Sidekiq. Their concepts docs [1] provide a good overview. This may be too broad a stroke, but what I liked best was that we used it for our centralized pub/sub system, and although we had services in multiple languages/frameworks, they connected to each other through Rabbit, which saved us from having to set up a bunch of different/incompatible job processing systems. Like Redis, it's battle-tested, so you know what you are getting complexity/scale-wise. At my current gig we use the typical Rails/Sidekiq setup, and though it works fine I definitely find myself missing Rabbit. (But in my head Redis and Rabbit have only some overlap, and seeing both at the same company would seem totally normal.)

[1]: https://www.rabbitmq.com/tutorials/amqp-concepts


Redis is slow and doesn't scale out well. Couchbase used to be a decent choice but it went commercial pseudo-FOSS.


I think people don't use it more because people don't really know what it is. From their website:

> RabbitMQ is a reliable and mature messaging and streaming broker, which is easy to deploy on cloud environments, on-premises, and on your local machine.

What does that mean? "Messaging and streaming broker"? I understand the need for worker queues to process videos, images, emails and such but I can't easily tell if that's what this is.

Also, what are the benefits of this over just processing incomplete records straight out of my database? i.e. using MySQL as a queue.


> what are the benefits of this over just processing incomplete records straight out of my database? i.e. using MySQL as a queue

Mainly throughput and latency. I haven’t used MySQL recently so some of this may apply more to Postgres.

Postgres has LISTEN/NOTIFY which helps with latency. I don’t think MySQL has LISTEN/NOTIFY, which means you’d have to resort to polling.

You have to use the `SELECT … FOR UPDATE SKIP LOCKED LIMIT 1` features to grab a message from the queue table, so multiple consumers don’t pull the same message.
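
A sketch of that worker step against a hypothetical `jobs` table (id, payload, done), using Go's database/sql and assuming a Postgres driver; the same SQL works on MySQL 8+ with `?` placeholders:

    // claimOne pulls at most one pending job. SKIP LOCKED means concurrent
    // workers never grab the same row and never block on each other.
    func claimOne(ctx context.Context, db *sql.DB) error {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit succeeds

        var id int64
        var payload []byte
        err = tx.QueryRowContext(ctx, `
            SELECT id, payload
              FROM jobs
             WHERE NOT done
             ORDER BY id
             LIMIT 1
             FOR UPDATE SKIP LOCKED`).Scan(&id, &payload)
        if err == sql.ErrNoRows {
            return nil // queue empty (or everything is claimed by other workers)
        }
        if err != nil {
            return err
        }

        // ... process payload here ...

        if _, err := tx.ExecContext(ctx,
            `UPDATE jobs SET done = true WHERE id = $1`, id); err != nil {
            return err
        }
        return tx.Commit()
    }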

The biggest issue, if you're trying to achieve decent throughput, is dealing with bloat (dead rows and empty pages still sitting on disk that haven't been cleaned up yet). You can run vacuum, but an ordinary online vacuum only marks the space as available for reuse (it doesn't return disk space to the OS). And if you run a full vacuum (which does reclaim disk space), it takes an exclusive lock on the table while it runs. This compounds if you're using indexes.

One way of dealing with this is setting up partitioning by message timestamp, so that as old messages roll out, you can just drop those partitions and not have to deal with vacuum.

It can work if your queue needs are low throughput or can tolerate higher latency, but there are some things to manage, and realistically setting up a Redis instance is probably less complex than trying to work around the database-specific quirks, unless you’re already very familiar with your database’s inner workings.


Also want to give a shoutout to BeanStalkd: https://github.com/beanstalkd/beanstalkd

If you are looking at RabbitMQ and thinking "maybe this is too much", Beanstalkd likely has the features you need with almost none of the setup. Just don't expose it to the web ;)


Be careful with it: it will segfault randomly and there hasn't been a fix. After hitting my own posts on the Google Groups while sleepily debugging the segfault in the wee hours of the morning and getting falsely excited about the possibility of a fix, I gave up and wrote a replacement: https://github.com/chronomq/chronomq Have been running it for years without it falling over, with sub-millisecond operations on average, and it has processed billions of messages without failing.


It's the polar opposite of RabbitMQ. It's a single binary written in C; you start it and send text commands over TCP, so writing a client or tooling is dead simple.
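
To illustrate, a toy Go sketch of the raw protocol (from memory of the protocol doc, so double-check the framing; real code should use an existing client library and handle error replies):

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "net"
    )

    func main() {
        // beanstalkd listens on 11300 by default.
        conn, err := net.Dial("tcp", "127.0.0.1:11300")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        r := bufio.NewReader(conn)

        // put <priority> <delay-seconds> <ttr-seconds> <bytes>\r\n<payload>\r\n
        body := []byte(`{"job":"resize","id":42}`)
        fmt.Fprintf(conn, "put 1024 0 60 %d\r\n%s\r\n", len(body), body)
        line, _ := r.ReadString('\n')
        log.Print("put -> ", line) // expect "INSERTED <id>"

        // reserve\r\n -> "RESERVED <id> <bytes>\r\n" followed by the payload
        fmt.Fprint(conn, "reserve\r\n")
        line, _ = r.ReadString('\n')
        log.Print("reserve -> ", line)
    }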


It's supported natively by Rails ActiveJob too!

I only used it on one project years ago and it was a pleasure, dead easy to get up and running and rock solid.


The real reason people don't use it is because they don't know about it or understand it. Then they apply the "it doesn't scale" retroactively.

You have to read a lot of docs or you WILL hold RabbitMQ wrong.


I agree, but there are a lot of footguns with RMQ. A great example is that you'll slow down your cluster by adding more RMQ servers (something that's bitten us in the past), which is a forgivable mistake, as most people would expect that more cores == faster RMQ. (For RMQ that doesn't work, because durable messages need to be replicated to the other nodes in the cluster; more nodes == more replication.)

The ideal RMQ cluster has 3 servers and is dedicated to just a few apps.


That was true, but the more recent Quorum queues provide more traditional scalability tradeoffs (you can set a ceiling on the number of synchronous replication hops that a published, replicated message goes through): https://www.rabbitmq.com/docs/quorum-queues
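
In client terms the change is a single declare argument; a sketch with the Go AMQP client (amqp091-go) and invented names, where the group-size argument is optional and a policy can do the same job:

    // Assuming an open amqp091-go channel `ch`.
    q, err := ch.QueueDeclare(
        "orders.work",
        true,  // durable (quorum queues are always durable)
        false, // autoDelete
        false, // exclusive
        false, // noWait
        amqp.Table{
            "x-queue-type":                "quorum",
            "x-quorum-initial-group-size": 3, // cap the replica count
        },
    )
    if err != nil {
        log.Fatal(err)
    }
    log.Println("declared quorum queue", q.Name)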


The most common footgun, and the elephant in the room, is buggy consumer processes letting queues grow unchecked.

RMQ immediately slows down (due to mnesia causing delays) and processes start dropping messages despite having system resources to grow.


Can you elaborate on the details? I have some memories of running OpenStack where Rabbit "was slow", but we never figured out why. Is mnesia the storage layer?


Yes, it was using mnesia as the storage layer, and if I had a few dozen queues with a few hundred messages each, it caused timeouts in some clients (celery/kombu is an example).

I decided to add expiry policies to each queue so that the system cleans itself from stale messages and that fixed all the message dropping issues.

The 4.0 changelog states that they are switching to a new k/v store (moving it from experimental to the default).


Thanks for the details!

Yep, similar symptoms. (OpenStack's services are also written in Python, or at least were back then, so probably similar to Celery.) We had regular problems with RMQ restarting. (Unfortunately I can't recall if it was for OOM or just some BEAM timeout.)

A few hundred messages in a few dozen queues seem ... inconsequential. I mean whatever on-disk / in-memory data structure mnesia has should be able to handle ~100K stale messages ... but, well, of course there's a reason they switched to a new storage component :)


Mnesia is _not_ the storage layer for messages (except for delayed messages).

Mnesia stores vhosts, users, permissions, queue definitions and more. This is being transitioned to Khepri, which improves a lot of things (maybe most importantly netsplits) but not directly message speeds.


Agreed. I ran a log ingestion infra 8 years ago doing 20k msg/s sustained on RabbitMQ. Back then we went through a lot of instabilities, though they settled over time with new releases. Great times. Besides being a quality product, the development/release process was very professional and mature.

The biggest issue back then was finding a quality client implementation for the language you were using. Not sure what the status of that is these days.


AMQP and MQTT are both industry standard protocols. Also, RabbitMQ allows you to abuse the limits set by these standards.

It's unfortunate your team ran into performance issues, as Erlang can be inefficient in some situations. Were you using static routes on the DNS-load-balanced parallel consumers, or relying on the disk caching mechanisms?


It isn't more popular because it's not easy to use properly.

I haven't touched it in years so I can't expand, but when I did, I had to write so many wrappers and add extra logic to use it properly.


I have found it to be more trouble than it is worth, as well


For me the killer feature of Kafka is that topics are persistent until the data expires. Meaning different readers can be working at different offsets. And you can rewind or fast forward the offset you are reading, which can really be a life saver when things go sideways.

Does RabbitMQ have equivalent features?


Yes, a stream queue type [0] is available where you can set retention and replay messages.

[0] https://www.rabbitmq.com/stream.html
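
A sketch of what that looks like over plain AMQP with the Go client (amqp091-go), assuming an open channel `ch`; the queue name and the `handle` function are invented:

    // Stream queues need a prefetch limit and manual acks.
    ch.Qos(100, 0, false)

    ch.QueueDeclare("events.stream", true, false, false, false, amqp.Table{
        "x-queue-type":       "stream",
        "x-max-length-bytes": int64(20_000_000_000), // retain roughly 20 GB
    })

    // Start reading from the beginning; "x-stream-offset" also accepts
    // "last", "next", an absolute offset, or a timestamp. Different
    // consumers can sit at different offsets without affecting each other.
    msgs, _ := ch.Consume("events.stream", "replayer", false, false, false, false,
        amqp.Table{"x-stream-offset": "first"})
    for m := range msgs {
        handle(m.Body) // handle() is your own code
        m.Ack(false)
    }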


Erlang is a secret weapon.


RabbitMQ also graciously maintains a very nice Erlang repository for Debian.

Reminds me, I'll have to check if they have a working donation link someplace. =3


Probably better said as "BEAM is a secret weapon".

BEAM languages (including Elixir and Gleam) share the benefits Erlang enjoys by also being part of the ecosystem.


Do you have any recommended resources to learn how to apply these tools (RabbitMQ, Nats, etc) to typical enterprise services? Common patterns/usecases/best practices and things?


The anti-pattern to be avoided is cobbling together a nonperformant grand centralized ESB and making it a SPoF and bottleneck for everything, but it depends entirely on the use-case. MQTT scales to millions of devices for low data rates. ZK works well for little bits of coherent distributed cluster metadata. Kafka has its niches. ZMQ helps in others.


I've seen it used at a company with way, way in excess of a million users. We used it for a system with 100M+ users for our login systems, and in general all of our main account systems relied on it. Most of the brokers were always running at 15k to 25k messages per second.

I loved it and the only issues we had were due to our fuckups.


How do you manage schema for exchanges, queues, and routing? The pattern seems to be for each client to set up the schema for the parts they touch beforehand, but that doesn't scale well. The schema ends up siloed in each project and nobody knows what the current state should be.


Clients creating server topologies scales very well with the new Khepri metadata store in RabbitMQ 4.0. Applications having an intimate relationship with their middleware (RabbitMQ) and creating flexible routing topologies is one of the main strengths of RabbitMQ! RabbitMQ also allows operators to import (exchange, queue, and binding) definitions on boot. Server topologies can nowadays even be declared via YAML files in Kubernetes (https://github.com/rabbitmq/messaging-topology-operator). This way all the desired state is in one place, and the Kubernetes operator reconciles so that this declaratively specified schema is created within RabbitMQ.


Probably using it wrong if complaining about AMQP queue scale limits...

Perhaps people are still thinking in single point of ingress design paradigms. Admittedly RabbitMQ can be a pain to administer for Juniors, but then again so are the other options. =3


If you like RabbitMQ, check out NATS!

I can’t speak to the new version, but it comes with support for even more messaging patterns out of the box.


I know this is said a lot about things people don't like or think don't scale, but I think a lot of people don't set it up and use it properly, and then it doesn't scale under their half-baked implementation.


I could say the same thing about NSQ, which is a distributed message queue with very simple semantics and a great HTTP API for message publishing and control actions. What it doesn't offer natively is HA, though.


People would be surprised how far you can get with NSQ. It doesn't come with any fancy guarantees like exactly-once delivery or ordering, which forces developers to think about how to design better on the application side. Not saying it's ideal, though.


I don't know why / how messages should be ordered. NSQ is a message queue and not a log. Some messages take longer to process than others, and some messages need to be re-queued and re-tried out of order, and that is a very common use-case.

I would love to be able to use a distributed log like Kafka/Redpanda since it's HA out of the box, but it simply does not fit that use-case.


If you’re not worried about scale, I’d just use a database backed queue.


Do you know of any book or video to get started with Rabbit?

Earlier this year I tried setting it up as a websockets message broker in a Spring Boot app but failed miserably. I ended up using Spring’s simple broker.


RabbitMQ in Depth: https://www.manning.com/books/rabbitmq-in-depth (the author is hanging out here, too).


whut.. transferred TB of data per hour with Rabbit.. does it not scale anymore?



