The Erlang Runtime System

doctor_phil · on Feb 14, 2024

Past discussions:

https://news.ycombinator.com/item?id=14061985

https://news.ycombinator.com/item?id=23718278

https://news.ycombinator.com/item?id=17003897

doctor_phil · on Feb 14, 2024

I found this book very helpful as a supplement to "Crafting Interpreters". The BEAM has a lot of interesting features that many mainstream languages don't. I found the parts about processes, tagging, memory and reduction counting (chapters 3-5 and 11) especially interesting.
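For anyone wondering what reduction counting looks like in practice, here is a toy sketch in Python, not how the BEAM actually implements it. It assumes each `yield` stands in for one reduction, and the names `run`, `counter`, and the `budget` value are made up for illustration. A process that uses up its budget goes to the back of the run queue.

```python
from collections import deque

def run(processes, budget=3):
    """Round-robin over generators; each yield counts as one 'reduction'."""
    queue = deque(processes.items())
    trace = []
    while queue:
        name, proc = queue.popleft()
        for _ in range(budget):
            try:
                trace.append((name, next(proc)))
            except StopIteration:
                break  # process finished, do not reschedule it
        else:
            # Budget exhausted without finishing: preempt and requeue.
            queue.append((name, proc))
    return trace

def counter(n):
    """Hypothetical workload: one unit of work per yield."""
    for i in range(n):
        yield i

trace = run({"a": counter(5), "b": counter(2)}, budget=3)
print(trace)
```

The point is that "a" cannot starve "b": it is descheduled after three steps even though it still has work left.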

The book is clearly very unfinished still, but what exists is very good. Got my mind spinning on what a good statically typed language running on a VM could look like.

cmdrk · on Feb 14, 2024

agreed! I built a toy language on top of Erlang also using this approach. very instructive.

krab · on Feb 14, 2024

How do the supervisor trees avoid a single point of failure? Let's say I build "etcd in Erlang". How would I tell the cluster that there should be N processes running, each on a different node? And is there anything in OTP that would help me elect a leader or do I still have to implement that myself?

I have no experience with OTP but have read some books and did toy projects.

Kototama · on Feb 14, 2024

> How do the supervisor trees avoid a single point of failure?

They do not. Supervisor trees are a way to manage failures at the level of one node.

> How would I tell the cluster that there should be N processes running, each on a different node?

It's not something that is built into OTP. There are libraries that help with it, like libcluster.

> And is there anything in OTP that would help me elect a leader or do I still have to implement that myself?

Not directly in OTP but there are libraries, for example for raft.

OK, so why is Erlang/Elixir/OTP good then? First, supervision trees make a single running application more robust to failures, and the platform also makes it easier to build distributed applications. GenServers let you build robust services out of common patterns. Local and remote calls to GenServers look the same, which helps when scaling services out. Message passing and pattern matching are part of the core of the language (no need for protobuf, for example). Observability and introspection are excellent when a problem arises (inspecting processes, their memory, their message queues, the schedulers, etc.). Immutable data structures and processes that do not share memory also make it easier to scale horizontally, at the cluster level. And probably a lot of other good things I forgot :-).

krab · on Feb 14, 2024

Thanks! I kept reading about the magic HA properties of Erlang/OTP. Being familiar with systems built on e.g. Kubernetes, I thought I was missing something because I couldn't find a solution to the aforementioned problems in OTP itself.

What you say makes sense. I can see the benefit in message passing as a first class citizen so it allows extraction of some processes to a different node. But you still have to manage the process placements.

jerf · on Feb 14, 2024

Yes, it's very easy to come away from a casual reading about Erlang and have the impression that it automatically makes your code work across server clusters. I did myself the first time. Some of that is perhaps overexcited advocacy, but some of it is also that when reading about Erlang for the first time you don't quite know what all the terminology is so it's easy to be a bit off.

Erlang does have some tools that make it legitimately easier to write cluster-aware software, such as the message-passing operator not caring about what node the target is on, and transparently handling all network communication from serializing the Erlang term on one side and deserializing it on the other. Erlang's terms, the basic data types, are designed for exactly this use case, and as such a first-class concern, they are good at that particular use case.
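A small Python sketch of what "the send operator doesn't care about the node" means. Everything here is an assumption for illustration: `Node`, `send`, and the `(node, name)` address format are invented, and `pickle` merely stands in for Erlang's external term format. The caller uses the same function either way; serialization only happens when the target lives elsewhere.

```python
import pickle

class Node:
    """Hypothetical node: just a name and per-process mailboxes."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = {}  # process name -> list of received messages

    def deliver(self, target, message):
        self.mailboxes.setdefault(target, []).append(message)

    def receive_bytes(self, target, payload):
        # Remote side: decode the wire format back into a value.
        self.deliver(target, pickle.loads(payload))

def send(cluster, from_node, address, message):
    """Same call for local and remote targets."""
    node_name, target = address
    node = cluster[node_name]
    if node is from_node:
        node.deliver(target, message)  # local: no serialization needed
    else:
        node.receive_bytes(target, pickle.dumps(message))  # simulated network hop

a, b = Node("a"), Node("b")
cluster = {"a": a, "b": b}
msg = ("log", {"level": "info", "text": "hello"})
send(cluster, a, ("a", "logger"), msg)
send(cluster, a, ("b", "logger"), msg)
print(b.mailboxes["logger"])
```

What the sketch does not show is the hard part mentioned in the next paragraph: nothing stops you from writing code that quietly assumes the target is local.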

However it's still easy to accidentally write some code that will only run on a single node and you still need to be aware of what you're doing to be sure you don't accidentally wire in a hard-coded dependency on being in a single OS process. But when you contrast that to the default state of most other programming languages, which is that they have essentially no concept of cluster communication at all, you can still see how this is an advantage over those other languages.

Kototama · on Feb 14, 2024

I have no experience with Kubernetes, but my understanding is that Erlang/OTP is conceptually much simpler and also solves a different problem. Some things overlap (like restarting services/apps), but not all: for example, there is no autoscaling in Erlang/OTP. I know that some companies use both in their stack. Kubernetes is language agnostic.

Erlang/Elixir is amazing because it brings so many good things, and they are not that difficult to learn if you have some experience with functional programming.

krab · on Feb 14, 2024

Yes, there is definitely some overlap. Kubernetes will help you with the restart and to deliver network packets to the right process. It's also programmable. On the other hand, you have to manually craft the process APIs to communicate and they look very different from the in-process APIs for most languages. You have to think about exposing metrics to some external system that collects them. That and logs are about it for introspection. The rest you have to build yourself.

On the plus side, out of the box it can do things like "run this process N times", "don't run these on the same node", "run this on every node exactly once".

I'd like to get some experience on a real distributed project in Erlang. Mostly in order to be able to assess how big a productivity booster it really is. :-) But it's a chicken and egg problem. Not a hugely popular language <-> fewer developers <-> higher risk for the company.

macintux · on Feb 14, 2024

On the other hand, as has been pointed out in HN ad nauseam, developers who look for jobs in more obscure languages tend to be higher skilled: they've already self-selected for curiosity and willingness to extend their skills.

At Basho, we hired developers to work on Riak who had no Erlang experience. It's a remarkably simple language, syntactically, and the lack of types makes it easier to get up to speed.

(There are, of course, countless other challenges that it brings, where an experienced Erlang person or three can make a big difference.)

gourabmi · on Feb 14, 2024

I have been working in Erlang for almost 5 years now. I want to add my experience as an anecdote. I joined my current company because it was solving an interesting problem and just happened to use Erlang. I didn't self-select for curiosity or for learning a new language. Erlang was the tool available to me to do the work that I wanted to do.

Cheers to all you Basho/Riak folks. I have worked with one before!

macintux · on Feb 14, 2024

May I ask where? Always keeping an eye out for companies using Erlang.

Kototama · on Feb 15, 2024

It's not really hard to onboard new engineers with Scala, Clojure or even Ruby backgrounds. The language is easy. It takes a bit of time to learn the OTP patterns, but so does learning any framework or tech stack.

depr · on Feb 14, 2024

Erlang/OTP doesn't handle leader election, and by itself is bad at handling netsplits.

There is https://github.com/rabbitmq/ra which is a Raft implementation in Erlang that is Jepsen-tested. You could use it to build "etcd in Erlang", or https://github.com/rabbitmq/khepri which is built on top of Ra.

toast0 · on Feb 14, 2024

> And is there anything in OTP that would help me elect a leader or do I still have to implement that myself?

global:register_name/3 may be helpful. I haven't used it, so no direct experience. I think you would need to provide the resolution function for when a cluster merges and the name is registered on both partitions, and the logic to register a potential leader if there is none.
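To illustrate the kind of resolution function I mean, here is a toy model in Python, not the `global` module's actual behaviour. The names `merge_registries` and `keep_lowest`, the string "pids", and the tie-break rule are all assumptions. The idea is only that when two partitions rejoin with the same name registered on each side, something deterministic has to pick a winner.

```python
def merge_registries(left, right, resolve):
    """Merge two name -> pid maps, calling resolve() on each conflict."""
    merged = dict(left)
    for name, pid in right.items():
        if name in merged and merged[name] != pid:
            # Same name registered on both sides of the split.
            merged[name] = resolve(name, merged[name], pid)
        else:
            merged[name] = pid
    return merged

def keep_lowest(name, pid_a, pid_b):
    """Hypothetical rule: deterministic, so every node picks the same winner."""
    return min(pid_a, pid_b)

partition_1 = {"leader": "node_a:42"}
partition_2 = {"leader": "node_b:7", "cache": "node_b:9"}
merged = merge_registries(partition_1, partition_2, keep_lowest)
print(merged)
```

In a real system the losing process also has to be told it lost (or be killed), which this sketch leaves out entirely.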

From experience with other parts of global, you'll want to be careful and test what happens on your system if a thousand nodes across several locations all try to join/register at once. Especially if one or several of those nodes are running really slow because of hardware issues.

I think some of this might be covered in distributed OTP applications with takeover[1], but where I worked with Erlang, we certainly weren't applying OTP applications as the OTP team intended, I think as a result of most of the team, including all of the early server engineers, learning Erlang on the job.

[1] https://learnyousomeerlang.com/distributed-otp-applications

dimitrios1 · on Feb 14, 2024

There was something called gen_leader back in the day that came out of Jungerl

and an attempt to correct it by Hans Svensson: https://erlang.org/workshop/2005/NewLeaderElection.pdf

This project attempts to modernize it: https://github.com/lehoff/gen_leader

But from what I can tell, there's no standardized solution. There are quite a few libraries out there, however.

fcoury · on Feb 14, 2024

If you’re interested in BEAM and, like me, prefer to digest these things through a practical application, Tsoding did a very nice video about it recently:

https://youtu.be/6k_sR6yCvps