OCaml has been a pretty common tool for writing parsers for many years. Not a bad choice.
I've written parsers professionally with Rust for two companies now. I have to say the issues you had with the borrow checker mostly come up in the beginning. After working with Rust for a bit you realize it works miracles for parsers. Especially if you need to do runtime parsing in a network service serving large traffic. There are some good strategies we've found to keep the borrow checker happy while writing the fastest possible code to serve our customers.
I highly recommend taking a look at how flat vectors for the AST and typed vector indices work. E.g. you have a vector of types as `Vec<Type>` and the fields of those types as `Vec<(TypeId, Field)>`. Keep these sorted, so you can implement lookups with a binary search, which works quite well with CPU caches and is often faster than a hashmap lookup.
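A minimal sketch of the idea (the `Type`/`Field` shapes here are just placeholders I made up for illustration): the AST lives in flat vectors, `TypeId` is a typed index, and because `fields` stays sorted by `TypeId`, all the fields of one type form a contiguous run you can find with two binary searches.

```rust
// Typed index into the `types` vector: just a number, but the type
// system stops you from mixing it up with other kinds of IDs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TypeId(u32);

struct Type {
    name: &'static str,
}

struct Field {
    name: &'static str,
}

struct Ast {
    types: Vec<Type>,
    // Kept sorted by TypeId, so a lookup is a binary search over
    // contiguous memory instead of a hashmap probe.
    fields: Vec<(TypeId, Field)>,
}

impl Ast {
    // Returns the contiguous slice of fields belonging to `id`.
    fn fields_of(&self, id: TypeId) -> &[(TypeId, Field)] {
        let start = self.fields.partition_point(|(t, _)| *t < id);
        let end = self.fields.partition_point(|(t, _)| *t <= id);
        &self.fields[start..end]
    }
}

fn main() {
    let ast = Ast {
        types: vec![Type { name: "User" }, Type { name: "Post" }],
        fields: vec![
            (TypeId(0), Field { name: "id" }),
            (TypeId(0), Field { name: "email" }),
            (TypeId(1), Field { name: "title" }),
        ],
    };
    assert_eq!(ast.fields_of(TypeId(0)).len(), 2);
    assert_eq!(ast.fields_of(TypeId(1))[0].1.name, "title");
    println!("ok: {} types", ast.types.len());
}
```

`partition_point` gives you the start and end of the sorted range directly; no per-type `Vec` allocations, no hashing.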
The other cool thing about writing parsers in Rust is that there are great high-level libraries for things like lexing:

https://crates.io/crates/logos
The cool thing with Logos is that it keeps the source data as a string under the surface and just refers to specific locations in it. Now use these tokens as the basis for your AST, which is all flat data structures and IDs. Simplify the usage with a type:
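The original code was elided, so here is a sketch of the idea in plain Rust, with the Logos dependency left out: a token is a kind plus a byte span into the source (which is what Logos's `Lexer::span()` gives you), and a small wrapper type bundles the token with the source so the text is one method call away. The `TokenWalker` name is my own placeholder.

```rust
// A token owns no string data: just a kind plus byte offsets into
// the source, mirroring what a span-based lexer like Logos produces.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Number,
}

#[derive(Debug, Clone, Copy)]
struct Token {
    kind: TokenKind,
    start: usize,
    end: usize,
}

// The simplifying type: pairs the source with one token. Still Copy,
// so it can be passed around freely without reference counting.
#[derive(Clone, Copy)]
struct TokenWalker<'a> {
    source: &'a str,
    token: Token,
}

impl<'a> TokenWalker<'a> {
    fn text(&self) -> &'a str {
        &self.source[self.token.start..self.token.end]
    }

    fn kind(&self) -> TokenKind {
        self.token.kind
    }
}

fn main() {
    let source = "user 42";
    let tokens = [
        Token { kind: TokenKind::Ident, start: 0, end: 4 },
        Token { kind: TokenKind::Number, start: 5, end: 7 },
    ];
    let w = TokenWalker { source, token: tokens[1] };
    assert_eq!(w.text(), "42");
    assert_eq!(w.kind(), TokenKind::Number);
    println!("ok");
}
```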
You can specialize these with type aliases and implement methods on them. From here you can introduce string interning if needed; it's easy to extend. What I like about this design is that all the IDs and Walkers are Copy, so you can pass them around as you like. There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.
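String interning fits naturally into this ID-based design. A minimal sketch (the `Interner`/`StrId` names are my own, not from any particular library): each distinct string is stored once, and everything downstream carries only a Copy-able `StrId`.

```rust
use std::collections::HashMap;

// A Copy-able handle to an interned string; the interner alone
// owns the actual string data.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct StrId(u32);

#[derive(Default)]
struct Interner {
    lookup: HashMap<String, StrId>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing id for `s`, or stores it and mints a new one.
    fn intern(&mut self, s: &str) -> StrId {
        if let Some(&id) = self.lookup.get(s) {
            return id;
        }
        let id = StrId(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), id);
        id
    }

    fn resolve(&self, id: StrId) -> &str {
        &self.strings[id.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("query");
    let b = interner.intern("mutation");
    let c = interner.intern("query");
    assert_eq!(a, c); // same string, same id
    assert_ne!(a, b);
    assert_eq!(interner.resolve(b), "mutation");
    println!("ok");
}
```

Comparing two identifiers is now a `u32` comparison instead of a string comparison, and IDs stay Copy no matter how big the source gets.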
I understand Rust feels hard, especially in the beginning. You need to program more like you would in C++, but Rust forces you to play it safe. I'd say an amazing strategy is to first write a prototype in OCaml; it's really good for that. Then, if you need to be faster, do a rewrite in Rust.
Thanks for your comment, you've given me a lot to chew on and I think I'll need to bookmark this page.
> I've written parsers professionally with Rust for two companies now
If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now).
> Especially if you need to do runtime parsing in a network service serving large traffic.
I almost expected something like this; it just makes sense with how the language is positioned. I'm not sure if you've been following Cloudflare's Pingora blogs, but I've found them very interesting because of how they're able to really optimise parts of their networking without the code looking like fast-inverse-sqrt.
> There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.
I really like the sound of this. It wasn't necessarily confusing to work with Rc and Weak, but more that I had to put in a lot of extra thought up front (which is also valuable, don't get me wrong).
> I would say an amazing strategy is to first write a prototype with OCaml, it's really good for that.
Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
> If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now).
You don't need to write programming languages to need parsers and lexers. My last company was Prisma (https://prisma.io), where we had our own schema definition language, which needed a parser. The first implementation used nested structures and reference counting, and it was very buggy and hard to fix. We rewrote it with the index/walker strategy described in my previous comment and got a significant speed boost, and the whole codebase became much more stable.
The company I'm working for now is called Grafbase (https://grafbase.com). We aim to be the fastest GraphQL federation platform, which in many cases we already are, due to the same design principles. We need to be able to parse GraphQL schemas, and one of our devs wrote a pretty fast library for that (it also uses Logos).
And we also need to parse and plan the operation for every request. Here, again, the ID-based model works miracles. It's fast and easy to work with.
> I really like the sound of this. It wasn't necessarily confusing to work with Rc and Weak, but more that I had to put in a lot of extra thought up front (which is also valuable, don't get me wrong).
These get _very annoying_ to work with. If you come at a model from the `Weak` side, you need to upgrade it first (and unwrap), which makes passing references around either hard or impossible depending on what you want to do. It's also not great for CPU caches if your data is too nested. Keep everything flat and sorted. In the beginning it's a bit more work and thinking, but it scales much better as your project grows.
> Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
You're already on the right path if you're interested in Ocaml. Keep going.
I should've expected Prisma! It's actually my main "ORM" for my TS web projects, so thanks for that! Also, Grafbase seems interesting; I've had my fair share of issues with federated Apollo servers, so it'd be interesting to check out.
> If you come at a model from the `Weak` side, you need to upgrade it first (and unwrap), which makes passing references around either hard or impossible depending on what you want to do.
You're literally describing my variable environment. Eventually I just said fuggit and added a bunch of unsafe code to the core of it just to move past these issues.
There's also the phenomenal pest library. It probably wouldn't be as fast, but I've found that parsing usually isn't the performance-critical part of a system. If it is, writing the parser by hand is definitely the way to go.
> Especially if you need to do runtime parsing in a network service serving large traffic
Yeah, that's the focus of it, and the kind of thing Rust is well suited for.
All the popular Rust parsing libraries aren't even focused on what most people mean by "parser". They can't support language parsing at all, but you only discover that after you've spent weeks fighting the type system to get to the real PL problems.
Rust itself is parsed by a set of specialized libraries that won't generalize to other languages. Everything else is aimed at parsing data formats.
There's also rust-analyzer, which is a separate binary. It should compile with a stable Rust compiler. I remember reading its source together with the Zig compiler's. Both are quite impressive codebases.