I actually build minimal Redis clones in every new language or runtime, or when I want to explore threading models.
It all started with https://github.com/rcarmo/miniredis (which I forked to add and experiment with pub/sub), and I just found myself doing it again and again because Redis is the quintessential network service:
By implementing it, you learn about socket handling, the event loop of a specific runtime, threading models, data representation, concurrency (if you want to do a multi-threaded version), etc. None of my "ports" are fully functional, but they all helped me sort out some of the above, plus build tools, packaging, dependencies, etc.
It's "hello world" for core cloud native microservices, if you will (and without having to do REST or JSON stuff).
I once built one in Swift 4, in part to see how multithreading via SwiftNIO and copy-on-write data structures would compare. It held up well against the C implementation; it would be worth trying again against Swift 5.x.
https://github.com/NozeIO/redi-s
Shameless plug: I've been playing with a [Redis Server implementation in Go](https://github.com/jan-carreras/ddia) for the past few weeks, mainly as a way to try out things explained in the Designing Data-Intensive Applications book (a favourite of mine!). Those are the [commands implemented](https://github.com/jan-carreras/ddia/blob/master/commands.md), plus TTL, AoF files (for state replication), a config file, ... The "challenge" was to do it without any external dependency other than the Go stdlib.
> I actually build minimal Redis clones in every new language or runtime, or when I want to explore threading models.
100% agree with your advice; I'll definitely try to implement other parts of the Redis service in Go (e.g. pub/sub, replication, clustering...) and probably repeat the same exercise when learning any new language.
I saw this "Build Your Own Text Editor"[0] on HN a month or two ago, everyone was raving about it so I went through it and it really was fantastic. The learning experience was unparalleled. I'm a believer in the idea of "Build Your Own..." guides now, I hope this Redis guide is just as good as the kilo text editor. I'm definitely bookmarking this for a deep dive when the time is right. Any other top notch "Build Your Own" recommendations would be highly appreciated.
While I applaud the initiative and effort, I want to urge the author to consult a good proofreader. I’m only in chapter 2 yet I’ve already encountered enough friction in the form of poor English to put me off. I think the substance looks great, so it’s a bit wasteful to not walk the last mile and make it genuinely enjoyable to read by eliminating blatant errors like this one:
“The Redis is an example of the server/client system.”
Sigh. I had to actually go look this up. Sometimes it's just easier not to talk, because all of the skeletons of old normalized words come to light. I truly was not trying to slur anyone, and I apologize if my (unlucky) attempt at humor rang sour for anyone.
I propose adding some Unicode sign like the double turnstile [1], ⊨ (ALT+8875), which can signify "is a semantic consequence of" or "is stronger than", after a polysemic word to imply that no racial/socially derogatory connotation is to be derived from the word in that phrase, for example, "the text reads too auto-jipped⊨" or "the master⊨ branch is fubar" in a Git context.
Or perhaps an even better sign would be the negated double turnstile, ⊭, "to denote the statement 'does not entail'" [1], making it more explicit. Hence an example would look like "the text reads too auto-jipped(⊭racism)", which can be read "the text reads too auto-jipped and this word, auto-jipped, does not entail racism in this context". Ok, done, racism solved, your move David Guetta [2].
ChatGPT does not know how to proofread. It is not an artificial intelligence. It will take your input and determine the most likely response given the data it was trained on.
I've learned over the last few months that there are certain trigger words that instigate basically pointless discussion around e.g. "what it means to learn".
I can now add “read” to that list. Let’s play word taboo! The rules are we can’t talk about GPT using anthropomorphic terminology.
Does GPT predict less than useful mathematical computations? Yes, and not just less than useful but basically useless.
Does GPT predict less than useful language translations, ranging from English-French, to summaries, in-the-style-of, etc.? No, it's actually quite useful, since when confined to only the information contained in a prompt it doesn't have to hallucinate an answer.
It is not useful to anthropomorphize the functionality of these tools in either a practical or legal context.
And everyone pick up a copy of Philosophical Investigations by Wittgenstein so y’all can learn about how to avoid snake-eating-tail discourse.
For example, here's a section from the Build Your Own Redis book:
Our server will be able to process multiple requests from a client, to do that we need to implement some sort of “protocol”, at least to split requests apart from the TCP byte stream. The easiest way to split requests apart is by declaring how long the request is at the beginning of the request. Let’s use the following scheme.
The protocol consists of 2 parts: a 4-byte little-endian integer indicating the length of the following request, and a variable length request.
Starts from the code from the last chapter, the loop of the server is modified to handle multiple requests:
GPT suggested this instead:
Our server will process multiple requests from a client by implementing a protocol to separate requests from the TCP byte stream. The simplest method for separating requests is to include the length of each request at the start. The protocol consists of two parts: a 4-byte little-endian integer indicating the length of the request and a variable-length request. The server code from the previous chapter has been modified to handle multiple requests in the following manner:
There are no hallucinated facts because the most probable continuation of the given prompts is one that can gather all required information from the original text itself.
It's sort of like the difference between the truthfulness of analytic and synthetic claims. An analytic claim would be like "It is raining and you're outside, naked, and unsheltered, therefore water is falling on your skin from the sky." A synthetic claim would be like "It is raining outside".
Synthetic claims are said to be contingent on facts outside the text itself. These are the cases where GPT is completely useless.
The error rate for analytic claims is much lower although anyone who is writing anything should do a lot of review before publishing. Think of it like you asked your assistant to write something. You're gonna wanna read it over before you slap your name on it.
I mean, I actually don't care if you use these tools or not but your explanation of how it works will guide other readers in the wrong direction so I feel the need to correct the narrative you've presented.
Me: Hey, I need some help cleaning up some language with a lot of grammatical errors, can you help?
GPT: Yes, I'd be happy to help you clean up language with grammatical errors. Please provide the text for me to review.
Me: Our server will be able to process multiple requests from a client, to do that we need to implement some sort of “protocol”, at least to split requests apart from the TCP byte stream. The easiest way to split requests apart is by declaring how long the request is at the beginning of the request. Let’s use the following scheme.
The protocol consists of 2 parts: a 4-byte little-endian integer indicating the length of the following request, and a variable length request.
Starts from the code from the last chapter, the loop of the server is modified to handle multiple requests:
GPT: "Our server will process multiple requests from a client by implementing a protocol to separate requests from the TCP byte stream. The simplest method for separating requests is to include the length of each request at the start. The protocol consists of two parts: a 4-byte little-endian integer indicating the length of the request and a variable-length request. The server code from the previous chapter has been modified to handle multiple requests in the following manner:"
---
That's the entirety of the interaction!
I haven't tested much but for the last day or so I've been thinking a lot about Kant, Frege, Quine and Wittgenstein!
GPT opens the door for some kind of empirical philosophy... like, what are the error rates for various kinds of tasks? Can we use a Kantian framework? How about Frege? How about Quine?
I mean, Quine is actually my favorite of the analytic philosophers because of his indeterminacy of translation argument and the notion that there really is no analytic/synthetic divide when you get down to it resonates well with me.
Death to metaphysics!
But there seems to be some use in differentiating between "All bachelors are unmarried" and "All bachelors are unhappy" if only because I'm now seeing how making a distinction can have a profound impact on the usefulness of GPT completions.
Briefly and half-assed: Quine's argument is that because you would have to be familiar with language and culture in order to understand "All bachelors are unmarried", the meanings of those words are fact-like and outside the scope of the proposition.
If GPT is able to do some Frege-like substitution of synonyms it is because it has this compressed language model which seems to lend credence to Quine's arguments.
I find the quality of answers you receive out of GPT drastically changes with the way you phrase questions.
I don't think I would ever have come up with asking the question in the way you did.
As someone who has English as a second language, I found GPT often produced incorrect and low-quality answers while most of my native-English-speaking colleagues were getting high-quality answers. Looking at their prompts compared to mine, it's all down to differences in how questions are phrased.
When I see poorly written prose, it makes me distrust the technology imparted. If you can't take time to get the grammar right, why should I believe that you got the code right?
There are many signals you can use to help you judge the value of a written piece of text. In this case, it reads to me much more as the writing of someone for whom English is not their native language than of someone who doesn't know what they're talking about technically.
If you want to pre-judge all technical content coming from people who have not spent huge portions of their life living in an English speaking country as being of no value, then I'm sure that will protect you from some bad content, but you're going to be missing out on an awful lot of good stuff too.
Potentially, but on the other hand, James Smith is exactly the sort of name you'd choose to use in the Anglosphere if your birth name was difficult to pronounce for English speakers.
I am absolutely with you. There's a certain large company I work for whose public documentation is full of stuff like this. It makes me insane when I find it, it's so glaring. There's an internal program to proofread and review, but quantity is incentivized, not quality. I've tried to shove that program in a better direction, to no avail.
I am very happy to see more content like this (attempting to teach how to create a real system) becoming available.
A few months ago there was a launch announcement of a YC-backed startup that sells walk-through labs guiding you through building Redis, Docker, Git, and others. (Can't find the link now.)
What is still missing, in my opinion, and is badly needed, is content, or even an idea of how to teach, taking such projects from a toy prototype to a production-quality one.
You're probably thinking of https://codecrafters.io/. I've been meaning to check them out with my education stipend from work, just haven't had the time to devote to it.
$83/mo, or around $994 annually, is way too much for me. I wish them the best because I love this kind of content and want it to be more popular, but I will wait until a better-priced option comes along.
There's* a free plan allowing one stage daily. With the current content (all languages, all stages), someone could do everything in under 9 months. But the challenges are the same across languages, so if you only want to do each challenge ("build X clone") once, in one language, you can do it in under 1.5 months.
*Saw in a sibling comment that this is a recent addition. Without it, yeah, it may be a bit expensive for one person.
Please consider a pricing structure similar to JetBrains': if you pay for it yourself, the price is X, but if your company pays for it, or you're somehow reimbursed for it, the price is 5X.
Correct. For many of our users, doing one stage per day is just the right amount of exercise (especially on the later stages, as it gets more challenging).
Not to be hypercritical, but that is just not enough to support learning for more ambitious people. There are too many great learning communities that are way cheaper. I really hope you guys revise the pricing structure. At a $20/mo price point, I know a ton of people who would buy it. At the current price point, well, it's just people with a learning stipend at a big corp who don't like to buy books.
I think this is a bit much. I wouldn't buy it, but this is deep technical content that you might not stick with forever, and they need to pick a price for it. If it's too high or low they'll find out soon. No need for any other inputs than that.
The idea is very good: the test-based approach is effective and stimulating, and they provide a range of products to study and experiment with. However, I think the approach to the learning material is "love it or hate it".
By design, the service doesn't provide any documentation; it provides references to existing technical documentation (of any kind, including blog posts).
Those who expect a focused introduction to each topic will find it very tedious or hard to proceed (for example, the SQLite exercise has important details buried in a very large and confusing webpage), and likely hate it; those who like the challenge of understanding loads of raw documentation will love it.
It's more complex than that. People may need stimulation in order to follow through a certain learning process.
Providing a structured path and (automated) test means can be stimulating, and can make the difference between (deciding to) learning something or not.
Some people are certainly entirely autonomous, but at the very least, there is a spectrum of need for stimulation when approaching a topic of study.
AFAIK there are no other services that provide multiple languages, automated testing and team features, but if you know of any, it's certainly useful to report them in this thread.
Exercism is the closest service I can think of, but it's based on simple exercises, not real-world projects.
There are a few books that have a similar target (build X), but much narrower in scope (either a single language or pseudocode, and certainly no automated testing/team features).
I have finished their build-your-own-Redis exercise. It is very well organized, and there are hints and Discord groups where you get sufficient help if you make the effort.
This model is great for someone who loses patience with all the groundwork setup.
However, I do agree that the next leg is taking this MVP/prototype level to production, and ideally selling it as a real alternative to the commercial version of Redis.
Sorry for being a little off topic, but the Mini Redis tutorial [1] was really fun when I did it a few years ago. It has you implement a server and client in Rust using the Tokio library.
I think Redis is a great server to “build yourself” as you don’t need to start with much to get it going.
If anyone is looking at this stuff and thinking it's intimidating, then think again: Redis is mostly a lot of fuss around a hash map. There are cool features, sure, but the idea is mainly "what if a hash map were accessible from two or more servers". If you only have one or two servers, you only want a cache, and you don't anticipate scaling up soon (be honest), you can just use a hash map with expiry checking and not have to learn how to deploy another piece of software.
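For the one-or-two-server case, that really can be tiny. A minimal Go sketch of the idea (hypothetical names, no size limit, not production code): a mutex-guarded map with lazy expiry checks on read.

```go
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value     string
	expiresAt time.Time // zero value means "never expires"
}

// ExpiringMap is a tiny in-process cache: a mutex-guarded map
// that lazily evicts stale entries when they are read.
type ExpiringMap struct {
	mu   sync.Mutex
	data map[string]entry
}

func NewExpiringMap() *ExpiringMap {
	return &ExpiringMap{data: make(map[string]entry)}
}

// Set stores a value; ttl <= 0 means the key never expires.
func (m *ExpiringMap) Set(key, value string, ttl time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e := entry{value: value}
	if ttl > 0 {
		e.expiresAt = time.Now().Add(ttl)
	}
	m.data[key] = e
}

// Get returns the value if present and not expired.
func (m *ExpiringMap) Get(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.data[key]
	if !ok {
		return "", false
	}
	if !e.expiresAt.IsZero() && time.Now().After(e.expiresAt) {
		delete(m.data, key) // lazy expiry on access
		return "", false
	}
	return e.value, true
}
```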
Redis offers a lot more than just a key-value store. The fuss is the data structures it provides.
And just using a hash map in memory isn't sufficient. You'll have unbounded growth. You need a max size and need to evict. Then you want to do that efficiently so you're not wasting space on useless keys...
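A rough Go sketch of the bounded cache the parent describes, a strict LRU built on the standard library's container/list (illustrative only; note that real Redis uses approximate LRU via random sampling precisely to avoid this per-access bookkeeping):

```go
package cache

import "container/list"

type lruEntry struct {
	key, value string
}

// LRU is a size-bounded cache: inserting beyond capacity evicts
// the least recently used key.
type LRU struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{
		cap:   capacity,
		order: list.New(),
		items: make(map[string]*list.Element),
	}
}

func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el) // mark as recently used
	return el.Value.(*lruEntry).value, true
}

func (c *LRU) Set(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key, value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back() // least recently used
		delete(c.items, oldest.Value.(*lruEntry).key)
		c.order.Remove(oldest)
	}
}
```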
IIRC there’s a nice abstraction in .NET Core for caching, which has a swappable backend that means you can upgrade from MemoryCache to Redis if you ever need it. If only all software was this good. One of the least-leaky abstractions of all time.
It is nice to see more on "build your own database" topics.
Although people constantly bug me about attempts to build their own ClickHouse. Someone is trying to do it with Apache Arrow and DataFusion. Folks from DuckDB are trying to build their own crippled version of ClickHouse. Friends from China are doing it with their Apache Doris. InfluxDB is being rewritten to be closer to ClickHouse in an attempt to make it better, and so on...
In case you're unaware, InfluxDB IOx (the rewrite) is built on Apache Arrow and DataFusion, and is the primary driver behind that project.
I also think calling DuckDB a crippled version of ClickHouse is disingenuous. That's like calling SQLite a crippled version of Postgres. They have very different goals.
That's also a bit like calling Clickhouse a build-your-own vectorwise/MonetDB because they did it first.
Maybe I'm missing something but the server application resulting from the source code in this book does not actually work with redis-cli or any Redis clients. It appears to be using a custom binary protocol, not valid RESP.
I feel that the RESP protocol itself is the most challenging part of building one's own Redis.
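For anyone who hasn't looked at it: RESP frames everything as type-prefixed, CRLF-terminated segments. A minimal Go sketch of the client-to-server command encoding (arrays of bulk strings only; integers, errors, and the other reply types are left out):

```go
package resp

import (
	"fmt"
	"strings"
)

// EncodeCommand renders a command as a RESP array of bulk strings,
// e.g. SET foo bar -> "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n".
func EncodeCommand(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args)) // array header: element count
	for _, arg := range args {
		// bulk string: "$" + byte length, CRLF, payload, CRLF
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(arg), arg)
	}
	return b.String()
}
```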
> The end result is a mini Redis alike with only about 1200 lines of code.

1200 LoC seems low, but it illustrates many important aspects the book attempts to cover.
The techniques and approaches used in the book are not exactly the same as the real Redis. Some are intentionally simplified, and some are chosen to illustrate a general topic. Readers can learn even more by comparing different approaches.
I wouldn't emphasize the importance of understanding Redis per se, but the ideas around a system like Redis.
What? I genuinely think RESP is the easy part. I used to use your redcon library to get bootstrapped, but then rewrote it from scratch and it was really easy. Unfortunately, Golang doesn't permit good abstraction, so I have to write the same thing over and over again.
I am a big fan of Build Your Own X educational projects. Recently, I released a Go version of my build-your-own-KV-store project. I have set up the project in TDD fashion with tests: you start with simple functions, pass the tests, and the difficulty level goes up. There are hints if you get stuck (e.g. link). When all the tests pass, you will have written a persistent key-value store.
Is there a public repo we could send PRs/MRs/patches to? The prose has room for improvement and since the book is free, I'd love to send a couple of PRs your way.
This take on "building blocks of modern computing" is fairly opinionated though.
They mention "NGINX, SQLite, PostgreSQL, Kafka, Linux kernel, etc." - of which I'd only consider NGINX and the Linux kernel (and Redis) as "building blocks". The others might be part of their own preferred stack, but if you mention Postgres, why not MySQL? If Kafka, why not RabbitMQ?
But yeah, NGINX, Redis, and the Linux kernel are basically beyond dispute.
They are all arbitrary (including nginx) in the sense that there are many stacks out there that don't use any of them. But I don't think it's a controversial selection as they are all extremely popular.
I love this! Is there a book like this to learn Rust with? It would be amazing to have a working system at the end. (Not looking for a beginner book, but say, an intermediate-level one.)
Purchased this via Gumroad, but the epub version seems to not be formatted correctly. After about page 3 it decides the book is finished on my Kobo. Hopefully the author will get a fix out, I'm pretty excited to read it and pdfs are pretty hard to read on my device.
k2pdfopt actually has an option for my model, cropping some of the excess margins helps a lot, but I will say epubcheck found some invalid markup in there, so it could be related to that.
Looks neat and relatively self contained. I’m adding this to my list of projects to use to learn a new language. I might give it a shot in Kotlin and Zig (I know it and use it, but am not as familiar with its event loop implementation as I’d like to be) over the next few weeks.
One of my pet peeves when I was learning programming was that I couldn't find any books that would teach how to write tests in the real world.
There was a chicken-and-egg problem then: potential employers would skip my CV because I didn't have any TDD experience, and I couldn't find anywhere to learn it.
Before I even learned that something like testing exists, I was so confident: "oh, I could write something like that over the weekend, how come they needed months to do that?". Then my software would crash the first time someone other than me used it.
Anyway, what I want to say is that while this book sounds like a great idea, without showing TDD and how to write code so that it can be proven to work as intended and to handle unhappy paths and edge cases, it won't teach someone trying to learn programming much, and it doesn't actually stand out from the million other books about programming that just scratch the surface and don't show how to write production-ready code.
That's what is still missing on the market. It's almost like a well-kept secret that only developers working at large corporations know.
While I have many things I could say about your idea of what development means and the "only path" you seem to subscribe to, I think you're missing the point of the book.
> it won't teach someone trying to learn programming much, and it doesn't actually stand out from the million other books about programming that just scratch the surface and don't show how to write production-ready code.
It's not a book to teach people how to program; in fact, the author goes out of their way to mention that only C and a little C++ have been used, and that it may be beneficial, for learning how to build such a POC of Redis, to DIY your own.
This is not a book to hand over to an outsourcing company and expect production ready work. Nor was it described as such.
It’s right there in the name TDD. You write a test and then you change the code to make the test pass. There are no secrets, that’s all there is to it. If you can do that you can do TDD. That’s why everyone else is putting it on their CV and you aren’t: you think there’s something more to it. There isn’t. It’s a very basic technique.
> how to write code so that it can be proven it works the way intended and that it can handle unhappy paths and edge cases
That is not what TDD promises. All TDD does is ensure you have some tests at all, which is a good base on which to add more tests, so that when you find edge cases you can do more TDD, and TDD is a nice way to work.

Nobody in history has come up with a way to make testing comprehensive. You can try quickcheck and fuzzing to generate lots and lots of test cases, but they are still sampling from the input space rather than covering it. You can cover the input space for a single 32-bit number input, but that's about it. The only way to do better is to formally prove your software is correct using mathematical logic. TDD does not provide this and never will.

You might have seen "100% test coverage", but that claim is close to meaningless with respect to the range of possible inputs that your code is meant to handle. All it says is that every edge case that you did think of has a test that exercises it. A function consisting only of "return 0" will have 100% test coverage with a single test in the suite. Doesn't mean it works. See SQLite, which still finds bugs despite 100% test coverage.
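To make the "return 0" point concrete, a minimal Go sketch (hypothetical Abs function and test): the test passes and go test -cover reports 100% line coverage, yet the function is wrong for every nonzero input.

```go
// mathx.go
package mathx

// Abs is supposed to return the absolute value of x.
// It is wrong for every input except 0.
func Abs(x int) int {
	return 0
}
```

```go
// mathx_test.go
package mathx

import "testing"

func TestAbs(t *testing.T) {
	// One test, 100% line coverage, and it says nothing
	// about Abs(-3) or Abs(7).
	if got := Abs(0); got != 0 {
		t.Fatalf("Abs(0) = %d, want 0", got)
	}
}
```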
What you seem to be looking for is someone to teach you how to think of edge cases. That’s a skill that can’t really be taught. Just have a go. When you find an edge case later that you didn’t think of, great, maybe you will think of similar ones next time.
A long time ago I found the chicken and egg problem to be more general than TDD. It was more along the lines of, "how do you write a program when you don't have the code (subroutines, other functions) it needs to run?" Where do you start - at main() or say, opening a file that doesn't yet exist to read/write some data? It was always a struggle.
Then I found Structure and Interpretation of Computer Programs (SICP)[0] and the video lectures from MIT[1]. I had an epiphany when Sussman talked about "wishful thinking" in video for lecture 1b[0] (around 48:00 in). The lesson was something along the lines of start naming functions that would do what you needed done, and write them later. Just pretend they exist and eventually bring them into existence. SICP has so many gems. It really made a difference for me.
If applied to TDD, write a test that won't even compile because the function under test doesn't even exist. Then iterate by writing something that compiles, but will probably fail, and then improve it until the test passes.
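A minimal Go sketch of that loop, with a hypothetical Reverse function: step one is a test that doesn't compile because Reverse doesn't exist yet; step two writes the wished-for function until the test goes green.

```go
package kata

import "testing"

// Step 1: write the test first. At this point the file won't
// compile, because Reverse hasn't been written -- that's the point.
func TestReverse(t *testing.T) {
	if got := Reverse("redis"); got != "sider" {
		t.Fatalf("Reverse(%q) = %q, want %q", "redis", got, "sider")
	}
}

// Step 2: bring the function into existence and improve it until
// the test passes.
func Reverse(s string) string {
	runes := []rune(s)
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}
```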
Edit: How to Design Programs and How to Design Worlds are two other resources I enjoyed. I don't write programs in Lisp or Scheme anymore, but the experience of just running through the exercises in these and other books was enlightening.
I remember how tdd (together with scrum) was basically a religion in 'large corporations'. How a lot of people talked about 'the right way to do things'. Your comment made me smile nostalgically...
But back to the topic. I'm pretty sure there are enough books about how to write tests, and more or less all engineers understand the value of testability and coverage (not necessarily TDD!). At least, I wouldn't need a book for this. But books about building something closer to complex real-world systems, that's good stuff engineers would enjoy.
I commented this above, but Distributed Services in Go does testing and low-level kafka-like implementation. It taught me so much, I'm probably going to reread at some point and implement in Rust.
Redis rules and fixes a lot of problems, especially when you probably already have a server/cluster for something else. But "a building block of modern computing" is a bit overzealous to me.
I did something similar by building Redis in Rust a few years ago. It's a really good learning exercise for a lot of topics, like network programming. https://github.com/huangjiahua/reredis
Kudos to the author for a very succinct, practical book. It would be nice to see a few chapters on high availability though; it seems to me that failover and consistency guarantees are among the really hard bits.
Is anyone aware of an in-depth explanation about how key-value store persistence, such as Redis RDB, is built? A guide on compaction would be really interesting and useful.
Highly recommend Martin Kleppmann's "Designing Data-Intensive Applications" (O'Reilly). It covers virtually every topic around database implementation. Well written, a big source of knowledge. It introduces concepts gradually and with many examples, with lots of references to actual software out there and how things are done in practice.
KV persistence is covered in Chapter 3, right from the start. Redis is also mentioned as an example of an in-memory store with "weak durability by writing to disk asynchronously".
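For a flavor of the mechanics behind that asynchronous persistence, here's a rough Go sketch of log compaction over a toy append-only format of one "SET key value" line per write (hypothetical names and format, not how Redis actually encodes RDB/AOF): replay the log into a map, rewrite only the latest value per key, and atomically swap the file into place.

```go
package aof

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Compact replays an append-only log of "SET key value" lines and
// writes a new log containing only the latest value per key.
func Compact(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Replay: later writes to the same key overwrite earlier ones.
	latest := make(map[string]string)
	var order []string // first-seen key order, for stable output
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), " ", 3)
		if len(parts) != 3 || parts[0] != "SET" {
			continue // toy format: skip anything malformed
		}
		if _, seen := latest[parts[1]]; !seen {
			order = append(order, parts[1])
		}
		latest[parts[1]] = parts[2]
	}
	if err := scanner.Err(); err != nil {
		return err
	}

	// Write the compacted log to a temp file in the same directory,
	// then rename it over the original so readers never observe a
	// half-written file.
	tmp, err := os.CreateTemp(filepath.Dir(path), "aof-compact-*")
	if err != nil {
		return err
	}
	for _, key := range order {
		fmt.Fprintf(tmp, "SET %s %s\n", key, latest[key])
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```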
The book seems really interesting but I am quite poor so thanks for making a free web version available.
I'll think about buying the epub when I earn a wage <3
The Redis wire protocol is simple enough that it's not very difficult. It's instructive to write it in C because there's a lot of knowledge that translates to other languages: read in a buffer, cast that buffer to the protocol-specific struct, read the various fields from memory, etc. Basic operations for a systems-level project like this.
I agree it's instructive, but on the other hand, the time might be right to start teaching why not to do this stuff. Modeling a protocol parser in a high-level language that can spit out correct and highly-optimized C code would be just as instructive and perhaps even more fun.
Yes, "read in a buffer, cast that buffer to the protocol-specific struct, read the various fields from memory, etc" are all operations one should generally NOT do. This was instructive maybe three decades ago, when the internet was relatively safe.
Knowing how to do this RIGHT is important.
The place this is still helpful is low-level programming, and embedded is a far better place to learn. If one part of your microwave is talking to a sensor, security isn't really an issue, since you control both ends.
Yeah, and I said the above being myself a big fan of C. I write C in my hobby projects. Yet if I were doing anything non-trivial that talks over the network, that protocol parsing would for sure not be hand-written.
I really enjoyed writing low-level C code, maybe three decades ago. I grew out of it, eventually, but I remember the joy. Similar for assembly, actually.
(And rereading, I don't mean "growing out" in a derogatory way -- I grow into and out of a lot of things -- and low-level programming is something anyone can enjoy for a few years)
You don't have to learn raw systems coding yourself, but if nobody bothers, the code of 2050 is going to be built on a lot of stale, brittle magic that nobody can reproduce or understand.
This is exactly the sort of playground effort where those who do want to learn can get their hands dirty. If this is their first time doing this kind of project, they shouldn't use any of this code in production. There will be all sorts of cheated corners and vulnerabilities, and not just in obvious high-risk places like this. That's how hands-on learning goes.
At a cursory glance, they assert((size_t)rv <= n), but if that's the assert.h assert(), it's compiled out entirely when NDEBUG is defined, and cmake seems to define that when building a Release target (gcc itself does not define it by default). So that's a bit of an overflow there.
Kafka is a good fit when you design a modern system. As your system grows larger, the necessity of a message queue becomes obvious. And Kafka is probably the most widely used general-purpose, high-performance distributed queue.
Redis is great, but it isn't always the best solution.
We almost configured it, but instead implemented our own cache web service and used its built-in memory/cache management. Yes, it's only accessible via HTTP, but it's given us a lot of flexibility. We are primarily using it for caching large datasets (hundreds of MBs). When the service has to be restarted, it makes a call to get all the items it needs.