By keeping the code as visible (read, small) as possible, I see more code and can better reason at a macro level. To scale this down into the micro level of dealing with individual compiler passes, I replace all the traditional programming paradigms with others in a sort of 1 for 1 exchange. In this way, I develop a new set of idiomatic programming methods that are so concise, they can begin to be read as we read and chunk English phrases. By doing so, it becomes actually easier to just write out most algorithms, because the normal name for such an algorithm is basically as long as the algorithm itself written out. This means that I start to learn to chunk idioms as phrases and can read code directly, without the cost of name lookup indirection. I can get away with this because I've made reusability and abstraction less important (vastly so) because I can literally see every use case of every idiom on the screen at the same time. It literally would take more time to write the reusable abstraction than it would to just replace the idiomatic code in every place. It's a case of the disposability of code reaching a point that reusability is much less valuable.
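As a rough illustration of the claim that an idiom can be about as short as its own name, here is a minimal sketch in Python rather than APL. The helper names `average` and `running_total` and the sample data are hypothetical and are not taken from the compiler under discussion. In APL the gap is even smaller, since the mean is commonly written as the short train `+/÷≢`.

```python
from itertools import accumulate

# Hypothetical sample data, for illustration only.
xs = [3, 1, 4, 1, 5, 9, 2, 6]

# Named-abstraction style: a reader has to look up each helper
# to learn what it actually does.
def average(values):
    return sum(values) / len(values)

def running_total(values):
    return list(accumulate(values))

# Idiom style: the whole algorithm is written at the point of use and is
# roughly as long as the call it replaces, so there is nothing to look up.
mean_inline = sum(xs) / len(xs)
totals_inline = list(accumulate(xs))
```

Both forms compute the same values. The difference is only in whether the reader follows a name to a definition or reads the idiom directly.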
This means that in those cases where reuse is valuable, it's very valuable, and it comes to the fore and you can see it as the critical thing that it is. It doesn't get drowned in otherwise petty abstractions that assist reusability, since we don't need that anymore.
Furthermore, if I write my code correctly, there is very, very little boilerplate in the compiler. Almost none. This means that every line is significant. You don't get the fun of feeling like you're accomplishing something by typing in lots of excess boilerplate, but you also have no wasted architecture. Because rewriting the architecture is so trivial, basically everything now becomes important, and you don't have petty bookkeeping code around. You know that everything is important, and there are no superfluous bits.
The result, as mentioned elsewhere, is code that is getting continuously simpler, rather than continuously more complex. The code is getting easier to change over time, not harder. The architecture is getting simpler and more direct and easier to explain. Because it costs so little to re-engineer the compiler, I can do so constantly, resulting in little to no technical debt.
This is an intentional synergistic choice of a host of programming techniques, styles, disciplines, and design choices that enables me to program this way. Give up one of them and you start to break things down. It allows for a highly optimized programming code base that has all of the desirable properties people wish their code bases have, and it scares people. I think that's a good thing. Because I don't want people to see this codebase as just another thing. I want them to see that this is something truly different. How can I get away with no module system? How can I get away with no hierarchy? How can I get away with having everything at the top-level, with almost no nested definitions? How can I get away with writing a compiler that is not only shorter, but fundamentally simpler from a PL standpoint than standard compilers of similar complexity by using only function composition and name binding? How can I get a code base that has more features but continues to shrink?
By chasing smaller code. :-)
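To make "only function composition and name binding" a little more concrete, here is a minimal sketch in Python. It is not the author's method: the real compiler is written in APL, and the toy tuple-based tree, the passes `drop_zero` and `fold`, and the `compose` helper are all assumptions made up for this example. It only shows the general shape of a compiler whose top level is a flat list of small passes glued together by composition.

```python
from functools import reduce

def compose(*passes):
    """Chain passes left to right: the output of one feeds the next."""
    return lambda tree: reduce(lambda acc, p: p(acc), passes, tree)

# Toy expression tree (hypothetical): ("num", n), ("var", name), ("add", l, r).

def drop_zero(t):
    """Pass 1: rewrite x + 0 and 0 + x to x."""
    if t[0] != "add":
        return t
    a, b = drop_zero(t[1]), drop_zero(t[2])
    if a == ("num", 0):
        return b
    if b == ("num", 0):
        return a
    return ("add", a, b)

def fold(t):
    """Pass 2: fold additions whose operands are both constants."""
    if t[0] != "add":
        return t
    a, b = fold(t[1]), fold(t[2])
    return ("num", a[1] + b[1]) if a[0] == b[0] == "num" else ("add", a, b)

# The "architecture" is one top-level binding: a composition of passes.
compile_expr = compose(drop_zero, fold)

tree = ("add", ("var", "x"), ("add", ("num", 0), ("add", ("num", 2), ("num", 3))))
result = compile_expr(tree)
```

Adding, removing, or reordering a pass means editing the single `compose(...)` line, which is one way to read the claim that re-engineering the architecture stays cheap.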
I assure you, and I'll make good on this in another reply here, I could get you up and running on understanding the code and how it works faster than just about any other compiler project out there. In the end, one of the goals I want for this compiler is for people to say, "Woah, wait, that's it? That's trivially simple." The more I can push people to think of my compiler as so trivial as to be obvious, the more I win. The compiler really is so dirt simple as to shock any normal compiler writer.
But to make it that simple, I have to do things in ways that people don't expect, because people expect complexity and indirection, they expect unnecessary layers for "safety" and they expect code that needs built-in protections because the code is too complex to be obviously correct.
I'm pushing the other direction. If you can see your entire compiler at one go on a standard computer screen, what sort of possibilities does that open up? You can start thinking at the macro level, and simply avoid a whole host of problems because they are obviously wrong at that level. And when you aren't afraid to delete your entire compiler and start from scratch, what sort of possibilities does that open up to you?
First, please let me apologize for my ill-considered and rude comment... cringe.
Thank you for explaining. Wow, so much to chew on here. The naming conventions and trains sound really interesting. I can see how having a lot of the code visible on one screen would be a fantastic advantage. Again thanks for writing this up. Obviously I didn't find your code transparent at first glance, but clearly if one takes the time to understand what you are doing, the approach has its benefits. I look forward to reading more of what you post. And you've got me intrigued about APL.
Your comments reminded me of this anecdote about Arthur Whitney:
"The k binary weighs in at about 50Kb. Someone asked about the interpreter source code. A frown flickered across the face of our visitor from Microsoft: what could be interesting about that? “The source is currently 264 lines of C,” said Arthur. I thought I heard a sotto voce “that’s not possible.” Arthur showed us how he had arranged his source code in five files so that he could edit any one of them without scrolling. “Hate scrolling,” he mumbled."
It does. Furthermore, he's "simplified" APL in K to require less infrastructure, with fewer primitives, and the like. Combined with some clever, and some would argue, devious programming practices, he's able to keep things pretty small. I don't know if the interpreter is still that small, though. If someone reminds me, maybe I can talk about scrolling. :-)
Since I believe Whitney wrote the J incunabulum, I suspect that it looks very similar. The code is actually quite simple and straightforward if you take the time to read it.
Could you write a blog post (it probably needs several) about the code style, architecture and design of your compiler and the idioms that you talk about? I love the idea of keeping a project code base so small, by leveraging concise idioms, that everything fits in a meat-bag head, but I have no idea how one goes about achieving that in practice. (Learning APL to get some pearls of wisdom would be fine.)
It's something I've been working on for a while, but because the architecture is under constant flex, it's actually more valuable to be able to know how to "experience" or discover the architecture in the compiler code itself than to have a separate document to follow, since it's very easy for that document to get out of date quickly. I am building up a set of documents that discuss some of the core idioms and ideas though, and I hope to have something come of this live session that I can maybe put into an interactive document that people can work with.
1. What happens if you get sick? You say this is a project in production and there is money on the table (I assume not only yours). What if you get sick and are unable to work for 3 weeks or 6 months? Don't you think that this code would be very hard to grasp for someone else who had to temporarily work in your position?
2. It is weird that you wrote such a long essay, spanning two comments, but it has so few examples from the actual code. Usually when people explain stuff they go back and forth between the abstract concepts and how they are materialized in the code. Here you only explain the idea behind writing it and how it makes you feel/operate/gain flexibility and performance, but the closest thing to information about the code that I got from it is that it has compiler passes and that it has a C++ runtime in a string variable. Just a thought, what do you think about that?
At this point, if I get sick, the code doesn't move much. If I were permanently disabled, then someone else could take over. I have people contribute bugs, tests, and other things fairly often. If you had to temporarily work on the code base and weren't familiar with the background of the project, I would say you'd be lost. It's just not the sort of code base where you can start tweaking things here and there so easily, because almost everything that needs changing is a matter of addressing architectural or serious questions that require you to really understand the project. Because of the way the code is written, there's basically no "code monkey" type work. That means that you only do meaningful work, but it also means that only people who are knowledgeable architects can work on the code. You can imagine the same thing in other code bases. Imagine that you didn't need any of your lower-level programmers anymore for work because there was nothing for them to do. Now imagine how the bus factor changes on the code when only your chief architects are necessary for working on that code base. That's very nice in one dimension, but it does create quite a different picture.
You're right about the code examples. I figured that people were already posting some code snippets. I wanted to give the big ideas rather than any specifics. The reason for this is basically that if you take any single line of code out of context, it's a bit hard to explain why I'm doing the things that I'm doing. It's very much a macro design, which is why I am offering the live session to go through. It's sort of, but not quite, an "all or nothing" thing. If you let me sit down with you and go through the entire code base, then I can explain how it all fits together and why things are the way they are, but if you just take a single piece of code out, you're missing the picture.
If I took a single compiler pass out, for instance, you'd have between 1 and 12 lines of code to look at. I could explain a few features, but how would I explain that when you look at this piece of code you're able to see it entirely in context? Well, I can't, because the code is completely out of context at that point. Or what about demonstrating how the naming conventions exhibit structure-informative regularity? Again, I can't, because that's a visual design element of the code. It's something you have to "see" by looking at the whole painting, as it were.
The naming convention is actually a great example. Out of context, there's apparently no rhyme or reason to it. But in context, it forms a key component of the visual regularity and continuity throughout the code. The names are an important part of how you can see the structure of the code. They help to orient you in the big pie. But if I were to quote a single line here, there's no pie to look at, no sky to navigate by. It's just a single constellation. By analogy, it does less good to say, here's the Big Dipper, it's useful. But why? Because it's easy to find amidst the context of stars and its shape helps you to find the North Star. But on its own it doesn't seem as valuable. At that point it is just another constellation. The same thing happens with this code.
So I'll go through and explicate it all in detail in the live session, where I can provide the "painting" and workflow in its entirety so people can see how it works. Then you can see how my comments here match up with the code.
Something that might be worthwhile to consider is the fact that someone who wants to make a change only needs to look at a small program instead of a large program.
In the large program case, the programmer feels like they can cross-cut it, install some duplication, and yes: get their change done faster, but at a cost of making the program bigger.
But in the small-program case, you only pay the cost of learning the codebase when you add a new programmer to it -- something that happens very infrequently. Your program stays small, and you gain all the benefits therein (faster, fewer bugs, and so on).
This is really admirable stuff and I share this kind of goal even though I'm not working in APL style at this time, though I understand the appeal of shifting in that direction as more of the code gets abstract, and it necessarily should be so abstract if you're trying to maximize the simplicity. I believe most codebases suffer from prematurely abstracting with the easy stuff built into the source language (classes, generics, etc.), and then not having the abstraction they really need when it's necessary, and being too tangled up to build it.
The only problem is that I don't know where to start if I wanted to study what you're doing and take notes. Those millions of lines of changes are still lurking in the background as building blocks for an overall understanding.
Some of that deals with the micro and some with the macro level ideas, but there are some key elements in those that will be necessary to appreciate the whole thing.