
> It would take a paltry amount of money, in the low millions of dollars, to design real (general purpose) multicore processors that deliver orders of magnitude better performance than what we have today. But unfortunately lawmakers are for the most part technologically illiterate, and technologists are unduly skeptical of any type of programming outside the mainstream.

> So my dream of a sub-$1000, 1000+ core CPU with a modest amount of RAM per core (between 1 MB and 1 GB) that can be programmed with existing tools like Erlang/Go/MATLAB/Julia and even Docker is just never gonna happen. And without that, there is no viable road to really experiment with stuff like AI, physics simulations etc without renting time in the cloud. We have the impression that progress is being made on these endeavors today, but things look a little different to me, watching them play out at a glacial pace, at mind boggling expense, over 3-4 decades. I mourn what might have been.

I'm really curious what specifically you mean by this. I see similar issues on the software side (my degree is in CS). Software is incredibly bloated and horrible at interoperability. Unix had pipes back in the early 70s, and somehow with GUIs and then mobile "apps," we've regressed. Identity-based security has failed time and again. And only rarely has software design progressed meaningfully beyond the structured programming of the late 70s (not to mention the languages). Moore's law has given software developers roughly a 100M x improvement over the past 40 years, yet ordinary people would scarcely notice.
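For what it's worth, that 100M x figure holds up as a back-of-the-envelope (a sketch, assuming the ~18-month doubling period people usually mean by Moore's law):

    // 40 years of doubling roughly every 1.5 years:
    const improvement = 2 ** (40 / 1.5);
    console.log(improvement.toExponential(1)); // "1.1e+8", i.e. about 100 million x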

TL;DR: I see a lot of the same flaws in software, which is dominated by fads and the popularity of whatever happens to be mainstream.

I'd really like to hear a dissident point of view from the hardware side.




Ya I hear you, I started with C++ and dabbled in 68000 assembly back in the early 90s. The biggest problems back then were the main loop and the fact that a program could only get to about a million lines before it was too unstable and kept crashing. Even just the protected memory of Java, or protected memory in general, felt like a distant dream.

Now we have event-based apps, but we're also stuck in the purgatory of async hell. Hundreds of classes got replaced with hundreds of factories, plus an endless stream of DSLs to learn. Compilers mutated into handwritten unit tests. I feel a heaviness in my chest just writing this, because one thought leads to another and it's hard to articulate the root of the discontent. I still believe that the web way is better than bare metal, but I'm saddened to see it reinventing the bad habits we abandoned 25 years ago.

For me, what's really going on is that computers have not gotten appreciably faster since about the time PowerPC iMacs were running OS X, roughly the year 2000. Before that, computers were getting 100 times faster every decade, and then it just kinda... stopped. Only video games kept going, the tradeoff being that we have to use someone else's 3D library rather than just writing ray tracers in a few pages of code (if only we had general-purpose multicore chips).

And that made programmers desperate, because they were still focusing on performance instead of stepping back and seeing the high-level abstractions that were largely understood by the 1980s. Everyone is so used to being compute-bound that we can't even think about solutions outside of that reality.

My "idea" to fix all this, bluntly, is to forget about improving single-threaded performance and start giving people the raw computing power they need to get back to work again. To keep up with Moore's Law, that's a computer with 10,000 parallel threads running at least 1 GHz, for $1000. Or 100 times the cores every decade. My initial mention of 1000 cores was perhaps conservative.
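To put numbers on "keep up with Moore's Law" (a sketch; the only input is the 100x-per-decade figure):

    // 100x the cores every decade is the same curve single-threaded
    // performance used to ride: a doubling roughly every 18 months.
    const doublingsPerDecade = Math.log2(100);        // ~6.64
    const yearsPerDoubling = 10 / doublingsPerDecade;
    console.log(yearsPerDoubling.toFixed(1));         // "1.5"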

Pretty much all of the problems we deal with today are embarrassingly parallel. A synchronous, blocking PHP page is, once it's being served to thousands of users. DSP is. Neural nets, genetic algorithms, stocks, Bitcoin...
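"Embarrassingly parallel" here just means the units of work never have to talk to each other. A minimal sketch of the PHP-page case in TypeScript (handleRequest and the user IDs are made-up placeholders):

    // Each page render depends only on its own user, so N users means
    // N independent jobs with no shared state to coordinate.
    async function handleRequest(userId: number): Promise<string> {
      return `page for user ${userId}`; // placeholder for the real render
    }

    async function serveAll(userIds: number[]): Promise<string[]> {
      // Fan out with no ordering or locking; on a machine with enough
      // cores every one of these could run at the same time. (Promise.all
      // itself only interleaves I/O; the point is the independence.)
      return Promise.all(userIds.map(handleRequest));
    }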

So our desktop machines should really be thousands of Docker containers with a total capacity of like 1000000%. No program would ever block another program, or get out of its sandbox. Programs would sometimes run across the internet. I picture it kind of like this big Minecraft Disneyland where you're in VR but processing stuff in the background and forgetting it's there. Maybe you'd devote 50% to an AI agent like J.A.R.V.I.S. that sits around all day backing up its best self and evolving its subprocesses to be even better. Not being compute-bound is like being able to throw processing power at problems declaratively and never having to solve anything menial again. I've been daydreaming about all this since like 1999 hahaha.

The math is all there, I've written about it at length in previous comments. You basically take an old processor like MIPS, which was about as optimized for single-threaded performance as one can get without getting mired in the evolutionary dead end of long pipelines and huge caches. A core of that generation, or a PowerPC 601 or DEC Alpha, had on the order of 1-3 million transistors. Looks like the Apple M1 has 16 billion. So the raw numbers do suggest 10,000 of last century's best cores. Then spend another 10 billion transistors on about 1-10 GB of RAM, or roughly 1 MB of RAM per core.

https://en.wikipedia.org/wiki/Transistor_count
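The same arithmetic in code, using the counts from that page (a sketch; it assumes ~1.5 million transistors per core and DRAM at roughly one transistor per bit, and ignores interconnect, I/O and packing overhead):

    const m1Transistors = 16e9;   // Apple M1, roughly
    const perCore = 1.5e6;        // R4000 / PowerPC 601 / Alpha class core
    console.log(Math.round(m1Transistors / perCore)); // 10667, on the order of 10,000 cores

    // Another 10 billion transistors spent as DRAM, at ~1 transistor per bit:
    const ramTransistors = 10e9;
    console.log((ramTransistors / 8 / 2 ** 30).toFixed(2)); // "1.16" GB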

Yes, memory routing is a pain, but you just use content-addressable memory and treat the interconnect like any other network on the internet. The cores use caching, compression and copy-on-write to combine the best aspects of Erlang, Lisp, and Clojure. We'd write code in a Javascript-like language built natively around the same ideas as Immer and Immutable.js. Since everything is read-only, it compiles ahead of time as much as it can, dropping into a monad only when processing mutable state, and then going back to static. When you're not compute-bound, the static stuff processes instantly. It basically inverts the logic, so the only slow part of your code is the I/O. My terminology isn't exact here, but it would basically transpile a subset of Javascript to Lisp and then run the embarrassingly parallel stuff 10,000 times faster than we're used to, rivaling the speed of video cards.
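The read-only part doesn't need new hardware to try out; it's what persistent data structures already do. Here's a tiny hand-rolled sketch (plain TypeScript, no library; the names are made up) of the structural sharing that Immer and Immutable.js give you:

    interface AppState {
      readonly settings: { readonly theme: string };
      readonly log: readonly string[];
    }

    // "Updating" never mutates: it returns a new value that shares every
    // unchanged branch with the old one, so readers of the old state see
    // nothing change and there is nothing to lock.
    function withLogEntry(state: AppState, entry: string): AppState {
      return {
        ...state,                   // settings object is shared, not copied
        log: [...state.log, entry], // only the changed path is rebuilt
      };
    }

    const v1: AppState = { settings: { theme: "dark" }, log: [] };
    const v2 = withLogEntry(v1, "booted");
    console.log(v1.log.length, v2.log.length); // 0 1
    console.log(v1.settings === v2.settings);  // true (structurally shared)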

To do this as a hobby project, it might be fun to see how many early ARM cores and small RAMs could fit on a chip with a billion transistors. Then see how hard it is to add the content-addressable networking to the OS. Then finally get 1000 Docker containers running with, say, Debian. I used to daydream about doing it on an FPGA, but haven't kept up as closely as I'd like. Also I feel like there is industry pressure to keep FPGAs down, because they haven't kept up with Moore's law either, and never went fully open source like they should have.
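The billion-transistor hobby budget is easy to rough out too (a sketch; the ~25k transistor count for an ARM1-class core is from memory, and it assumes on-chip SRAM at the classic 6 transistors per bit):

    const budget = 1e9;                      // transistors on the hobby chip
    const armCore = 25e3;                    // ARM1-era core, roughly
    const kib = 64;                          // assumed scratchpad per core, in KiB
    const sramBits = kib * 1024 * 8;         // bits of SRAM per core
    const perTile = armCore + sramBits * 6;  // 6T SRAM cell
    console.log(Math.floor(budget / perTile)); // 315 tiles: the SRAM, not the core, eats the budget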

I feel kind of weary about all this because I've been thinking about it for so long, and have a lot of regrets about not using my degree more. I'm still just writing CRUD apps like everyone else. It's so tedious, and takes so much code to do so little visible work, that I've lost seasons.



