
> It would take a paltry amount of money, in the low millions of dollars, to design real (general purpose) multicore processors that deliver orders of magnitude better performance than what we have today. But unfortunately lawmakers are for the most part technologically illiterate, and technologists are unduly skeptical of any type of programming outside the mainstream.

> So my dream of a sub-$1000, 1000+ core CPU with a modest amount of RAM per core (between 1 MB and 1 GB) that can be programmed with existing tools like Erlang/Go/MATLAB/Julia and even Docker is just never gonna happen. And without that, there is no viable road to really experiment with stuff like AI, physics simulations etc without renting time in the cloud. We have the impression that progress is being made on these endeavors today, but things look a little different to me, watching them play out at a glacial pace, at mind boggling expense, over 3-4 decades. I mourn what might have been.

I'm really curious what specifically you mean by this. I see similar issues on the software side (my degree is in CS). Software is incredibly bloated and horrible at interoperability. Unix had pipes back in the early 70s, and somehow with GUIs and then mobile "apps," we've regressed. Identity-based security has failed time and again. And only rarely has software design progressed meaningfully beyond the structured programming of the late 70s (not to mention the languages). Moore's law has given software developers roughly a 100M x improvement over the past 40 years, yet ordinary people would scarcely notice.
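For what it's worth, that 100M x figure holds up as a back-of-the-envelope (a sketch, assuming the ~18-month doubling period people usually mean by Moore's law):

    // 40 years of doubling roughly every 1.5 years:
    const improvement = 2 ** (40 / 1.5);
    console.log(improvement.toExponential(1)); // "1.1e+8", i.e. about 100 million x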

TL;DR: I see a lot of the same flaws in software, which is dominated by fads and the popularity of whatever happens to be mainstream.

I'd really like to hear a dissident point of view from the hardware side.




Ya I hear you, I started with C++ and dabbled in 68000 assembly back in the early 90s. The biggest problems back then were the main loop and the fact that a program could only get to about a million lines before it was too unstable and kept crashing. Even just the protected memory of Java, or protected memory in general, felt like a distant dream.

Now we have event-based apps, but we're also stuck in the purgatory of async hell. Hundreds of classes got replaced with hundreds of factories, plus an endless stream of DSLs to learn. Compilers mutated into handwritten unit tests. I feel a heaviness in my chest just writing this, because one thought leads to another and it's hard to articulate the root of the discontent. I still believe that the web way is better than bare metal, but I'm saddened to see it reinventing the bad habits we abandoned 25 years ago.

For me, what's really going on is that computers have not gotten appreciably faster since about the time PowerPC iMacs were running OS X, roughly the year 2000. Before that, computers were getting 100 times faster every decade, and then it just kinda... stopped. Only video games kept going, the tradeoff being that we have to use someone else's 3D library rather than just writing ray tracers in a few pages of code (if only we had general-purpose multicore chips).

And that made programmers desperate, because they were still focusing on performance instead of stepping back and seeing the high-level abstractions that were largely understood by the 1980s. Everyone is so used to being compute-bound that we can't even think about solutions outside of that reality.

My "idea" to fix all this, bluntly, is to forget about improving single-threaded performance and start giving people the raw computing power they need to get back to work again. To keep up with Moore's Law, that's a computer with 10,000 parallel threads running at least 1 GHz, for $1000. Or 100 times the cores every decade. My initial mention of 1000 cores was perhaps conservative.
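To put numbers on "keep up with Moore's Law" (a sketch; the only input is the 100x-per-decade figure):

    // 100x the cores every decade is the same curve single-threaded
    // performance used to ride: a doubling roughly every 18 months.
    const doublingsPerDecade = Math.log2(100);        // ~6.64
    const yearsPerDoubling = 10 / doublingsPerDecade;
    console.log(yearsPerDoubling.toFixed(1));         // "1.5"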

Pretty much all of the problems we deal with today are embarrassingly parallel. A synchronous, blocking PHP page is, once it's being served to thousands of users. DSP is. Neural nets, genetic algorithms, stocks, Bitcoin...
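"Embarrassingly parallel" here just means the units of work never have to talk to each other. A minimal sketch of the PHP-page case in TypeScript (handleRequest and the user IDs are made-up placeholders):

    // Each page render depends only on its own user, so N users means
    // N independent jobs with no shared state to coordinate.
    async function handleRequest(userId: number): Promise<string> {
      return `page for user ${userId}`; // placeholder for the real render
    }

    async function serveAll(userIds: number[]): Promise<string[]> {
      // Fan out with no ordering or locking; on a machine with enough
      // cores every one of these could run at the same time. (Promise.all
      // itself only interleaves I/O; the point is the independence.)
      return Promise.all(userIds.map(handleRequest));
    }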

So our desktop machines should really be thousands of Docker containers with a total capacity of like 1000000%. No program would ever block another program, or get out of its sandbox. Programs would sometimes run across the internet. I picture it kind of like this big Minecraft Disneyland where you're in VR but processing stuff in the background and forgetting it's there. Maybe you'd devote 50% to an AI agent like J.A.R.V.I.S. that sits around all day backing up its best self and evolving its subprocesses to be even better. Not being compute-bound is like being able to throw processing power at problems declaratively and never having to solve anything menial again. I've been daydreaming about all this since like 1999 hahaha.

The math is all there, I've written about it at length in previous comments. You basically take an old processor like MIPS, which was about as optimized for single-threaded performance as one can get without getting mired in the evolutionary dead end of long pipelines and huge caches. A core of that generation, or a PowerPC 601 or DEC Alpha, had on the order of 1-3 million transistors. Looks like the Apple M1 has 16 billion. So the raw numbers do suggest 10,000 of last century's best cores. Then spend another 10 billion transistors on about 1-10 GB of RAM, or roughly 1 MB of RAM per core.

https://en.wikipedia.org/wiki/Transistor_count
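The same arithmetic in code, using the counts from that page (a sketch; it assumes ~1.5 million transistors per core and DRAM at roughly one transistor per bit, and ignores interconnect, I/O and packing overhead):

    const m1Transistors = 16e9;   // Apple M1, roughly
    const perCore = 1.5e6;        // R4000 / PowerPC 601 / Alpha class core
    console.log(Math.round(m1Transistors / perCore)); // 10667, on the order of 10,000 cores

    // Another 10 billion transistors spent as DRAM, at ~1 transistor per bit:
    const ramTransistors = 10e9;
    console.log((ramTransistors / 8 / 2 ** 30).toFixed(2)); // "1.16" GB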

Yes, memory routing is a pain, but you just use content-addressable memory and treat the interconnect like any other network on the internet. The cores use caching, compression and copy-on-write to combine the best aspects of Erlang, Lisp, and Clojure. We'd write code in a Javascript-like language built natively around the same ideas as Immer and Immutable.js. Since everything is read-only, it compiles ahead of time as much as it can, dropping into a monad only when processing mutable state, and then going back to static. When you're not compute-bound, the static stuff processes instantly. It basically inverts the logic, so the only slow part of your code is the I/O. My terminology isn't exact here, but it would basically transpile a subset of Javascript to Lisp and then run the embarrassingly parallel stuff 10,000 times faster than we're used to, rivaling the speed of video cards.
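The read-only part doesn't need new hardware to try out; it's what persistent data structures already do. Here's a tiny hand-rolled sketch (plain TypeScript, no library; the names are made up) of the structural sharing that Immer and Immutable.js give you:

    interface AppState {
      readonly settings: { readonly theme: string };
      readonly log: readonly string[];
    }

    // "Updating" never mutates: it returns a new value that shares every
    // unchanged branch with the old one, so readers of the old state see
    // nothing change and there is nothing to lock.
    function withLogEntry(state: AppState, entry: string): AppState {
      return {
        ...state,                   // settings object is shared, not copied
        log: [...state.log, entry], // only the changed path is rebuilt
      };
    }

    const v1: AppState = { settings: { theme: "dark" }, log: [] };
    const v2 = withLogEntry(v1, "booted");
    console.log(v1.log.length, v2.log.length); // 0 1
    console.log(v1.settings === v2.settings);  // true (structurally shared)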

To do this as a hobby project, it might be fun to see how many early ARM cores and small RAMs could fit on a chip with a billion transistors. Then see how hard it is to add the content-addressable networking to the OS. Then finally get 1000 Docker containers running with, say, Debian. I used to daydream about doing it on an FPGA, but haven't kept up as closely as I'd like. Also I feel like there is industry pressure to keep FPGAs down, because they haven't kept up with Moore's law either, and never went fully open source like they should have.
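The billion-transistor hobby budget is easy to rough out too (a sketch; the ~25k transistor count for an ARM1-class core is from memory, and it assumes on-chip SRAM at the classic 6 transistors per bit):

    const budget = 1e9;                      // transistors on the hobby chip
    const armCore = 25e3;                    // ARM1-era core, roughly
    const kib = 64;                          // assumed scratchpad per core, in KiB
    const sramBits = kib * 1024 * 8;         // bits of SRAM per core
    const perTile = armCore + sramBits * 6;  // 6T SRAM cell
    console.log(Math.floor(budget / perTile)); // 315 tiles: the SRAM, not the core, eats the budget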

I feel kind of weary about all this because I've been thinking about it for so long, and have a lot of regrets about not using my degree more. I'm still just writing CRUD apps like everyone else. It's so tedious, and takes so much code to do so little visible work, that I've lost seasons.



