> how to build systems where the whole is bigger than the sum of its parts
A bit tangential, but I look at programming as inherently being that. Every task, I try to break down into smaller tasks that together accomplish something more. That leads me to think that, if you structure the process of programming right, you will only end up solving small, minimally intertwined problems. Might sound far-fetched, but I think it's doable to create such a workflow. And even the dumber LLMs would slot naturally into such a process, I imagine.
> And, even the dumber LLMs would slot in naturally into such a process
That is what I am struggling with: at the moment it is really easy to slot an LLM in and make everything worse, mainly because its output comes out of torch.multinomial, filtered through all kinds of speculative decoding, quantization, and so on.
But I am convinced it is possible, just not the way I am doing it right now; that's why I am spending most of my time studying.
For studying? Mainly watching and re-watching Karpathy's 'Zero To Hero'[1] and Stanford's 'Introduction to Convolutional Neural Networks for Visual Recognition'[2], plus a lot of transformers-from-scratch videos like Umar Jamil's[3], and I also study backwards to McCulloch and Pitts. Reading the 30 papers at https://punkx.org/jackdoe/30.html, and so on.
And of course Yannic Kilcher[4], and also listening in on the paper discussions they do on discord.
Practicing a lot with doing backpropagation by hand and making toy models by hand to get intuition for the signal flow, and building all kinds of smallish systems, e.g. how far can you push whisper, a small qwen3, and kokoro to control your computer with voice?
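To make "backpropagation by hand" concrete, here is a minimal sketch of the kind of toy exercise meant above: a single tanh neuron with a squared-error loss, gradients derived manually via the chain rule and checked against finite differences. The function names and the specific numbers are my own, made up purely for illustration.

```python
import math

# Toy "backprop by hand": y = tanh(w*x + b), loss L = (y - t)^2.
# Gradients are written out manually via the chain rule, then
# verified with a finite-difference check.

def forward(w, b, x, t):
    z = w * x + b
    y = math.tanh(z)
    L = (y - t) ** 2
    return z, y, L

def backward(w, b, x, t):
    _, y, _ = forward(w, b, x, t)
    dL_dy = 2 * (y - t)      # d/dy of (y - t)^2
    dy_dz = 1 - y * y        # d/dz of tanh(z)
    dL_dz = dL_dy * dy_dz
    dL_dw = dL_dz * x        # z = w*x + b, so dz/dw = x
    dL_db = dL_dz * 1.0      # dz/db = 1
    return dL_dw, dL_db

def numeric_grad(w, b, x, t, eps=1e-6):
    # Central finite differences as a sanity check on the hand math.
    dw = (forward(w + eps, b, x, t)[2] - forward(w - eps, b, x, t)[2]) / (2 * eps)
    db = (forward(w, b + eps, x, t)[2] - forward(w, b - eps, x, t)[2]) / (2 * eps)
    return dw, db

w, b, x, t = 0.5, -0.3, 1.2, 0.8
gw, gb = backward(w, b, x, t)
nw, nb = numeric_grad(w, b, x, t)
print(gw, nw)  # the two columns should agree closely
print(gb, nb)
```

Scaling this exercise up (more neurons, a layer, a tiny MLP) while still writing every derivative yourself is exactly the kind of drill that builds intuition for the signal flow.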
People think that deepseek/mistral/meta etc. are democratizing AI, but it's actually Karpathy who teaches us :) so we can understand them and make our own.
I think you are right. Even though I believe next-token prediction can work, I don't think it can happen in this autoregressive way where we fully collapse the token before feeding it back in. Can you imagine how much is lost at each torch.multinomial?
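To illustrate what that collapse discards: the model emits a full probability distribution over the vocabulary, but autoregressive decoding keeps only one sampled index. A stdlib-only sketch (the logits here are made up, and random.choices stands in for torch.multinomial conceptually):

```python
import math
import random

# Hypothetical next-token logits over a tiny 5-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1, -1.0]

def softmax(xs):
    m = max(xs)                     # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

probs = softmax(logits)

# Information carried by the full distribution: its Shannon entropy, in bits.
entropy = -sum(p * math.log2(p) for p in probs)

# What actually gets fed back into the model: one sampled index.
# Everything else about the distribution is thrown away at this step.
token = random.choices(range(len(probs)), weights=probs)[0]

print(f"distribution entropy: {entropy:.2f} bits")
print(f"token fed back in: {token}")
```

Whatever the entropy of that distribution, the next step of the loop sees only the single id, which is the information loss being pointed at here (and one motivation for latent-space approaches like the ones mentioned below).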
Maybe the way forward is LCM, or to go JEPA; otherwise, as this Apple paper suggests, we will just keep pushing the "pattern matching" further. Maybe we get some sort of phase transition at some point, or maybe we have to switch architecture; we will see. It could be that things change when we get physical multimodality and real-world experience, I don't know.
Take language out of the equation, and drawing a circle, triangles, or letters is just statistical physics. We can capture that in energy models stored in an online state: statistical physics relative to the machine, its electromagnetic geometry: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...
Our language doesn’t exist without humans. It’s not an immutable property of physics. It’s obfuscation and mind viruses. It’s story mode.
The computer acting as a web server or an LLM has an inherent energy model to it. New models of those patterns will be refined to a statefulness that strips away unnecessary language constructs in the system, like the large amount of software that most people never use, only developers.
I look forward to continuing my work in the hardware world to further compress and reduce the useless state of past systems of thought that we copy-paste around to serve developers, to reduce the context to sort through, and to improve model quality: https://arxiv.org/abs/2309.10668