Transformer – Spreadsheet (byhand.ai)
250 points by next_xibalba 10 months ago | 20 comments


Genuine question: What do people feel they understand after going through this? If we wrote out every matrix multiplication for linear regression, would we say we've truly grasped it? Is the takeaway about implementation mechanics, or does it build deeper intuition?

I see the value in visualization, but what’s the real educational gain beyond that?

(I'm an ML person/mathematician, but I haven’t lectured in over a decade—maybe I just don’t get it. I tend to prefer tutorials that build connections to known ideas, so this might be a blind spot for me.)


I think it's supposed to be quite the opposite; at least it was like that for me. When GPT-3 first launched, it seemed almost like magic to me, and I have some limited ML background too. Much later I saw the 3Blue1Brown video about transformers, and I was almost disappointed to see that the math itself is rather simple.

My main takeaway was that even simple basics can produce astonishing results, in this case if they're just scaled large enough. That incredibly complex and useful emergent behavior can result from what seem like Conway's Game of Life-like principles.
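(For the curious: the "rather simple" math at the heart of it is scaled dot-product attention. A minimal NumPy sketch, with made-up dimensions and random inputs rather than anything from the linked spreadsheet:

    import numpy as np

    def softmax(x, axis=-1):
        # subtract the max for numerical stability
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d) matrices of queries, keys, values
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # (seq_len, seq_len) similarities
        weights = softmax(scores, axis=-1)   # each row sums to 1
        return weights @ V                   # weighted average of the values

    # toy example: 3 tokens, 4-dimensional embeddings
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    out = attention(x, x, x)  # self-attention with identity projections
    print(out.shape)          # (3, 4)

That's essentially the whole trick; everything else is projections, normalization, and scale.)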


It's the large training data which contains the knowledge for that complex and useful emergent behavior. It's like if you could import vast information about the world into Conway's game of life to enable increasingly complex levels of emergence.


The idea of "vast patterns" in Conway's Game of Life seems like a great metaphor for LLM transformers. Explaining it to a layman as hidden Markov chains with some extras or whatnot (which is roughly what I guess it's like) doesn't give a good mental image.

Probability chains alone don't give a good mental visualization of how such a system comes to certain "decisions" or "thought patterns". But watching the Game of Life, you can see fascinating structures emerge and lead to further interesting behavior. That's easy to extrapolate.

Maybe in the future NNs will be understandable sort of like the Game of Life: "oh, that NN section is running pattern 27 on XY input data; that's going to be an unstable behavioral element combined with pattern 38c over here." Not sure if that's a fascinating or dreadfully boring prospect, though.


Personally I find value in toy examples like this, as it helps me check that I actually understand the core architecture. You can look at a diagram or an equation explaining a transformer block, but how do you make the jump from that to actually implementing it? With such examples you can check your understanding of what's actually getting computed where, what the dimensions of the vectors and matrices are, etc.

Sure the vast majority of people who want to work with LLMs don't need to know any of this. But personally I enjoy understanding how things work and I like to be able to drill down into what's happening at a very basic level. Something like this is a lot easier to play around with and check your understanding vs the source code of a real transformer model. Once you feel you've mastered a toy example like this, the real code becomes easier to get to grips with.
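To illustrate that kind of shape-checking, here's a rough sketch of a single pre-norm transformer block with the tensor shapes spelled out in comments. The sizes and the weight initialization are toy choices for the example, not taken from the spreadsheet:

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # normalize each token vector to zero mean, unit variance
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def softmax(x):
        e = np.exp(x - x.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    seq, d_model, n_heads = 3, 8, 2
    d_head = d_model // n_heads

    rng = np.random.default_rng(1)
    x = rng.normal(size=(seq, d_model))  # (seq, d_model) token embeddings
    Wq, Wk, Wv, Wo = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
    W1 = rng.normal(size=(d_model, 4 * d_model)) * 0.1  # MLP expands 4x
    W2 = rng.normal(size=(4 * d_model, d_model)) * 0.1

    # attention sub-layer
    h = layer_norm(x)
    Q = (h @ Wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)
    K = (h @ Wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    V = (h @ Wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores) @ V                            # (heads, seq, d_head)
    attn = attn.transpose(1, 0, 2).reshape(seq, d_model)  # back to (seq, d_model)
    x = x + attn @ Wo                                     # residual connection

    # MLP sub-layer
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0) @ W2  # ReLU MLP, residual connection
    print(x.shape)  # (3, 8): the block preserves the input shape

Tracing the shapes through something like this is exactly the check a diagram alone won't give you.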


There's some pedagogical value at the "fundamentals" end: exposing how automatic differentiation turns calculus into matrix algebra, especially for CS students who are weaker in math.
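As a concrete taste of what such a worksheet can expose, here's a toy sketch (my own example, not from the article) of the backward pass of a single linear layer, where the chain rule becomes two matrix products:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(5, 3))  # 5 samples, 3 features
    W = rng.normal(size=(3, 2))  # weights of a linear layer
    Y = X @ W                    # forward pass: (5, 2)

    # scalar loss L = 0.5 * ||Y||^2, so dL/dY = Y
    dY = Y
    # chain rule, written as matrix algebra:
    dW = X.T @ dY  # dL/dW: (3, 2), same shape as W
    dX = dY @ W.T  # dL/dX: (5, 3), same shape as X

    # finite-difference check of one entry of dW
    eps = 1e-6
    W2 = W.copy(); W2[0, 0] += eps
    L  = 0.5 * np.sum((X @ W) ** 2)
    L2 = 0.5 * np.sum((X @ W2) ** 2)
    print((L2 - L) / eps, dW[0, 0])  # the two numbers should agree closely

Doing this by hand a couple of times is what makes "backprop is just matrix algebra" click for math-shy students.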

For the rest of it, there's pedagogical value in giving students worksheets (I prefer ipynbs for coding, but hand calculations are good for algorithms) to follow along with the lecture because if you don't do this, in 2025, 1/4 of the class will be on their phones (and the other 3/4 won't show up to lecture).

If you can work through the papers/source code on your own and aren't afraid of the math, these aren't for you.


> if you don't do this, in 2025, 1/4 of the class will be on their phones (and the other 3/4 won't show up to lecture).

That seems like a crazy way to burn (potentially someone else's) thousands of dollars.


“We choose to [implement the Transformer in a spreadsheet] in this decade and do the other things, not because they are easy, but because they are hard; because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win, and the others, too.”

And

“Some men see things as they are, and ask why. I dream of things that never were, and ask why not.”


This type of visualization is commendable in that it helps to demystify machine learning, showing that it is nothing more than a process and that there is no intelligence in the human sense involved.


Agreed


On the related topic of teaching AI fundamentals and intuition about the inner workings, there is wonderful material called "AI Unplugged" [1] for performing activities with pen/pencil, cards, etc. in a game-like manner.

I've been using this material on several occasions with various audiences not familiar with the field of AI/ML (kids and grown-ups alike), and each time people seem to have enjoyed it and gained a bit of understanding of the modern world they live in.

[1] http://www.aiunplugged.org




Their title says Spreadsheet, and it uses Google Sheets, so I don't know why it says Excel here.


We've reverted it now. (Submitted title was "Transformer Implemented Using Excel")


So, it turns out "Spreadsheet Is All You Need"


Props for using the /copy endpoint feature of Google Docs. It saves a lot of hassle when teaching.


One can also argue this could be done within a DB such as Postgres, with everything as stored procedures/functions/triggers. In some ways, DBs are just spreadsheets with really robust query languages.



See also:

(Classical) Computer vision basics in Excel, using just formulas: https://github.com/amzn/computer-vision-basics-in-microsoft-...

Original HN Submission: https://news.ycombinator.com/item?id=22357374

PS: I am the primary author.



