Transformer – Spreadsheet (byhand.ai)
250 points by next_xibalba 10 months ago | 20 comments


Genuine question: What do people feel they understand after going through this? If we wrote out every matrix multiplication for linear regression, would we say we've truly grasped it? Is the takeaway about implementation mechanics, or does it build deeper intuition?

I see the value in visualization, but what’s the real educational gain beyond that?

(I'm an ML person/mathematician, but I haven’t lectured in over a decade—maybe I just don’t get it. I tend to prefer tutorials that build connections to known ideas, so this might be a blind spot for me.)


I think it's supposed to be quite the opposite; at least it was like that for me. When GPT-3 first launched, it seemed almost like magic to me, and I have some limited ML background too. Much later I saw the 3Blue1Brown video about transformers, and I was almost disappointed to see that the math itself is rather simple.

My main takeaway was that even simple basics can produce astonishing results, in this case if they're just scaled large enough. That incredibly complex and useful emergent behavior can result from what seem like Conway's Game of Life-like principles.
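(For the curious: the "rather simple" math at the heart of it is scaled dot-product attention. A minimal NumPy sketch, with made-up dimensions and random inputs rather than anything from the linked spreadsheet:

    import numpy as np

    def softmax(x, axis=-1):
        # subtract the max for numerical stability
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d) matrices of queries, keys, values
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # (seq_len, seq_len) similarities
        weights = softmax(scores, axis=-1)   # each row sums to 1
        return weights @ V                   # weighted average of the values

    # toy example: 3 tokens, 4-dimensional embeddings
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    out = attention(x, x, x)  # self-attention with identity projections
    print(out.shape)          # (3, 4)

That's essentially the whole trick; everything else is projections, normalization, and scale.)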


It's the large training data which contains the knowledge for that complex and useful emergent behavior. It's like if you could import vast information about the world into Conway's game of life to enable increasingly complex levels of emergence.


The idea of "vast patterns" in Conway's Game of Life seems like a great metaphor for LLM transformers. Explaining it to a layman as hidden Markov chains with some extras or whatnot (which is roughly what I guess it's like) doesn't give a good mental image.

Probability chains alone don't give a good mental visualization of how such a system comes to certain "decisions" or "thought patterns". But watching the Game of Life, you can see fascinating structures emerge and lead to further interesting behavior. That's easy to extrapolate.

Maybe in the future NNs will be understandable sort of like the Game of Life: "oh, that NN section is running pattern 27 on XY input data; that's going to be an unstable behavioral element combined with pattern 38c over here." Not sure if that's a fascinating or dreadfully boring prospect, though.


Personally I find value in toy examples like this, as it helps me check that I actually understand the core architecture. You can look at a diagram or an equation explaining a transformer block, but how do you make the jump from that to actually implementing it? With such examples you can check your understanding of what's actually getting computed where, what the dimensions of the vectors and matrices are, etc.

Sure the vast majority of people who want to work with LLMs don't need to know any of this. But personally I enjoy understanding how things work and I like to be able to drill down into what's happening at a very basic level. Something like this is a lot easier to play around with and check your understanding vs the source code of a real transformer model. Once you feel you've mastered a toy example like this, the real code becomes easier to get to grips with.
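To illustrate that kind of shape-checking, here's a rough sketch of a single pre-norm transformer block with the tensor shapes spelled out in comments. The sizes and the weight initialization are toy choices for the example, not taken from the spreadsheet:

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # normalize each token vector to zero mean, unit variance
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def softmax(x):
        e = np.exp(x - x.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    seq, d_model, n_heads = 3, 8, 2
    d_head = d_model // n_heads

    rng = np.random.default_rng(1)
    x = rng.normal(size=(seq, d_model))  # (seq, d_model) token embeddings
    Wq, Wk, Wv, Wo = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
    W1 = rng.normal(size=(d_model, 4 * d_model)) * 0.1  # MLP expands 4x
    W2 = rng.normal(size=(4 * d_model, d_model)) * 0.1

    # attention sub-layer
    h = layer_norm(x)
    Q = (h @ Wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)
    K = (h @ Wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    V = (h @ Wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores) @ V                            # (heads, seq, d_head)
    attn = attn.transpose(1, 0, 2).reshape(seq, d_model)  # back to (seq, d_model)
    x = x + attn @ Wo                                     # residual connection

    # MLP sub-layer
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0) @ W2  # ReLU MLP, residual connection
    print(x.shape)  # (3, 8): the block preserves the input shape

Tracing the shapes through something like this is exactly the check a diagram alone won't give you.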


There's some pedagogical value at the "fundamentals" end: exposing how automatic differentiation turns calculus into matrix algebra, especially for CS students who are weaker in math.
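As a concrete taste of what such a worksheet can expose, here's a toy sketch (my own example, not from the article) of the backward pass of a single linear layer, where the chain rule becomes two matrix products:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(5, 3))  # 5 samples, 3 features
    W = rng.normal(size=(3, 2))  # weights of a linear layer
    Y = X @ W                    # forward pass: (5, 2)

    # scalar loss L = 0.5 * ||Y||^2, so dL/dY = Y
    dY = Y
    # chain rule, written as matrix algebra:
    dW = X.T @ dY  # dL/dW: (3, 2), same shape as W
    dX = dY @ W.T  # dL/dX: (5, 3), same shape as X

    # finite-difference check of one entry of dW
    eps = 1e-6
    W2 = W.copy(); W2[0, 0] += eps
    L  = 0.5 * np.sum((X @ W) ** 2)
    L2 = 0.5 * np.sum((X @ W2) ** 2)
    print((L2 - L) / eps, dW[0, 0])  # the two numbers should agree closely

Doing this by hand a couple of times is what makes "backprop is just matrix algebra" click for math-shy students.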

For the rest of it, there's pedagogical value in giving students worksheets (I prefer ipynbs for coding, but hand calculations are good for algorithms) to follow along with the lecture because if you don't do this, in 2025, 1/4 of the class will be on their phones (and the other 3/4 won't show up to lecture).

If you can work through the papers/source code on your own and aren't afraid of the math, these aren't for you.


> if you don't do this, in 2025, 1/4 of the class will be on their phones (and the other 3/4 won't show up to lecture).

That seems like a crazy way to burn (potentially someone else's) thousands of dollars.


“We choose to [implement the Transformer in a spreadsheet] in this decade and do the other things, not because they are easy, but because they are hard; because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win, and the others, too.”

And

“Some men see things as they are, and ask why. I dream of things that never were, and ask why not.”


This type of visualization is commendable in that it helps to demystify machine learning, showing that it is nothing more than a process and that there is no intelligence in the human sense involved.


Agreed


On the related topic of teaching AI fundamentals and intuition about the inner workings, there is wonderful material called "AI Unplugged" [1] for performing activities with pen/pencil, cards, etc. in a game-like manner.

I've been using this material on several occasions with various audiences not familiar with the field of AI/ML (kids and grown-ups alike), and each time people seem to have enjoyed it and gained a bit of understanding of the modern world they live in.

[1] http://www.aiunplugged.org




Their title says Spreadsheet, and it uses Google Sheets, so I don't know why it says Excel here.


We've reverted it now. (Submitted title was "Transformer Implemented Using Excel")


So, it turns out "Spreadsheet Is All You Need"


Props for using the /copy endpoint feature of Google Docs. It saves a lot of hassle when teaching.


One can also argue this could be done within a DB such as Postgres, with everything as stored procedures/functions/triggers. In some ways, DBs are just spreadsheets with really robust query languages.



See also:

(Classical) Computer vision basics in Excel, using just formulas: https://github.com/amzn/computer-vision-basics-in-microsoft-...

Original HN Submission: https://news.ycombinator.com/item?id=22357374

PS: I am the primary author.



