We've already built things in computing that we don't easily understand, even outside of AI, like large distributed systems and all sorts of balls of mud.
Within the sphere of AI, we have built machines which can play strategy games like chess, and surprise us with an unforeseen defeat. It's not necessarily easy to see how that emerged from the individual rules.
Even a compiler can surprise you. You code up some optimizations, which are logically separate, but then a combination of them does something startling.
Basically, in mathematics, you cannot grasp all the details of a vast space just from knowing the axioms which generate it and a few things which follow from them. Elementary school children know what a prime number is, yet primes occupy mathematicians who keep finding new surprises in that space.
Right, but this is somewhat different, in that we apply a simple learning method to a big dataset, and the resulting big matrix of numbers can suddenly answer questions and write anything - prose, poetry, code - better than most humans - and we don't know how it does it. What we do know[0] is, there's structure there - structure reflecting a kind of understanding of languages and the world. I don't think we've ever created anything this complex before, completely on our own.
Of course, the learning method being conceptually simple, all that structure must come from the data. Which is also profound, because that structure is the first fully general world/conceptual model that we can actually inspect and study up close - the other being animal and human brains, which are much harder to figure out.
> Basically, in mathematics, you cannot grasp all the details of a vast space just from knowing the axioms which generate it and a few things which follow from them. Elementary school children know what a prime number is, yet primes occupy mathematicians who keep finding new surprises in that space.
Prime numbers and fractals and other mathematical objects have plenty of fascinating mysteries and complex structures running through them, but so far none of those can casually pass the Turing test and do half of my job for me, and for millions of other people.
--
[0] - Even as many people still deny this, and talk about LLMs as mere "stochastic parrots" and "next token predictors" that couldn't possibly learn anything at all.
We know quite well how it does it. It's applying extrapolation to its lossily compressed representation. It's not magic, and the HN crowd of technically proficient folks, especially, should stop treating it as such.
That is not a useful explanation. "Applying extrapolation to its lossily compressed representation" is pretty much the definition of understanding something. The details and interpretation of the representation are what is interesting and unknown.
We can use data from analyzing the frequency of n-grams in a text to generate sentences, and some of them will be pretty good, and fool a few people into believing that there is some solid language processing going on.
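To make that concrete, here's a minimal sketch of the n-gram idea: count which words follow which in a corpus, then generate by repeatedly sampling a plausible successor. All names (`train_bigrams`, `generate`, the tiny corpus) are just made up for illustration; a real n-gram model would use longer contexts and smoothing.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record, for each word, the list of words observed to follow it."""
    words = text.split()
    followers = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        followers[prev].append(nxt)
    return followers

def generate(model, start, length=10):
    """Generate a sentence by repeatedly sampling an observed next word."""
    out = [start]
    for _ in range(length - 1):
        candidates = model.get(out[-1])
        if not candidates:
            break  # dead end: this word was never seen mid-sentence
        out.append(random.choice(candidates))
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(generate(model, "the"))
```

With a toy corpus like this the output is locally plausible ("the cat sat on the mat") but has no global plan, which is exactly the gap between n-gram babble and what LLMs do.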
LLM AI is different in that it does produce helpful results, not only entertaining prose.
It is practical for users today to replace most uses of web search with a query to an LLM.
The way the token prediction operates, it uncovers facts, and renders them into grammatically correct language.
Which is amazing given that, when the thing is generating a response that will be, say, 500 tokens long, by the time it has produced 200 of them it has no idea what the remaining 300 will be. Yet it has committed to those 200; and often the whole thing will make sense when the remaining 300 arrive.
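The loop being described can be sketched with a toy stand-in for the model: each token is chosen from the prefix emitted so far, appended, and never revised. The `next_token` rule here is a fake placeholder, not a real LLM; the point is only the shape of the autoregressive commitment.

```python
def next_token(prefix):
    """Stand-in for a real model's next-token choice: a toy deterministic rule."""
    vocab = ["the", "whole", "often", "makes", "sense", "."]
    return vocab[len(prefix) % len(vocab)]

def generate(n_tokens):
    """Emit tokens one at a time; each depends only on the prefix so far."""
    prefix = []
    for _ in range(n_tokens):
        tok = next_token(prefix)  # conditioned only on what's already emitted
        prefix.append(tok)        # committed: later tokens can't change it
    return prefix
```

Note that generating 3 tokens and then stopping gives exactly the same prefix as the first 3 tokens of a 6-token generation: the earlier tokens are fixed regardless of how much comes after.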
The research posted demonstrates the opposite of that within the scope of sequence lengths they studied. The model has future tokens strongly represented well in advance.