Hacker News

> you realize nobody understands WHY or HOW these models work under the hood right?

Of course we understand how they work, we built them! There is no mystery in their mechanisms: we know the number of neurons, their connectivity, everything from the weights to the activation functions. This is not a mystery; it is the product of several decades of technical development.
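To make that point concrete, here is a toy sketch in plain Python (the weights and inputs are illustrative numbers I made up, not from any real model): a tiny feedforward network in which every parameter and every activation function is written out and directly inspectable.

```python
def relu(x):
    # Activation function: fully specified, nothing hidden.
    return max(0.0, x)

# A 2-input, 2-hidden-unit, 1-output network. Every parameter is right here.
W1 = [[0.5, -0.3], [0.8, 0.1]]   # input -> hidden weights
b1 = [0.0, 0.2]                  # hidden biases
W2 = [1.0, -0.5]                 # hidden -> output weights
b2 = 0.1                         # output bias

def forward(x):
    # Each hidden unit: weighted sum of inputs, plus bias, through relu.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output: weighted sum of hidden activations, plus bias.
    return sum(w * h for w, h in zip(W2, hidden)) + b2

print(forward([1.0, 2.0]))  # ≈ -0.5
```

The mechanism is fully transparent at this scale; the debate below is about whether that transparency survives when the same bricks number in the billions.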

> it's akin to evolution - we understand the process - that part is simple.

There is nothing simple about evolution. Things like horizontal gene transfer are very much not obvious, and the effect of factors like the environment is an area of active research.

> But the output/organisms we have to investigate how they work.

There is a fundamental difference with neural networks here: there are a lot of molecules in an animal’s body about which we have no clue. Similarly, we don’t know what a lot of almost any animal’s DNA encodes. Model species that are entirely mapped are few and far between. An artificial neural network is built from simple bricks that interact in well-defined ways. We really cannot say the same thing about chemistry in general, much less biochemistry.



> Of course we understand how they work, we built them! There is no mystery in their mechanisms: we know the number of neurons, their connectivity, everything from the weights to the activation functions. This is not a mystery; it is the product of several decades of technical development.

The discovery of DNA’s structure was heralded as containing the same explanatory power as you describe here.

Turns out, the story was much more complicated then, and is much more complicated now.

Anyone today who tells you they know why LLMs are capable of programming, and how they do it, is plainly lying to you.

We have built a complex system that we only understand well at a basic “well there are weights and there’s attention, I guess?” layer. Past that we only have speculation right now.
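The “well there are weights and there’s attention” layer of understanding really is mechanically simple; here is an illustrative sketch of scaled dot-product attention over toy 2-d vectors (the queries, keys, and values are made-up numbers, not from any real model):

```python
import math

def softmax(xs):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = attention([1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out)  # weighted toward the first value vector
```

The disagreement in this thread is not about this mechanism, which everyone can read; it is about whether knowing the mechanism amounts to knowing why stacking it billions of times yields programming ability.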


> The discovery of DNA’s structure was heralded as containing the same explanatory power as you describe here.

Not at all. It's like saying that since we can read hieroglyphics we know all about ancient Egypt. Deciphering DNA is a tool for understanding biology; it is not that understanding in itself.

> Turns out, the story was much more complicated then, and is much more complicated now.

We are reverse engineering biology. We are building artificial intelligence. There is a fundamental difference and equating them is fundamentally misunderstanding both of them.

> Anyone today who tells you they know why LLMs are capable of programming, and how they do it, is plainly lying to you.

How so? They can do it because we taught them, there is no magic.

> We have built a complex system that we only understand well at a basic “well there are weights and there’s attention, I guess?” layer. Past that we only have speculation right now.

Exactly in the same way that nobody understands in detail how a complex modern SoC works. Again, there is no magic.


> How so? They can do it because we taught them, there is no magic.

Yeah, no. I mean, we can’t introspect the system to see how it actually does programming at any useful level of abstraction. “Because we taught them” is about as useful a statement as “because its genetic parents were that way”.

No, of course it’s not magic. But that doesn’t mean we understand it at a useful level.


>> Exactly in the same way that nobody understands in detail how a complex modern SoC works. Again, there is no magic.

That's absolute BS. Every part of a SoC was designed by a person for a specific function. It's possible for an individual to understand - in detail - large portions of SoC circuitry. How any function of it works could be described in detail down to the transistor level by the design team if needed - without monitoring its behavior.


Why stop at chemistry? Chemistry is fundamentally quantum electrodynamics applied to huge ensembles of particles. QED is very well understood and gives the best predictions we have to date of any scientific theory.

How come we don’t entirely understand biology then?


> Why stop at chemistry? Chemistry is fundamentally quantum electrodynamics applied to huge ensembles of particles.

Chemistry is indeed applied QED ;) (and you don't need massive numbers of particles to have very complex chemistry)

> How come we don’t entirely understand biology then?

We understand some of the basics (even QED is not reality). Part of that understanding comes from bottom-up studies of biochemistry, but most of it comes from top-down observation of whatever happens to be around us. The trouble is that we are using this imperfect understanding of the basics to reverse engineer an insanely complex system that involves phenomena spanning nine orders of magnitude in both space and time.

LLMs did not spawn on their own. There is a continuous progression from the perceptron to GPT-4, each one building on the previous generation, and every step was purposeful and documented. There is no sudden jump, merely an exponential progression over decades. It's fundamentally very different from anything we can see in nature, where nothing was designed and everything appears from fundamental phenomena we don't understand.
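The perceptron end of that progression is simple enough to fit in a few lines. As an illustrative sketch (the AND task, learning rate, and epoch count are my own choices, not anything from the thread), here is Rosenblatt's perceptron learning rule:

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred
            # Perceptron update: nudge weights toward the correct label.
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print([predict(x) for x, _ in AND])  # learns AND: [0, 0, 0, 1]
```

Every later step in the progression, from backpropagation to attention, was similarly published and deliberate; the dispute is over how much that lineage explains the behavior of the end product.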

As I said, imagining that the current state of AI is anything like biology is a profound misunderstanding of the complexity of both. We like to think we're gods, but we're really children in a sandbox.


I will ignore your patronizing remarks beyond acknowledging them here, in order to promote civil discourse.

I think you have missed my point by focusing on biology as an extremely complex field; it was my mistake to use it as an example in the first place. We don’t need to go that far.

Sure, LLMs did not spawn on their own. They are a result of thousands of years of progress in countless fields of science and engineering. Like any modern invention, essentially.

Here let me make sure we are on the same page about what we’re discussing: as I understand it, the question is whether “prompt engineering” can be considered an engineering/science practice. Personally I haven’t considered this enough to form an opinion, but your argument does not sound convincing to me.

I guess your idea of what LLMs represent matters here. The way I see it, in some abstract sense we as a society are exploring the current peak - in compute dollars, FLOPS, and performance on certain tasks - of a rather large but also narrow family of functions. By focusing our attention on functions composed of ones we knew how to effectively find parameters for, we have been able to build rather complicated processes for finding parameters for the compositions.
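The "finding parameters for functions we understand" part can be shown in miniature. This is a toy sketch of my own (the target line y = 2x + 1, learning rate, and iteration count are invented for illustration): gradient descent fitting the parameters of a single linear function.

```python
# Points on the line y = 2x + 1; we pretend we only see the data.
data = [(x, 2 * x + 1) for x in range(-5, 6)]

a, b = 0.0, 0.0   # parameters to find
lr = 0.01
for _ in range(2000):
    # Gradient of mean squared error with respect to a and b.
    grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 3), round(b, 3))  # approaches a=2, b=1
```

Each component here is fully understood; the claim in this thread is that stacking millions of such components makes the *found parameters* - not the procedure - the hard thing to understand.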

Yes, the components are understood, at various levels of rigor, but the thing produced is not yet sufficiently understood - partly because of the cost of reproducing such research, and partly because of the complexity of the system, which itself drives that cost.

The fact that “prompt engineering” exists as a practice, and that companies supposedly base their business models on secret prompts, is a testament, for me, to the fact that these systems are not well understood. A well understood system you design has a well understood interface.

Now, I haven’t noticed a specific post OP was criticizing, so I take it his remarks were general. He seems to think that some research is not worth publishing. I tend to agree that I would like research to be of high quality, but that is subjective. Is it novel? Is it true?

Now, progress will be progress, and I'm sure current architectures will change and models will get larger. And it may be that a few giants are the only ones running models large enough to require prompt engineering. Or we may find a way to have those models understand us better than a human ever could. Doubtful. And post-singularity anyway, by definition.

In either case, yes, probably a temporary profession. But if open research continues in those directions as well, there will be a need for people to figure out ways to communicate effectively with these models. You dismiss them as testers.

However, progress in science and engineering is often driven by data where theory is lacking, and I’m not aware of the existence of deep theory as of yet - e.g. something that would predict how well a certain architecture would perform. (Engineering ahead of theory, driven by $.)

As in the physics we both mentioned, knowing the component parts does not automatically grant you understanding of the whole. Even knowing everything there is to know about the relevant physical interactions, protein folding was a tough problem that, as far as I recall, has seen a lot of success with tools from the field. It sits squarely in the realm of physics, and yet we can’t give good predictions without testing (computationally).

If someone tested some folding algorithm, visually inspected the results, and then found a trick to consistently improve the results for some subclass of proteins, would that be worthy of publishing? If yes, why is this different? If not, why not?


We designed the process. We didn't design the models - the models were "designed" based on the features of a massive dataset and a massive number of iterations.

Even if you understand evolution - you still don't understand how the human body or mind works. That needs to be investigated and discovered.

In the same way, you understanding how these models were trained doesn't help you understand how the models work. That needs to be investigated and discovered.



