> With minimal guidance[, LLM-based systems] put out pretty sensible tests.
Yes and no. They get all the initial annoying boilerplate of writing tests out of the way, and the tests end up mostly decent on the surface, but I have to manually tweak the behavior and write most of the important parts myself, especially for non-trivial, tricky scenarios.
However, I am not saying this as a point against LLMs. The fact that they are able to get a good chunk of the boring boilerplate parts of writing unit tests out of the way and let me focus on the actual logic of individual tests has been noticeably helpful to me, personally.
I only use LLMs for the very first phase of writing unit tests, with most of the work still being done by me. But that initial phase is the most annoying and boring part of the process for me. So even if I still spend 90% of the time writing code manually, I'm still very glad to get that initial boring part out of the way quickly, without wasting mental effort on it.
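To make that concrete, here's roughly the kind of scaffolding I mean, sketched in TypeScript/Jest (the InvoiceService module and the 5% late-fee rule are made-up placeholders, not a real codebase): the imports, mocks, and setup are the part an LLM drafts in seconds, while the assertion encoding the actual business rule is still mine to write and verify.

```typescript
// Hypothetical example: the boilerplate an LLM can draft (imports, mocks, setup)
// versus the domain assertion I still write and check myself.
import { describe, it, expect, beforeEach, jest } from "@jest/globals";
import { InvoiceService } from "./invoiceService"; // made-up module under test

describe("InvoiceService", () => {
  let service: InvoiceService;
  const mockRepo = { findById: jest.fn(), save: jest.fn() };

  beforeEach(() => {
    jest.resetAllMocks();
    service = new InvoiceService(mockRepo as any);
  });

  it("applies the late fee only after the due date", () => {
    mockRepo.findById.mockReturnValue({ total: 100, dueDate: new Date("2024-01-01") });

    const invoice = service.applyLateFees("inv-1", new Date("2024-02-01"));

    // The part that matters: the 5% late fee is a (made-up) business rule,
    // and getting this assertion right is still my job, not the model's.
    expect(invoice.total).toBeCloseTo(105);
  });
});
```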
The fact that you think "change detection" tests offer zero value speaks volumes. Those may well be the most important use of unit tests. Getting the function correct in the first place isn't that hard for a senior developer, which is often why it's tempting to skip unit tests. But then you go refactor something and oops you broke it without realizing it, some boring obvious edge case, or the like.
These tests are also very time consuming to write, with lots of boilerplate that AI is very good at writing.
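For what it's worth, here's a minimal sketch (TypeScript/Jest, with a made-up formatReceipt function) of the kind of change-detection test being described: it pins down current behavior so a later refactor that accidentally changes the output fails loudly.

```typescript
// A "change detection" test: lock in today's behavior so tomorrow's refactor
// can't silently alter it.
import { describe, it, expect } from "@jest/globals";

// Hypothetical function under test.
function formatReceipt(items: { name: string; price: number }[]): string {
  const total = items.reduce((sum, i) => sum + i.price, 0);
  return items.map(i => `${i.name}: $${i.price.toFixed(2)}`).join("\n")
    + `\nTOTAL: $${total.toFixed(2)}`;
}

describe("formatReceipt", () => {
  it("keeps its output stable across refactors", () => {
    const receipt = formatReceipt([
      { name: "coffee", price: 3.5 },
      { name: "bagel", price: 2.25 },
    ]);
    // Snapshot = change detection: any behavioral drift shows up as a diff.
    expect(receipt).toMatchSnapshot();
  });

  it("handles the boring edge case you break while refactoring", () => {
    expect(formatReceipt([])).toBe("\nTOTAL: $0.00");
  });
});
```

The snapshot gives the broad "did anything change?" safety net, and the explicit empty-input assertion covers exactly the kind of boring edge case that tends to break unnoticed during a refactor.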
>The fact that you think "change detection" tests offer zero value speaks volumes.
But code should change. What shouldn't change, if business rules don't change, is APIs and contracts. And for that we have integration tests and end to end tests.
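As a rough sketch of what I mean by testing the contract rather than the code (TypeScript with Express and supertest; the route and field names are made up): the route, status code, and response shape are what must stay stable, while the handler internals are free to change.

```typescript
// Contract-level test: assert the API's shape, not its implementation details.
import express from "express";
import request from "supertest";
import { describe, it, expect } from "@jest/globals";

const app = express();
app.get("/api/orders/:id", (req, res) => {
  // Implementation detail -- refactor freely.
  res.json({ id: req.params.id, status: "shipped", items: [] });
});

describe("GET /api/orders/:id (contract)", () => {
  it("keeps the agreed response shape", async () => {
    const res = await request(app).get("/api/orders/42").expect(200);
    // The contract: these fields exist with these types, regardless of internals.
    expect(res.body).toEqual(
      expect.objectContaining({
        id: expect.any(String),
        status: expect.any(String),
        items: expect.any(Array),
      })
    );
  });
});
```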
I'm sure most readers here are using an adblocker.
Try disabling it for this website. It's incredible. The content is difficult to see between all the various ad surfaces. My browser came to a screeching halt.
No. Both of the requirements "to interact" and "based on what it looks like" require unshakable foundations in reality - which current models clearly do not have.
They will inevitably hallucinate interactions and observations and therefore decrease reliability. Worse, they will inject a pervasive sense of doubt into the reliability of any tests they interact with.
Yes, you are correct that it lies entirely in the reputation of the AI.
This discussion leads to an interesting question: "what is quality?"
Quality is determined by perception. If we can agree that an AI is acting like a user and it can use your website, we can assume that a user can use your website, and therefore it is "quality".
For more, read "Zen and the Art of Motorcycle Maintenance"
This fundamental issue seems to be totally lost on the LLM-heads.
I do not want additional uncertainty deep in the development cycle.
I can tolerate the uncertainty while I'm writing. That's where there is a good fit for these fuzzy LLMs. Anything past the cutting room floor and you are injecting uncertainty where it isn't tolerable.
I definitely do not want additional uncertainty in production. That's where the "large action model" and "computer use" and "autonomous agent" cases totally fall apart.
It's a mindless extension, something like: "this product is good for writing... let's let it write to prod!"
Same goes for real people: we all make mistakes. AI agents will get better over time and will be ahead of many specialists pretty soon, but they probably won't be perfect before AGI, just as we aren't.
Ideally it does. Users, super users, admins, etc. Though one might point out exactly how much effort we put into locking down what they can do. I think one might be able to expand this to build up a persona for how LLMs should interface with software in production, but too many applications give them about the same level of access as a developer coding straight into production. Then again, how many company leaders would approve of that as well if they thought it would get things done faster and at lower cost?
Simple questions like this are not welcomed by LLM hype sellers.
The word "reasoning" is being used heavily in this announcement, but with an intentional corruption of the normal meaning.
The models are amazing but they are fundamentally not "reasoning" in a way we'd expect a normal human to.
This is not a "distinction without a difference". You still CANNOT rely on the outputs of these models in the same way you can rely on the outputs of simple reasoning.
Completely false equivalence. The entire foundations of modern Psychology are wobbling. In order for the same to be true for the hard sciences we would need to be failing to replicate experiments which hinge on germ theory, atomic theory, the standard model, etc.
Nothing like that is happening. This false equivalence originates from several types of people:
1. Journalists that want/need to foment the largest possible catastrophe.
2. Political pundits who want/need to discredit some field.
To be fair, all of that was in question a century ago in the hard sciences. People used to believe in the plum pudding model, or doubt which component of the cell contained genetic information. The only thing that changed that was incremental experimental evidence. The social sciences are evidently going through a similar transition, but we shouldn't use that alone to discredit the field; work is still able to stand on its own if it is done well.
The level of ignorance of this basic, basic historical fact is just completely astounding on this website sometimes.
It was called natural philosophy for Christ's sake.
Modern medicine's forefather was heroic medicine, based on modulating the 4 humors (blood, phlegm, yellow bile, black bile).
We didn't zap into existence with the hard sciences being hard. We made them that way through centuries of intellectual effort, almost all of which turned out to be wrong!
> The entire foundations of modern Psychology are wobbling.
What do you mean by this? The field of psychology is perfectly capable of policing itself, and it has rejected many of the conclusions of its historical predecessors.
> In order for the same to be true for the hard sciences we would need to be failing to replicate experiments which hinge on germ theory, atomic theory, the standard model, etc.
"hard sciences" also fail to produce results relevant to most people. Sure, they can maybe make better batteries, but how can they explain how dysfunctional society is?
EDIT: We can also directly blame the poor communication skills of the "hard sciences" for diet culture. The "hard sciences" have failed in their efforts to produce a population that can reason about nutrition in an evidence-backed manner, and this can be traced directly to how scientists choose to present their data.
You don’t need science to explain how dysfunctional society is. The problem with “soft sciences” is that they can’t produce provably correct information. Incorrect information is worse than no information.
> You don’t need science to explain how dysfunctional society is.
You absolutely do lol, or you're just straight wrong. Take your pick.
> The problem with “soft sciences” is that they can’t produce provably correct information.
Neither can hard sciences. Proofs are incompatible with empiricism. Abductive proofs (which is what the scientific process offers) are necessarily bounded by limited certainty. All you can do is progressively improve certainty approaching 100%, but reaching it is necessarily impossible. This is just basic Hume. You can never be 100% sure the sun will rise tomorrow or that the fundamental laws of physics won't arbitrarily change.
But, the same applies to soft sciences. We can and do increase our certainty continually. This is absolutely worthwhile and is probably far more valuable to humanity than merely modelling physical phenomena.
> an obvious counter-example is an explanation that happens to be coincidentally correct
How can you establish "correctness" without something like the scientific method? How do you even bind loosely defined English to verifiable claims about real-life referents without agreeing with others on terms? No, you've chosen a simple, comforting delusion over anything resembling objective truth.
> This seems like an attack on the notion of objective truth
Yes, truth is an a priori concept; objective truth is a silly delusion. Coherence is generally a much stronger concept anyway.
> "hard sciences" also fail to produce results relevant to most people.
I don't see how you can say this. Would you prefer to live 200 years ago, before the hard sciences had started changing people's lives? Almost every convenience you see around you exists thanks to the hard sciences.
That's an awful lot to put on hard science rather than, say, industrialization—the result of soft sciences just as much or even more so than hard sciences. Plus if you insist on only evaluating material concerns we still need to grapple with soft sciences to figure out why society is so horrible at distributing material goods and services in a rational manner.
My point was not to reject hard sciences so much as to emphasize you can't easily extract the consequences of them in isolation. It's nearly futile to even try. My apologies for poorly articulating this.
What exactly do you think are the foundations of modern psychology? Serious question.
There are tons of non-replicable findings way, way further down the stack than psychology, and those tend to have a lot more relying on them than psychology/sociology studies. If you're upset about scientific validity, consider directing your ire to where problems are more likely to actually hurt people -- the "hard sciences."
Nice ad hominem but I'm none of those things. I work in clinical trials, one of the few areas where we actually do have to know things, and a very good empirical demonstration of exactly how incredibly difficult that is.
I'm curious to hear your perspective on the validity of psychology / psychiatry / sociology as someone adjacent to the field.
I am a hard science maths / data science guy, but unlike a lot of my peers I have a great interest in softer reasoning (philosophy, ethics, political science etc). But I am constantly disappointed by how tainted by ideology psychology and psychiatry feel (and economics, but this is a different discussion).
Do you think that psychology and psychiatry are held to the same rigour as harder sciences and should be considered as valid?
All fields of inquiry are tainted by ideology. Read the history of literally any scientific field ever. The entire system is designed to accept this as fact, because science is done by humans, and to still arrive at the truth nonetheless.
If we take those two observations: a) science is done by humans and b) humans have motivations, obviously the way to arrive at truth is to allow for most things to be wrong most of the time. This is the process by which we've learned every single thing about the universe.
I don't know what you mean by "held to the same rigor." I don't think any psychologist on the planet would tell you we understand psychology as well as we understand basic chemical reactions.
I suspect (or hope) that many professional psychologists are beginning to doubt that data acquired in these contrived laboratory settings can provide a window into actual human behavior at all.
Within large tech companies, AI projects (often just features within existing products) are seen as the safe place to be, while non-AI projects have higher layoff rates.
I wonder how much of the AI spend is driven by the above situation.
Unit tests actually need to be correct, down to individual characters. Same goes for API calls. The API needs to actually exist.
Contrast that with "high level design, rough outlines". Those can be quite vague and hand-wavy. That's where these fuzzy LLMs shine.
That said, these LLM-based systems are great at writing "change detection" unit tests that offer ~zero value (or negative).