Empires always fall from within. It was inconceivable for a young me to ever think of a day when MS Office would be unworkable. Advance a couple of decades and MS 365 Copilot is the thing that just doesn't work. Not because somebody exploited a bug and created an unviewable doc, but because MS decided to pile on bugs while leaving old ones in.
> Just don't complain when large corporations copy your work one day with no legal recourse.
To be fair, that is the schadenfreude. Large corporations have been copying works of little people for ages. They only started crying about 'IP theft' when someone bigger (China) started doing the same to them, and to make it worse, most of the corps willingly handed the IP over because they wanted cheap exploitable labor.
Hallucination is all an LLM does. That is their nature, to hallucinate.
We just happen to find some of these hallucinations useful.
Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.
I wish people who take this stance would seriously reconsider their take on how hallucinations are defined and how unhelpful it is to conflate hallucination with generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent comment and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful, but also unsound.
You're going to get a lot of pushback on the idea of taking the definition of hallucination seriously. Calling fluently stated bunk "hallucination" feels cynical to begin with. Trying to weave a silk purse out of that sow's ear is difficult.
I don't know what you mean by hallucination here; are you saying that any statistical output is "hallucination"? If so, then we are also constantly hallucinating I guess.
There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.
"Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.
An easy way to avoid hallucination under this definition is to respond with something that is never a statement when there is a possibility that it can be incorrect; e.g. "I think that... I don't know...". To me, this seems to be what the authors argue. This has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago which talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try this.
Uh.. <raises hand> I might be one of the few people who actually knows a bunch of the theory on why wikipedia works (properly). I had to do a bunch of research while working on wikipedia mediation and policies stuff, a long time ago.
I never got around to writing it all out though. Bits of it can be found in old policy discussions on bold-reverse-discuss, consensus, and etc.
I guess the first thing to realize is that wikipedia is split into a lot of pages, and n_editors for most pages in the long tail is very very low, so most definitely below n_dunbar[]; those pages really can be edited almost the same way wikipedia used to be back in 2002. At the same time, a small number of pages above n_dunbar get the most attention and are the most messy to deal with.
Aaron Swartz actually did a bunch of research into some of the base statistics too, and he DID publish stuff online... let me look that up...
As the other comment said, LLMs are not an abstraction.
An abstraction is a deterministic, pure function that, when given A, always returns B. This allows the consumer to rely on the abstraction. This reliance frees the consumer from having to implement the A->B, thus allowing it to move up the ladder.
LLMs, by their very nature, are probabilistic. Probabilistic is NOT deterministic. Which means the consumer is never really sure if, given A, the returned value is B. Which means the consumer now has to check if the returned value is actually B, and depending on how complex the A->B transformation is, the checking function is equivalent in complexity to implementing the said abstraction in the first place.
We can use different words if you like (and I'm not convinced that delegation isn't colloquially a form of abstraction) but you can't control the world by controlling the categories.
I like that. To paraphrase the Steinbeck (mis)quote: "Hacker culture never took root in the AI gold rush because the LLM 'coders' saw themselves not as hackers and explorers, but as temporarily understaffed middle-managers."
Except that (1) the other party doesn't become smart, and (2) the one who delegates doesn't become stupid; they just lose the opportunity to become smarter compared to a human who'd actually do the work.
The evidence it cites is that paper from 3 months ago claiming your brain activates less while prompting than actually writing an essay.
No duh, the point is that you flex your mental muscles on the tasks AI can't do, like effective organization. I don't need to make a pencil to write.
The most harmful myth in all of education is the idea that you need to master some basic building blocks in order to move on to a higher level. That really is just a noticeable exception. At best you can claim that it's difficult for other people to realize that your new way solves the problem, or that people should really learn X because it's generally useful.
I don't see the need for this kind of compulsory education, and it's doing much more harm than good. Bodybuilding doesn't even appear as a codified sport until well after the industrial revolution, it's not until we are free of sustenance labor that human intelligence will peak. Who would be happy with a crummy essay if humans could learn telekinesis?
That's a lot of words filled with straw man analogies. Essentially, you're claiming that you can strengthen your cognitive skills by having LLMs do all the thinking for you, which is absurd. And the fact that the study is 3 months old doesn't invalidate the work.
> Who would be happy with a crummy essay if humans could learn telekinesis?
I'm glad that's not the professional consensus on education, at least for now. And "telekinesis," really?
> No duh, the point is that you flex your mental muscles on the tasks AI can't do, like effective organization.
AI can do better organization than you, it's only inertia and legalities that prevent it from happening. See, without good education, you aren't even able to find a place for yourself.
> The most harmful myth in all of education is the idea that you need to master some basic building blocks in order to move on to a higher level.
That "myth" is supported by abundant empirical evidence; people have tried education without it and it didn't work. My lying eyes kind of confirm it too. I had one hell of a time trying to use an LLM without getting dumber... it comes so natural to them; skipping steps is seductive but blinding.
> I don't see the need for this kind of compulsory education, and it's doing much more harm than good.
Again, long-standing empirical evidence tells us the opposite. I support optional education but we can't even have a double-blind study for it: I'm pretty sure those who don't go to school would be home-schooled; too few are dumb enough to let their uneducated children choose their manner and level of education.
well, then it comes down to which skillset is more marketable - the delegator, or the coding language expert.
customers don't care about the syntactic sugar/advanced reflection in the codebase of the product that they're buying. if the end product of the delegator and the expert is the same, employers will go with the faster one every time.
That's how you end up in the Idiocracy world, where things still happen, but they are driven by ads rather than actual need, no one really understands how anything works, somehow society plods along due to momentum, but it's all shit from top to bottom and nothing is getting better. "Brawndo: it's got what plants crave!" is the end result of being led around by marketers.
Rather a subset of people who would like to believe the results don't apply to them.
Frankly, I'm sure there will be many more studies in this direction. Now this is a university, an independent organization. But, given the amount of money involved, some of the future studies will come from the camp vitally interested in people believing that by outsourcing their work to coding agents they are becoming smarter instead of losing achieved skills. Looking forward to reading the first of these.
Outsourcing work doesn't make you smarter. It makes you more productive. It gives you extra time that you can dedicate towards becoming smarter at something else.
Become smarter at what exactly? People reliant on AI aren't going to use AI on just one thing, they're going to use it for everything. Besides, as others have pointed out to you, the study shows evidence that AI reliance causes cognitive decline. It affects your general intelligence, not limited to a single area of expertise.
> Students who repeatedly relied on ChatGPT showed weakened neural connectivity, impaired memory recall, and diminished sense of ownership over their own writing
So we're going to have more bosses, perhaps not in title, who think they're becoming more knowledgeable about a broad range of topics, but are actually in cognitive decline and out of touch with reality on the ground. Great.
One argument for abstraction being different from delegation, is when a programmer uses an abstraction, I'd expect the programmer to be able to work without the abstraction, if necessary, and also be able to build their own abstractions. I wouldn't have that expectation with delegation.
> The vast majority of programmers don't know assembly, so can in fact not work without all the abstractions they rely on.
The problem with this analogy is obvious when you imagine an assembler generating machine code that doesn't work half of the time and a human trying to correct that.
I mean, it’s more like 0.1% of the time but I’ve definitely had to do this in embedded programming on ARM Cortex M0-M3. Sometimes things just didn't compile the way I expected. My favorite was when I smashed the stack and I overflowed ADC readings into the PC and SP, leading to the MCU jumping completely randomly all over the codebase. Other times it was more subtle things, like optimizing away some operation that I needed to not be optimized away.
> Do you therefore argue programming languages aren't abstractions?
Yes, and no.
They’re abstractions in the sense of hiding the implementation details of the underlying assembly. Similarly, assembly hides the implementation details of the cpu, memory, and other hw components.
However, with programming languages you don’t need to know the details of the underlying layers except in very rare cases. The abstraction that programming languages provide is simple, deterministic, and well documented. So, in 99.999% of cases, you can reason based on the guarantees of the language, regardless of how those guarantees are provided.
With LLMs, the relation between input and output is much more loose. The output is non-deterministic, and tiny changes to the input can create enormous changes in the output seemingly without reason. It’s much shakier ground to build on.
I do not think determinism of behaviour is the only thing that matters for evaluating the value of an abstraction - exposure to the output is also a consideration.
The behaviour of the = operator in Python is certainly deterministic and well-documented, but depending on context it can result in either a copy (2x memory consumption) or a pointer (+64 bits of memory consumption). Values that were previously pointers can also suddenly become copies following later mutation. Do you think this through every time you use =? The consequences of this can be significant (e.g. operating on a large file in memory); I have seen SWEs make errors in FastAPI multipart upload pipelines that have increased memory consumption by 2x, 3x, in this manner.
Meanwhile I can ask an LLM to generate me Rust code, and it is clearly obvious what impact the generated code has on memory consumption. If it is a reassignment (b = a) it will be a move, and future attempts to access the value of a would refuse to compile and be highlighted immediately in an IDE linter. If the LLM does b = &a, it is clearly borrowing, which has the size of a pointer (+64bits). If the LLM did b = a.clone(), I would clearly be able to see that we are duplicating this data structure in memory (2x consumption).
The LLM code certainly is non-deterministic; it will be different depending on the questions I asked (unlike a compiler). However, in this particular example, the chosen output format/language (Rust) directly exposes me to the underlying behaviour in a way that is both lower-level than Python (what I might choose to write quick code myself) yet also much, much more interpretable as a human than, say, a binary that GCC produces. I think this has significant value.
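For readers following the Python half of this comparison, here is a minimal sketch of where the memory difference tends to come from. Strictly speaking, plain `=` in Python only binds another name to the existing object; the extra copy shows up when the right-hand side is an expression that builds a new object (a slice, `copy.deepcopy`, and so on). The variable names and the tiny nested list are illustrative stand-ins, not anything from the FastAPI pipelines mentioned above.

```python
import copy

data = [[0] * 4 for _ in range(3)]  # stand-in for a large in-memory payload

alias = data                # binds a second name; no new list is allocated
shallow = data[:]           # new outer list, inner lists are still shared
deep = copy.deepcopy(data)  # fully independent copy (roughly 2x memory)

alias[0][0] = 99            # mutating through the alias also changes `data`

print(alias is data)            # True: same object
print(shallow is data)          # False: the outer list was copied
print(shallow[0] is data[0])    # True: the inner lists were not
print(deep[0][0], data[0][0])   # 0 99: the deep copy is unaffected
```

Nothing in the syntax of `alias = data` versus `shallow = data[:]` shouts "this one doubles memory", which is the contrast the comment draws with Rust's explicit `.clone()`.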
Unrelated to the gp post, but aren't LLMs more like a deterministic chaotic system than a "non-deterministic" one? "Tiny changes to the input can change the output quite a lot" is similar to the "extreme sensitivity to initial conditions" property of a chaotic system.
I guess that could be a problematic behavior if you want reproducibility à la (relatively) reproducible abstractions like compilers. With LLMs, there are too many uncontrollable variables to precisely reproduce a result from the same input.
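To make "deterministic but chaotic" concrete, here is a small sketch using the logistic map, a textbook chaotic system. It is only an analogy chosen for illustration and says nothing about how an LLM works internally; the starting value `0.2`, the nudge of `1e-10`, and the function name are arbitrary assumptions.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2, 100)          # identical input
c = logistic_orbit(0.2 + 1e-10, 100)  # input nudged by 1e-10

same_input_same_output = (a == b)
max_gap = max(abs(x - y) for x, y in zip(a, c))

print(same_input_same_output)  # identical inputs give identical orbits
print(max_gap)                 # the nudged orbit ends up far from the original
```

The same input always reproduces the same orbit, yet a change in the tenth decimal place eventually produces a completely different trajectory.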
The vast majority of programmers could learn assembly, most of it in a day. They don’t need to, because the abstractions that generate it are deterministic.
This is a tautology. At some level, nobody can work at a lower level of abstraction. A programmer who knows assembly probably could not physically build the machine it runs on. A programmer who could do that probably could not smelt the metals required to make that machine. etc.
However, the specific discussion here is about delegating the work of writing to an LLM, vs abstracting the work of writing via deterministic systems like libraries, frameworks, modules, etc. It is specifically not about abstracting the work of compiling, constructing, or smelting.
This is meaningless. An LLM is also deterministic if configured to be so, and any library, framework, module can be non-deterministic if built to be. It's not a distinguishing factor.
They are probabilistic. Running them on even different hardware yields different results. And the deltas compound the longer your context and the more tokens you're using (like when writing code).
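The comment doesn't say why hardware would matter. One commonly cited contributor, offered here as an assumption rather than as the commenter's claim, is that floating-point addition is not associative, so summing the same numbers in a different order (as different parallel hardware may do) can change the low bits of the result. A minimal sketch:

```python
# Same three numbers, two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

A difference in the last bit is harmless on its own, but if it ever flips which token scores highest, every later token is conditioned on a different prefix.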
But more importantly, always selecting the most likely token traps the LLM in loops, reduces overall quality, and is infeasible at scale.
There are reasons that literally no LLM that you use runs deterministically.
With temperature set to zero, they are deterministic if inference is implemented with deterministic calculations.
Only when you turn the temperature up do they become probabilistic for a given input in that case. If you take shortcuts in implementing the inference, then sure, rounding errors may accumulate and prevent that, but that is not an issue with the models but with your choice of how to implement the inference.
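A toy sketch of the distinction being drawn here, assuming the usual convention that temperature 0 means greedy (argmax) decoding. The logits are made up and `pick_token` is a hypothetical helper, not any real inference library's API.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens

# Temperature 0: the RNG is never consulted, so every run agrees.
greedy = [pick_token(logits, 0, random.Random()) for _ in range(20)]

# Temperature 1 with a fixed seed: random, but reproducible.
rng_a = random.Random(7)
sampled_a = [pick_token(logits, 1.0, rng_a) for _ in range(20)]
rng_b = random.Random(7)
sampled_b = [pick_token(logits, 1.0, rng_b) for _ in range(20)]

print(greedy)
print(sampled_a == sampled_b)
```

In this toy setting both the greedy path and the seeded path are reproducible; the disagreement in the thread is about whether production inference stacks preserve that property in practice.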
To address your specific point in the same way: When we're talking about programmers using abstractions, we're usually not talking about the programming language they're using, we're talking about the UI framework, networking libraries, etc... they're using. Those are the APIs they're calling with their code, and those are all abstractions that are all implemented at (roughly) the same level of abstraction as the programmer's day-to-day work. I'd expect a programmer to be able to re-implement those if necessary.
> I wouldn't have that expectation with delegation.
Managers tend to hire sub-managers to manage their people. You can see this with LLMs as well: people see "Oh this prompting is a lot of work, let's make the LLM prompt the LLM".
Note, I'm not saying there are never situations where you'd delegate something that you can do yourself (the whole concept of apprenticeship is based on doing just that). Just that it's not an expectation, e.g., you don't expect a CEO to be able to do the CTO's job.
I guess I'm not 100% sure I agree with my original point though: should a programmer working on JavaScript for a website's frontend be able to implement a browser engine? Probably not, but the original point I was trying to make is I would expect a programmer working on a browser engine to be able to re-implement any abstractions that they're using in their day-to-day work if necessary.
The advice I've seen with delegation is the exact opposite. Specifically: you can't delegate what you can't do.
Partially because if all else fails, you'll need to step in and do the thing. Partially because if you can't do it, you can't evaluate whether it's being done properly.
That's not to say you need to be _as good_ at the task as the delegee, but you need to be competent.
For example, this HBR article [1]. Pervasive in all advice about delegation is the assumption that you can do the task being delegated, but that you shouldn't.
> Just that it's not an expectation, e.g., you don't expect a CEO to be able to do the CTO's job.
I think the CEO role is actually the outlier here.
I can only speak to engineering, but my understanding has always been that VPs need to be able to manage individual teams, and engineering managers need to be somewhat competent if there's some dev work that needs to be done.
This only happens as necessary, and it obviously should be rare. But you get in trouble real quickly if you try to delegate things you cannot accomplish yourself.
I think what you're trying to reference is APIs or libraries, most of which I wouldn't consider abstractions. I would hope most senior front-end developers are capable of developing a date library for their use case, but in almost all cases it's better to use the built in Date class, moment, etc. But that's not an abstraction.
There is a form of delegation that develops the people involved, so that people can continue to contribute and grow. Each individual can contribute what is unique to them, and grow more capable as they do so. Both people, and the community of those people remain alive, lively, and continue to grow. Some people call this paradigm “regenerative”; only living systems regenerate.
There is another form of delegation where the work needed to be done is imposed onto another, in order to exploit and extract value. We are trying to do this with LLMs now, but we also did this during the Industrial Revolution, and before that, humanity enslaved each other to get the labor to extract value out of the land. This value extraction leads to degeneration, something that happens when a living system dies.
While the Industrial Revolution afforded humanity a middle-class, and appeared to distribute the wealth that came about — resulting in better standards of living — it came along with numerous ills that as a society, we still have not really figured out.
I think that, collectively, we figure that the LLMs can do the things no one wants to do, and so _everyone_ can enjoy a better standard of living. I think doing it this way, though, leads to a life without purpose or meaning. I am not at all convinced that LLMs are going to give us back that time … not unless we figure out how to develop AIs that help grow humans instead of replacing them.
LLMs and AI in general are just a hack to reimplement slavery with an artificial being that is denied consideration as a being. Technical chattel, if you will, and if you've been paying attention in tech circles a lot of mental energy is being funneled into keeping the eggheads' attention firmly in the "we don't want something that is" direction. Investors want robots that won't/can't say no.
What's interesting about this proposition is that by the time you create a machine that's capable enough to replace humans in the way they want, we'll have to start talking about robot personhood, because by then they will be indistinguishable from us.
I don't think you can get the kinds of robots they want without also inventing the artificial equivalent of soul. So their whole moral sidestep to reimplement slavery won't even work. Enslaving sapient beings is evil whether they are made of meat or metal.
You are far too optimistic in terms of willingness of the moneyed to let something like a toaster having theoretical feelings get in the way of their Santa Claus machines.
Human developers by their very nature are probabilistic. Probabilistic is NOT deterministic. Which means the manager is never really sure if the developer solved the problem, or if they introduced some bugs, or if their solution is robust and ideal even when it seems to be working.
All of which is beside the point, because soon-ish LLMs are going to develop their own equivalents of experimentation, formalisation of knowledge, and collective memory, and then solutions will become standardised and replicable - likely with a paradoxical combination of a huge loss of complexity and solution spaces that are humanly incomprehensible.
The arguments here are like watching carpenters arguing that a steam engine can't possibly build a table as well as they can.
Which - is you know - true. But that wasn't how industrialisation worked out.
So it's a noisy abstraction. Programmers deal with that all the time. Whenever you bring in an outside library or dependency there's an implicit contract that you don't have to look underneath the abstraction. But it's noisy so sometimes you do.
Colleagues are the same thing. You may abstract business domains and say that something is the job of your colleague, but sometimes that abstraction breaks.
Still good enough to draw boxes and arrows around.
Noisy is an understatement: it's buggy, it's error-filled, it's time-consuming and inefficient. It's the exact opposite of automation but great for job security.
It's unfortunately not great for job security either. Do you know how Google massively underinvests in support? Their support is mostly automated and is only good at shooing people away. Many companies would jump at the opportunity to adopt AI and accept massive declines in quality as long as it results in cost savings. Working people and customers will get screwed hard.
No, but depending on your governance structure, we have software engineers abstract over domains. And then we draw boxes and arrows around the works of your colleagues without looking inside the box.
You wish! Bus factor risk is why you don’t do this. Having siloed knowledge is one of the first steps towards engineering; unless someone else's code is proven bug free, you don’t usually rely on that. You just have someone to throw bug tickets at.
Very true, my brain is stuck in scaling out from small teams. In that world, you can't help but accept plenty of bus factor, and once you get to enough people, making sure everyone understands each other's domains is a bit too much.
Yeah but in spite of that, if you ask me to take a Jira ticket and do it properly, there is a much higher chance that I'll do it reliably and the rest of my team will be satisfied, whereas if I bring an LLM into the equation it will wreak havoc (I've witnessed a few cases and some people got fired, not really for using LLMs but for not reviewing their output properly - which I can even understand somehow, as reviewing code is much less fun than creating it).
Yeah and the people paying other people to write code won't understand how the code works. AI as currently deployed stands a strong chance of reducing the ranks of the next generation of talented devs.
> An abstraction is a deterministic, pure function
That must be why we talk about leaky abstractions so much.
They're neither pure functions, nor are they always deterministic. We as a profession have been spoilt by mostly deterministic code (and even then, we had a chunk of probabilistic algorithms, depending on where you worked).
Heck, I've worked with compilers that used simulated annealing for optimization, 2 decades ago.
Yes, it's a sea change for CRUD/SaaS land. But there are plenty of folks outside of that who actually took the "engineering" part of software engineering seriously, and understand just fine how to deal with probabilistic processes and risk management.
I believe that if you can tweak the temperature input (OpenAI recently turned it off in their API, I noticed), an input of 0 should hypothetically result in the same output, given the same input.
The point is he said "by its nature". A transformer based LLM when called with the same inputs/seed/etc is literally the textbook definition of a deterministic system.
This couldn't be any more wrong. LLMs are 100% deterministic. You just don't observe that feature because you're renting it from some cloud service. Run it on your own hardware with a consistent seed, and it will return the same answer to the same prompt every time.
I think 'chaotic' is a better descriptor than 'probabilistic'. It certainly follows deterministic rules, unless randomness is deliberately injected. But the interaction of the rules and the context they operate in is so convoluted that you can't trace an exact causal relationship between the input and output.
It's chaotic in general. The randomness makes it chaotic and nondeterministic. Chaotic systems aren't that bad to work with as long as they are deterministic. Chaotic + nondeterministic is like building on quicksand.
Ok, let's call it a stochastic transformation over abstraction spaces. It's basically sampling from the set of deterministic transformations given the priors established by the prompt.
You're bending over backwards to imply that it's deterministic without saying it is. It's not. LLMs, by their very nature, don't have a well-defined relationship between their input and output. They make tons of mistakes that are utterly incomprehensible because of that.
> LLMs, by their very nature are probabilistic. Probabilistic is NOT deterministic.
Although I'm on the side of getting my hands dirty, I'm not sure if the difference is that different. A modern compiler embeds a considerable degree of probabilistic behaviour.
Compilers use heuristics which may result in dramatically different results between compiler passes. Different timing effects during compilation may constrain certain optimization passes (e.g. "run algorithm x over the nodes and optimize for y seconds") but in the end the result should still not modify defined observable behavior, modulo runtime. I consider that to be dramatically different than the probabilistic behavior we get from an LLM.
There are pragmas you can give to a compiler to tell it to "expect that this code path is (almost) never followed". I.e. if you have an assert on nullptr, for example. You want it to assume the assert rarely gets triggered, and highly optimize instruction scheduling / memory access for the "not nullptr" case, but still assert (even if it's really, REALLY slow, relatively speaking) to handle the nullptr case.
It’s not that they embed probabilistic behavior per se. It’s more that they are chaotic systems, in that a slight change of input can drastically change the output. But ideally, a well-designed compiler is deterministic: given the same input, the output should always be the same. If that were not generally true, programming would be much harder than it is.
So are compilers, but people still successfully use them. Compilers and LLMs can both be made deterministic but for performance reasons it's convenient to give up that guarantee.
AIUI, if you made an LLM deterministic, every mostly-similar prompt would return the same result (i.e. access the same training data set) and if that's wrong, the LLM is just plain broken for that example. Hacked-in "temperature" (randomness) is the only way to hopefully get a correct result - eventually.
For example looping over the files in a directory can happen in a different order depending on the order the files were created in. If you are linking a bunch of objects the order typically matters. If the compiler is implemented correctly the resulting binary should functionally be the same but the binary itself may not be exactly the same. Or even when implemented correctly you will see cases where different objects can be the one to define a duplicate symbol depending on their relative order.
That's not nondeterminism though, you've changed the input (the order of the files). Nondeterminism would be if the binary changes despite the files being in the same order. If the binary is the same holding fixed the order of the files, then the output is deterministic.
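The file-ordering point in the two comments above can be sketched as follows. The Python docs describe the order returned by `os.listdir` as arbitrary, so a build step that wants a reproducible link order has to impose one itself. The `link_order` helper and the file names are hypothetical; this is not any particular build system's behaviour.

```python
import os
import tempfile

def link_order(directory):
    """Return the object files in a stable, name-sorted order."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".o"))

with tempfile.TemporaryDirectory() as tmp:
    for name in ["zeta.o", "alpha.o", "mid.o", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    raw = os.listdir(tmp)     # whatever order the filesystem hands back
    stable = link_order(tmp)  # independent of creation order

print(stable)  # ['alpha.o', 'mid.o', 'zeta.o']
```

Sorting moves the directory order out of the set of hidden inputs, which is the sense in which the second commenter treats the original behaviour as a changed input rather than nondeterminism.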
This is such a funny example because language is the main way that we communicate with LLMs. Which means you can tie both of your points together in the same example: If you take a scene and describe it in words, then have an LLM reconstruct the scene from the description, you'd likely get a scene that looks very different than the original source. This simultaneously makes both your point and the person you're responding to's point:
1. Language is an abstraction and it's not deterministic (it's really lossy)
2. LLMs behave differently than the abstractions involved in building software, where normally if you gave the same input, you'd expect the same output.
Yes, most abstractions are not as clean as leak free functional abstractions. Most abstractions in the world are leaky and lossy. Abstraction was around long before computers were invented.
Okay, language was the original vehicle for abstraction if everyone wants to get pedantic about it. And yes, abstraction of thought. Only in computerland (programming, mathematics and physics) do you even have the opportunity to have leak-free functional abstractions. That is not the norm. LLM-like leaky abstractions are the norm.
This is clearly not true. For example, the Pythagorean theorem is an old, completely leak free, abstraction with no computer required.
Sorry for being pedantic, I was just curious what you mean at all. Language as abstraction of thought implies that thought is always somehow more "general" than language, right? But if that was the case, how could I read a novel that brings me to tears? Is not my thought in this case more the "lossy abstraction" of the language than the other way around?
Or, what is the abstraction of the "STOP" on the stop sign at the intersection?
simply staying on stable channel makes this a non-problem :)
Besides, Nix is even better for such breakages. If GCC breaks your packages, the system does not build and never gets into a broken state, all the while the old system remains available and kicking.
They charge €0.50 per month to add an IPv4 address. A shared IPv4 NAT gateway introduces a whole lot of problems for them just to support customers who need IPv4 but don't want to pay a tiny amount for it.
How would a server-side NAT know which Hetzner customer it should route a request to? It has an encrypted packet arriving at this shared address on port 443. You can route a shared address to the proper service based on the HTTP Host header but that can only be done by the customer using their encryption key, so no sharing an address between customers. Home LAN NAT only works because the router can change the source port used by the request so that responses are unambiguously routed to the right client.
I don't think they're saying they should support incoming connections on such a NAT, I think they're saying that servers behind the NAT would be able to make outgoing connections (e.g. to access shared resources).
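The outbound-only case discussed in the last two comments can be sketched as a port-translation table. This is a toy model under stated assumptions: the `SourceNat` class, the starting port `40000`, and the addresses (from documentation and private ranges) are all made up, and it says nothing about how Hetzner actually runs its network.

```python
import itertools

class SourceNat:
    """Toy outbound NAT: rewrites source ports so replies can be routed back."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Map an internal source address to the shared IP and a unique port."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        # Returns None for unsolicited traffic: nothing tells the NAT which
        # customer a fresh connection to, say, port 443 belongs to.
        return self._back.get(public_port)

nat = SourceNat("203.0.113.10")
a = nat.outbound("10.0.0.5", 51000)
b = nat.outbound("10.0.0.6", 51000)  # same private port, different server

print(a, b)
print(nat.inbound(40001))  # reply is routed back to 10.0.0.6
print(nat.inbound(443))    # None: no mapping for a new inbound connection
```

Replies are unambiguous because the NAT chose the public port itself, while a new inbound connection has no table entry to match against, which is the asymmetry both commenters describe.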
The morons who want to replace developers with these so-called tools already feel like developers need babysitting. Partly because they suck at giving requirements or sticking to them or both. It will be fun to see them finding out the devs do more than bashing code.
That phrase is chilling, and perfectly describes what I've been feeling about where society at large is heading.
Thank you for introducing it to me.