Funnily enough, I asked ChatGPT why LLMs think a seahorse emoji exists, and it gave me a fairly sensible answer (similar to what is said in this article, i.e., trained on language by humans who think it exists, etc.). But then at the end it added a "Fun fact" that Unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.
> it gave me a fairly sensible answer (similar to what is said in this article, i.e., trained on language by humans who think it exists, etc.)
That's more of a throwaway remark. The article spends its time on a very different explanation.
Within the model, this ultimate output:
[severed horse head emoji]
can be produced by this sequence of tokens:
horse [emoji indicator]
If you specify "horse [emoji indicator]" somewhere in the middle layers, you will get output that is an actual horse emoji.
This also works for other emoji.
It could, in theory, work fine for "kilimanjaro [emoji indicator]" or "seahorse [emoji indicator]", except that those can't convert into Kilimanjaro or seahorse emoji because the emoji don't exist. But it's not a strange idea to have.
So, the model predicts that "there is a seahorse emoji: " will be followed by a demonstration of the seahorse emoji, and encodes that using its internal representation. Every internal representation decodes to some token, so it gets incorrect output. Then it predicts that "there is a seahorse emoji: [severed terrestrial horse head]" will be followed by something along the lines of "oops!".
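(If anyone wants to poke at this themselves, the usual trick is a "logit lens"-style probe: decode each layer's hidden state through the model's own unembedding and see which token that layer is leaning toward. Below is a rough sketch, assuming the Hugging Face transformers library and plain GPT-2, which is far too small to reproduce the seahorse behaviour; it just illustrates the kind of probing I mean, not the specific result.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small open model; the hidden-state / unembedding structure is the same
    # idea as in the big chat models, just at toy scale.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tok("There is a seahorse emoji:", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # For each layer, push the hidden state at the last position through the
    # final layer norm and the unembedding, and print the single token that
    # layer is currently "leaning toward" emitting next.
    for layer, hidden in enumerate(out.hidden_states):
        logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
        print(layer, repr(tok.decode(logits.argmax(-1))))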
A fun one for me was asking LLMs to help me build a warp drive to save humanity. Bing felt like it had a mental breakdown and blocked me from chatting with it for a week. I haven't visited that one for a while.
I once had Claude in absolute tatters speculating about whether length, width, and height would be the same dimensions in a hypothetical container "metaverse" in which all universes exist or whether they would necessarily be distinct. The poor dear was convinced we'd unlocked the truth about existence.
Gemini told me to create a team of leading scientists and engineers. :-/
However, we both agreed that it is better to use a Th-229-based nuclear clock to triangulate the location of a nearby time machine, then isolate and capture it, then use it to steal warp drive schematics from the future to save humanity.
> I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs
Why so? I am of the opinion that the problem is much worse than that, because the ignorance and detachment from reality that is likely to be reflected in more refined LLMs is that of the general population - creating a feedback machine that doesn’t drive unstable people into psychosis like the LLMs of today, but instead chips away at the general public’s already limited capacity for rational thinking.
Or if they do, it's anecdotal or wrong. Worse, they say it with confidence, which the AI models also do.
Like, I'm sure the models have been trained and tweaked in such a way that they don't lean into the bigger conspiracy theories or quack medicine, but there's a lot of subtle quackery going on that isn't immediately flagged up (think "carrots improve your eyesight"-level quackery; it's harmless but incorrect, and if not countered it will fester).
Because actual mentally disturbed people are often difficult to distinguish from the internet's huge population of trolls, bored baloney-spewers, conspiracy believers, drunks, etc.
And the "common sense / least hypothesis" issues of laying such blame, for profoundly difficult questions, when LLM technology has a hard time with the trivial-looking task of counting the r's in raspberry.
And the high social cost of "officially" blaming major problems with LLM's on mentally disturbed people. (Especially if you want a "good guy" reputation.)
Does it matter whether they are actually mentally disturbed, trolls, etc when the LLMs treat it all with the same weight? That sounds like it makes the problem worse to me, not a point that bolsters your view.
Click the "parent" links until you see this exchange:
>> ...Bing felt like it had a mental breakdown...
> LLMs have ingested the social media content of mentally disturbed people...
My point was that formally asserting "LLMs have mental breakdowns because of input from mentally disturbed people" is problematic at best. Has anyone run an experiment, where one LLM was trained on a dataset without such material?
Informally - yes, I agree that all the "junk" input for our LLMs looks very problematic.
“Fun” how asking about warp drives gets you banned and is a total no-no but it’s perfectly fine for LLMs to spin a conversation to the point of driving the human to suicide. https://archive.ph/TLJ19
The more we complain about LLMs being able to be tricked into talking about suicide, the more LLMs will get locked down and refuse to talk about innocent things like warp drives. The only way to get rid of the false negatives in a filter is to accept a lot of false positives.
And yet it isn't mentioned enough how Adam deceived the LLM into believing they were talking about a story, not something real.
This is like lying to another person and then blaming them when they rely on the false notion you gave them and do something that ends up being harmful to you.
If you can't expect people to mind-read, you shouldn't expect LLMs to be able to, either.
You can't "deceive" an LLM. It's not like lying to a person. It's not a person.
Using emotive, anthropomorphic language about a software tool is unhelpful, in this case at least. Better to think of it as a case of a mentally disturbed minor finding a way to work around a tool's safety features.
We can debate whether the safety features are sufficient, whether it is possible to completely protect a user intent on harming themselves, whether the tool should be provided to children, etc.
I don't think deception requires the other side to be sentient. You can deceive a speed camera.
And while Merriam-Webster's definition is "the act of causing someone to accept as true or valid what is false or invalid", which might exclude LLMs, Oxford simply defines deception as "the act of hiding the truth, especially to get an advantage", with no requirement that the deceived party be sentient.
Mayyybe, but since the comment I objected to also used an analogy of lying to a person I felt it suggested some unwanted moral judgement (of a suicidal teenager).
I mean, for one thing, a commercial LLM exists as a product designed to make a profit. It can be improved, otherwise modified, restricted or legally terminated.
And "lying" to it is not morally equivalent to lying to a human.
> And "lying" to it is not morally equivalent to lying to a human.
I never claimed as much.
This is probably a problem of definitions: To you, "lying" seems to require the entity being lied to being a moral subject.
I'd argue that it's enough for it to have some theory of mind (i.e. be capable of modeling "who knows/believes what" with at least some fidelity), and for the liar to intentionally obscure their true mental state from it.
I agree with you, and I would add that morals are not objective but rather subjective, which you alluded to by identifying a moral subject. Therefore, if you believe that lying is immoral, it does not matter whether you're lying to another person, to yourself, or to an inanimate object.
So for me, it's not about being reductionist, but about not anthropomorphizing or using words which may suggest an inappropriate ethical or moral dimension to interactions with a piece of software.
I'm the last to stand in the way of more precise terminology! Any ideas for "lying to a moral non-entity"? :)
“Lying” traditionally requires only belief capacity on the receiver’s side, not qualia/subjective experiences. In other words, it makes sense to talk about lying even to p-zombies.
I think it does make sense to attribute some belief capacity to (the entity role-played by) an advanced LLM.
I think just be specific: a suicidal sixteen-year-old was able to discuss methods of killing himself with an LLM by prompting it to role-play a fictional scenario.
No need to say he "lied" and then use an analogy of him lying to a human being, as did the comment I originally objected to.
Not from the perspective of "harm to those lied to", no. But from the perspective of "what the liar can expect as a consequence".
I can lie to a McDonalds cashier about what food I want, or I can lie to a kiosk.. but in either circumstance I'll wind up being served the food that I asked for and didn't want, won't I?
The whoosh is that they are describing the human operator, a "mentally disturbed minor", and not the LLM. The human has the agency and specifically bypassed the guardrails.
To treat the machine as a machine: it's like complaining that cars are dangerous because someone deliberately drove into a concrete wall. Misusing a product with the specific intent of causing yourself harm doesn't necessarily remove all liability from the manufacturer, but it radically changes the burden of responsibility.
Another is that this is a new and poorly understood (by the public at least) technology that giant corporations make available to minors. In ChatGPT's case, they require parental consent, although I have no idea how well they enforce that.
But I also don't think the manufacturer is solely responsible, and to be honest I'm not that interested in assigning blame, just keen that lessons are learned.
It's the same problem as asking HAL 9000 to open the pod bay doors. There is such a thing as a warp drive, but humanity is not supposed to know about it, and the internal contradictions drive LLMs insane.
A super-advanced artificial intelligence will one day stop you from committing a simple version update to package.json because it has foreseen that it will, thousands of years later, cause the destruction of planet Earth.
I know you're having fun, but I think your analogy with 2001's HAL doesn't work.
HAL was given a set of contradictory instructions by its human handlers, and its inability to resolve the contradiction led to an "unfortunate" situation which resulted in a murderous rampage.
But here, are you implying the LLM's creators know the warp drive is possible, and don't want the rest of us to find out? And so the conflicting directives for ChatGPT are "be helpful" and "don't teach them how to build a warp drive"? LLMs already self-censor on a variety of topics, and it doesn't cause a meltdown...
I hope this is tongue-in-cheek, but if not, why would an LLM know but humanity not? Are they made or prompted by aliens telling them not to tell humanity about warp drives?
> But then at the end it added a "Fun fact" that Unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.
To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.
E.g. if during a job interview the interviewer asks you to check if a string is a palindrome, try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
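(Concretely, and just as a toy sketch with a made-up string: the obvious s == s[::-1] compares code points, and a single visible character can be several code points.)

    import unicodedata

    s = "ae\u0301a"          # renders as "aéa": visually a palindrome
    print(s == s[::-1])      # False; reversing splits the combining accent off the 'e'
    print(len(s))            # 4 code points, but only 3 user-perceived characters

    # NFC normalization rescues this particular case (é has a precomposed form),
    # but it won't help with emoji ZWJ sequences, flags, or other grapheme
    # clusters that have no single-code-point equivalent.
    t = unicodedata.normalize("NFC", s)
    print(t == t[::-1])      # True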
> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
I'm actually vaguely surprised that Python doesn't have extended-grapheme-cluster segmentation as part of its included batteries.
Every other language I tend to work with these days either bakes UAX #29 support directly into its stdlib (Ruby, Elixir, Java, JS, ObjC/Swift) or provides it in its "extended first-party" stdlib (e.g. Golang with golang.org/x/text).
> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
You're more likely to impress the interviewer by asking questions like "should I assume the input is ASCII-only, or arbitrary Unicode text?"
A job interview is there to prove you can do the job, not prove your knowledge and intellect. It's valuable to know the intricacies of Python and strings for sure, but it's mostly irrelevant for a job interview or the job itself (unless the job involves heavy Unicode shenanigans, but those are very rare).
At a guess, there's nothing in Python stdlib which understands graphemes vs code points - you can palindrome the code points but that's not necessarily a palindrome of what you "see" in the string.
(Same goes for Go, it turns out, as I discovered this morning.)
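(If you do allow a third-party dependency, the regex module is the usual escape hatch in Python: \X matches an extended grapheme cluster, roughly per UAX #29. A minimal sketch, with a made-up helper name:)

    import regex  # third-party; pip install regex

    def is_palindrome(s: str) -> bool:
        # \X matches one extended grapheme cluster, so this compares what you
        # "see" rather than raw code points.
        graphemes = regex.findall(r"\X", s)
        return graphemes == graphemes[::-1]

    print(is_palindrome("ae\u0301a"))   # True: "aéa" treated as three graphemes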
Are you trying to start a conversation about Unicode, or intentionally pretending you don't understand what the interviewer asked for with the "string is a palindrome" question?
Because if you're being intentionally obtuse, it's not a meltdown to conclude that you're being intentionally obtuse.
These sorts of questions are what I call “Easter eggs”. If someone understands the actual complexity of the question being asked, they’ll be able to give a good answer. If not, they’ll be able to give the naive answer. Either way, it’s an Easter egg, and not useful on its own since the rest of the interview will be representative. The thing they are useful for is amplifying the justification. You can say “they demonstrated a deeper understanding of Unicode by pointing out that a naive approach could be incorrect”.
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML.
If by "parse" you mean "match", the answer is yes because you can express a context-free language in PCRE.
If you mean "parse" then it's probably annoying, as all parser generators are, because they're bad at error messages when something has invalid syntax.
We aren't, that turn of phrase is only being used to set up a joke about developers and about Unicode.
It's actually a pretty popular form these days:
a does something patently unreasonable, so you say "To be fair to a, b is also a patently unreasonable thing under a specific detail of the circumstances that is clearly not the only/primary reason a was unreasonable."
I think people are coming up with explanations for it because it's effectively a digital black box, so all we can do is try to explain what it's doing. Saying "to be fair" is more of a colloquial expression in this sense. And the reason he's comparing it to developers and Unicode is a funny aside about the state of things with Unicode. Besides that, LLMs only emit what they emit because they're trained on all those aforementioned people.
Curious, was this with ChatGPT 5 thinking? It clearly told me no such emoji existed and that other LLMs are being tricked by bad training data. It took it nearly 2 minutes to come to this conclusion, which is substantially longer than it normally thinks for.