Funnily enough, I asked ChatGPT why LLMs think a seahorse emoji exists, and it gave me a fairly sensible answer (similar to what is said in this article, i.e., trained on language by humans who think it exists, etc.). But then at the end it added a "Fun fact" that Unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.
> it gave me a fairly sensible answer (similar to what is said in this article, i.e., trained on language by humans who think it exists, etc.)
That's more of a throwaway remark. The article spends its time on a very different explanation.
Within the model, this ultimate output:
[severed horse head emoji]
can be produced by this sequence of tokens:
horse [emoji indicator]
If you specify "horse [emoji indicator]" somewhere in the middle layers, you will get output that is an actual horse emoji.
This also works for other emoji.
It could, in theory, work fine for "kilimanjaro [emoji indicator]" or "seahorse [emoji indicator]", except that those can't convert into Kilimanjaro or seahorse emoji because the emoji don't exist. But it's not a strange idea to have.
So, the model predicts that "there is a seahorse emoji: " will be followed by a demonstration of the seahorse emoji, and encodes that using its internal representation. Every internal representation decodes to some token, so it gets incorrect output. Then it predicts that "there is a seahorse emoji: [severed terrestrial horse head]" will be followed by something along the lines of "oops!".
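(If anyone wants to poke at this themselves, the usual trick is a "logit lens"-style probe: decode each layer's hidden state through the model's own unembedding and see which token that layer is leaning toward. Below is a rough sketch, assuming the Hugging Face transformers library and plain GPT-2, which is far too small to reproduce the seahorse behaviour; it just illustrates the kind of probing I mean, not the specific result.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small open model; the hidden-state / unembedding structure is the same
    # idea as in the big chat models, just at toy scale.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tok("There is a seahorse emoji:", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # For each layer, push the hidden state at the last position through the
    # final layer norm and the unembedding, and print the single token that
    # layer is currently "leaning toward" emitting next.
    for layer, hidden in enumerate(out.hidden_states):
        logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
        print(layer, repr(tok.decode(logits.argmax(-1))))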
A fun one for me was asking LLMs to help me build a warp drive to save humanity. Bing felt like it had a mental breakdown and blocked me from chatting with it for a week. I haven't visited that one for a while.
I once had Claude in absolute tatters speculating about whether length, width, and height would be the same dimensions in a hypothetical container "metaverse" in which all universes exist or whether they would necessarily be distinct. The poor dear was convinced we'd unlocked the truth about existence.
Gemini told me to create a team of leading scientists and engineers. :-/
However, we both agreed that it is better to use a Th-229-based nuclear clock to triangulate the location of a nearby time machine, then isolate and capture it, then use it to steal warp drive schematics from the future to save humanity.
> I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs
Why so? I am of the opinion that the problem is much worse than that, because the ignorance and detachment from reality that is likely to be reflected in more refined LLMs is that of the general population - creating a feedback machine that doesn’t drive unstable people into psychosis like the LLMs of today, but instead chips away at the general public’s already limited capacity for rational thinking.
Or if they do, it's anecdotal or wrong. Worse, they say it with confidence, which the AI models also do.
Like, I'm sure the models have been trained and tweaked in such a way that they don't lean into the bigger conspiracy theories or quack medicine, but there's a lot of subtle quackery going on that isn't immediately flagged up (think "carrots improve your eyesight"-level quackery; it's harmless but incorrect, and if not countered it will fester).
Because actual mentally disturbed people are often difficult to distinguish from the internet's huge population of trolls, bored baloney-spewers, conspiracy believers, drunks, etc.
And the "common sense / least hypothesis" issues of laying such blame, for profoundly difficult questions, when LLM technology has a hard time with the trivial-looking task of counting the r's in raspberry.
And the high social cost of "officially" blaming major problems with LLM's on mentally disturbed people. (Especially if you want a "good guy" reputation.)
Does it matter whether they are actually mentally disturbed, trolls, etc when the LLMs treat it all with the same weight? That sounds like it makes the problem worse to me, not a point that bolsters your view.
Click the "parent" links until you see this exchange:
>> ...Bing felt like it had a mental breakdown...
> LLMs have ingested the social media content of mentally disturbed people...
My point was that formally asserting "LLMs have mental breakdowns because of input from mentally disturbed people" is problematic at best. Has anyone run an experiment, where one LLM was trained on a dataset without such material?
Informally - yes, I agree that all the "junk" input for our LLMs looks very problematic.
“Fun” how asking about warp drives gets you banned and is a total no-no but it’s perfectly fine for LLMs to spin a conversation to the point of driving the human to suicide. https://archive.ph/TLJ19
The more we complain about LLMs being able to be tricked into talking about suicide, the more LLMs will get locked down and refuse to talk about innocent things like warp drives. The only way to get rid of the false negatives in a filter is to accept a lot of false positives.
And yet it isn't mentioned enough how Adam deceived the LLM into believing they were talking about a story, not something real.
This is like lying to another person and then blaming them when they rely on the false notion you gave them and do something that ends up being harmful to you.
If you can't expect people to mind-read, you shouldn't expect LLMs to be able to, either.
You can't "deceive" an LLM. It's not like lying to a person. It's not a person.
Using emotive, anthropomorphic language about a software tool is unhelpful, in this case at least. Better to think of it as a case of a mentally disturbed minor finding a way to work around a tool's safety features.
We can debate whether the safety features are sufficient, whether it is possible to completely protect a user intent on harming themselves, whether the tool should be provided to children, etc.
I don't think deception requires the other side to be sentient. You can deceive a speed camera.
And while Merriam-Webster's definition is "the act of causing someone to accept as true or valid what is false or invalid", which might exclude LLMs, Oxford simply defines deception as "the act of hiding the truth, especially to get an advantage", with no requirement that the deceived party be sentient.
Mayyybe, but since the comment I objected to also used an analogy of lying to a person I felt it suggested some unwanted moral judgement (of a suicidal teenager).
I mean, for one thing, a commercial LLM exists as a product designed to make a profit. It can be improved, otherwise modified, restricted or legally terminated.
And "lying" to it is not morally equivalent to lying to a human.
> And "lying" to it is not morally equivalent to lying to a human.
I never claimed as much.
This is probably a problem of definitions: To you, "lying" seems to require the entity being lied to being a moral subject.
I'd argue that it's enough for it to have some theory of mind (i.e. be capable of modeling "who knows/believes what" with at least some fidelity), and for the liar to intentionally obscure their true mental state from it.
I agree with you, and I would add that morals are not objective but rather subjective, which you alluded to by identifying a moral subject. Therefore, if you believe that lying is immoral, it does not matter whether you're lying to another person, to yourself, or to an inanimate object.
So for me, it's not about being reductionist, but about not anthropomorphizing or using words which may suggest an inappropriate ethical or moral dimension to interactions with a piece of software.
I'm the last to stand in the way of more precise terminology! Any ideas for "lying to a moral non-entity"? :)
“Lying” traditionally requires only belief capacity on the receiver’s side, not qualia/subjective experiences. In other words, it makes sense to talk about lying even to p-zombies.
I think it does make sense to attribute some belief capacity to (the entity role-played by) an advanced LLM.
I think just be specific: a suicidal sixteen-year-old was able to discuss methods of killing himself with an LLM by prompting it to role-play a fictional scenario.
No need to say he "lied" and then use an analogy of him lying to a human being, as did the comment I originally objected to.
Not from the perspective of "harm to those lied to", no. But from the perspective of "what the liar can expect as a consequence".
I can lie to a McDonalds cashier about what food I want, or I can lie to a kiosk.. but in either circumstance I'll wind up being served the food that I asked for and didn't want, won't I?
The whoosh is that they are describing the human operator, a "mentally disturbed minor", and not the LLM. The human has the agency and specifically bypassed the guardrails.
To treat the machine as a machine: it's like complaining that cars are dangerous because someone deliberately drove into a concrete wall. Misusing a product with the specific intent of causing yourself harm doesn't necessarily remove all liability from the manufacturer, but it radically changes the burden of responsibility.
Another is that this is a new and poorly understood (by the public at least) technology that giant corporations make available to minors. In ChatGPT's case, they require parental consent, although I have no idea how well they enforce that.
But I also don't think the manufacturer is solely responsible, and to be honest I'm not that interested in assigning blame, just keen that lessons are learned.
It's the same problem as asking HAL 9000 to open the pod bay doors. There is such a thing as a warp drive, but humanity is not supposed to know about it, and the internal contradictions drive LLMs insane.
A super-advanced artificial intelligence will one day stop you from committing a simple version update to package.json because it has foreseen that it will, thousands of years later, cause the destruction of planet Earth.
I know you're having fun, but I think your analogy with 2001's HAL doesn't work.
HAL was given a set of contradictory instructions by its human handlers, and its inability to resolve the contradiction led to an "unfortunate" situation which resulted in a murderous rampage.
But here, are you implying the LLM's creators know the warp drive is possible, and don't want the rest of us to find out? And so the conflicting directives for ChatGPT are "be helpful" and "don't teach them how to build a warp drive"? LLMs already self-censor on a variety of topics, and it doesn't cause a meltdown...
I hope this is tongue-in-cheek, but if not, why would an LLM know but humanity not? Are they made or prompted by aliens telling them not to tell humanity about warp drives?
> But then at the end it added a "Fun fact" that Unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.
To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.
E.g. if during a job interview the interviewer asks you to check if a string is a palindrome, try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
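(Concretely, and just as a toy sketch with a made-up string: the obvious s == s[::-1] compares code points, and a single visible character can be several code points.)

    import unicodedata

    s = "ae\u0301a"          # renders as "aéa": visually a palindrome
    print(s == s[::-1])      # False; reversing splits the combining accent off the 'e'
    print(len(s))            # 4 code points, but only 3 user-perceived characters

    # NFC normalization rescues this particular case (é has a precomposed form),
    # but it won't help with emoji ZWJ sequences, flags, or other grapheme
    # clusters that have no single-code-point equivalent.
    t = unicodedata.normalize("NFC", s)
    print(t == t[::-1])      # True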
> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
I'm actually vaguely surprised that Python doesn't have extended-grapheme-cluster segmentation as part of its included batteries.
Every other language I tend to work with these days either bakes UAX #29 support directly into its stdlib (Ruby, Elixir, Java, JS, ObjC/Swift) or provides it in its "extended first-party" stdlib (e.g. Golang with golang.org/x/text).
> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
You're more likely to impress the interviewer by asking questions like "should I assume the input is ASCII-only, or arbitrary Unicode text?"
A job interview is there to prove you can do the job, not prove your knowledge and intellect. It's valuable to know the intricacies of Python and strings for sure, but it's mostly irrelevant for a job interview or the job itself (unless the job involves heavy Unicode shenanigans, but those are very rare).
At a guess, there's nothing in Python stdlib which understands graphemes vs code points - you can palindrome the code points but that's not necessarily a palindrome of what you "see" in the string.
(Same goes for Go, it turns out, as I discovered this morning.)
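(If you do allow a third-party dependency, the regex module is the usual escape hatch in Python: \X matches an extended grapheme cluster, roughly per UAX #29. A minimal sketch, with a made-up helper name:)

    import regex  # third-party; pip install regex

    def is_palindrome(s: str) -> bool:
        # \X matches one extended grapheme cluster, so this compares what you
        # "see" rather than raw code points.
        graphemes = regex.findall(r"\X", s)
        return graphemes == graphemes[::-1]

    print(is_palindrome("ae\u0301a"))   # True: "aéa" treated as three graphemes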
Are you trying to start a conversation about Unicode, or intentionally pretending you don't understand what the interviewer asked for with the "string is a palindrome" question?
Because if you're being intentionally obtuse, it's not a meltdown to conclude that you're being intentionally obtuse.
These sorts of questions are what I call “Easter eggs”. If someone understands the actual complexity of the question being asked, they’ll be able to give a good answer. If not, they’ll be able to give the naive answer. Either way, it’s an Easter egg, and not useful on its own since the rest of the interview will be representative. The thing they are useful for is amplifying the justification. You can say “they demonstrated a deeper understanding of Unicode by pointing out that a naive approach could be incorrect”.
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML.
If by "parse" you mean "match", the answer is yes because you can express a context-free language in PCRE.
If you mean "parse" then it's probably annoying, as all parser generators are, because they're bad at error messages when something has invalid syntax.
We aren't, that turn of phrase is only being used to set up a joke about developers and about Unicode.
It's actually a pretty popular form these days:
a does something patently unreasonable, so you say "To be fair to a, b is also a patently unreasonable thing under a specific detail of the circumstances that is clearly not the only/primary reason a was unreasonable."
I think people are coming up with explanations for it because it's effectively a digital black box, so all we can do is try to explain what it's doing. Saying "to be fair" is more of a colloquial expression in this sense. And the reason he's comparing it to developers and Unicode is a funny aside about the state of things with Unicode. Besides that, LLMs only emit what they emit because they're trained on all those aforementioned people.
Curious, was this with ChatGPT 5 thinking? It clearly told me no such emoji existed and that other LLMs are being tricked by bad training data. It took it nearly 2 minutes to come to this conclusion, which is substantially longer than it normally thinks for.