People are fooled by phishing emails all of the time. It is arrogant to suggest that anyone, including ourselves, is immune to falling prey to phishing. One of the reasons so many phishing emails look like phishing emails is that the people creating them do not do due diligence when reaching out to their targets (e.g. ensuring that the email looks like it originates from a legitimate source). Another is that few of us are targeted directly (e.g. the phishers do not know whether we deal with the organization they are posing as). Yet the right combination of factors will leave anyone vulnerable.
If we can be fooled, shouldn't we expect the same of our filters? Sure, the filters may be better set up to identify certain forms of phishing and we may be in a better position to identify other types of phishing. Yet neither party is foolproof.
(Then there are things to consider like avoiding false positives, which will weaken the filters. It doesn't matter if those filters are automated or human.)
The “Humans on average are bad at X, therefore it isn’t a problem that AI makes mistakes” argument is getting tired. No, I’ve never personally fallen for a phishing scam. But if my email client marked a phishing email as “Priority”, I would be much more likely to fall for it.
I wasn’t sure whether they intended that as exoneration or explanation but I think it should be the latter. The right way to think about LLMs is as a credulous hire with no job history: McDonald’s will hire a random 16 year old but only to work in a controlled environment with oversight, not to open the CEO’s mail and tell them what’s important.
I would shy away from any comparisons to a human. The right way to think about LLMs is like generative fill for text or reverse text summarization imho
I agree. I think as soon as you refer to AIs as what they are, computer programs, a lot of problems and solutions present themselves.
For example, why are some people trying to give rights to computer programs? Since when have computer programs had rights? Fair use doctrine, for example, is a right for human beings.
Fair, I’m really thinking about it in response to people pushing products with terminology we normally use for people. But the comparison really is tricky, since this is our first collective experience with something that can sound authoritative without any deeper understanding, and historically many people have used the one as a proxy for the other.
It is bad. Like it or not, the general public considers computers to be deterministic calculating machines that produce the correct result. And when this doesn't happen, we expect the software developer to treat the case as a defect and work to correct it. Now, we have people telling us, no, computers are not calculation tools, they are more like an overconfident 14-year-old Redditor, and any mistakes they make are not defects, but unavoidable limitations of AI and you should expect them.
My problem is that this conflicts with how it is being deployed and trusted. People trust computer systems far beyond the trust they deserve, because they are used to some critical systems being made significantly resistant to failure, and to most others not having any significant problems. That logic is already a threat when applied to standard applications, where a programmer, in theory, should have understood each part, at least while building it. It works much worse when applied to AI. Yet AI is being sold on the common faith people have in computer systems, giving it more responsibility than it can rightly claim given its error rates.
I think the solution is to teach people to doubt expert systems, which will greatly harm their usefulness. But trust should be earned by these systems on a system-by-system basis, and they don't deserve the level of trust they currently enjoy.
I don't think I made a false equivalency. What I'm trying to say is that AI can't be perfect or always produce state-of-the-art results. Bad outcomes can always occur, so we should remain vigilant, but not let perfection be the enemy of the useful.
It's not so much that it is acceptable for filters to misclassify email as that we should keep our expectations realistic.
Then again, I've never been one to treat the priority email folder with credulity. What it classifies as priority is often quite different from what I would. Never mind treating those emails as inherently legitimate.
> If we can be fooled, shouldn't we expect the same of our filters?
Sure, makes sense. Then again, if a new kind of filter wastes more resources to do such a monumentally worse job that it not only fails to protect you but actively helps the bad actors trying to harm you, that is worth criticising and bringing to light.
There is a world of difference between being fallible and doing a monumentally worse job. While the article is playing up the incident, it is better to say that the author discovered the filter is fallible. Sure, file a bug report. Sure, point out that we should be applying our own judgment when the machine tells us something (or anyone tells us something, for that matter). Yet I am not seeing any evidence here that this is a systematic problem. I also have doubts that it is a truly solvable problem. We can make the technology progressively better, but it will always be imperfect.
The problem should have been presented as a reminder to use our own brains. Nothing more and nothing less.
Hard disagree. Saying “This seems… bad” is as mild as can be.
> Yet I am not seeing any evidence here that this is a systematic problem.
That was not the argument. How could this be systematic when the system isn’t even out for everyone?
> We can make the technology progressively better, but it will always be imperfect.
No one claimed it had to be perfect. But this is not better, or even equal, either.
> There is a world of difference between being fallible and doing a monumentally worse job.
This didn’t simply “fail”, it actively pushed the user to something that would have been harmful to them. There is also a world of difference between “failed to detect message as phishing and treated as any other” and “pushed phishing message to the top of your inbox and marked it as priority”.
I don't get the "wastes more resources" thing, it's just code running on your device and isn't a security product. GMail uses the same tricks for their "Important and Unread" section. I doubt Apple's little classifier even uses an LLM or whatever people are calling "AI."
Apple like everyone else is using the "AI as a marketing term" to push their existing, and generally very good, ML.
The problem is that in the absence of an LLM-powered "Priority" section this email would have ended up in the main mailbox with the rest of the emails with no special status, allowing human-level spam filters to kick in as normal and hopefully catch it in most cases. Instead, this "Priority" section now emphasizes that email as important and for a lot of people (though obviously not the author) will disable their natural suspicions.
This bug doesn't just return the user to the old status quo, it makes it more likely that they fall for a scam than they were before. This is a beta, but Apple Intelligence can't roll out like this—it has to have a spam filter of its own as a first pass, and there's no way the metadata in this email makes it past an LLM spam filter.
I think it’s fair to say that the entire operating system cannot be at release quality in beta 1. CarPlay loses the ability to accept touch inputs, alarms sometimes go off an hour early, the entire screen sometimes fails to wake when tapped, and there are many, many more fundamental problems.
Given the PR sensitivity around AI, Apple should never have included these features until they were much more polished, even in a beta, even if it meant waiting months.
This feels like the whole "Just wait for AGI" argument all over again, with a different audience. There is no promise or rule that suggests Apple will ever be able to fix this feature. By giving even a tiny bit of control to an AI, you're risking the chance that it statistically generates a token you didn't want. That's the random element that will rear its ugly head at the least convenient time. If Apple wants to avoid that (and rightfully so), then they shouldn't have tried building with AI in the first place.
Sure but AI boosters alternately ignore these issues or dismiss them every time they come up. If it’s an extremely error prone tool with limited use cases let’s be honest and call it what it is.
“People also make mistakes” isn’t a good enough defense for a technology with this much hype and funding.
Yea, people make math mistakes all the time, but I’d expect my calculator app to multiply correctly every time I use it. We should hold computers to a higher standard if we are going to rely on them.
I think this is the heart of most of the angst around AI: it runs on computers, computers are precise and deterministic, therefore AI must be precise and deterministic.
But… it just doesn’t work that way. There is tons of room for improvement in safety and reliability, but expecting a multi-billion parameter neural network to have the same accuracy properties as a software calculator is always going to lead to frustration.
Complex systems have complex failure modes. There is a reason we use hammers and not CNC machine presses for nails.
Right but this is way too often glossed over in the rush to hype the new models. I see even many people that should know better failing to treat their output with appropriate skepticism.
So much money is being dumped into this stuff now there's a huge incentive to sweep the shortcomings under the rug.
Perhaps? But I'm not sure I see the value in saying that other people aren't doing a good job of setting expectations with yet other people. Presumably we around here know, right? And it's always fraught to imagine problems third hand.
Yet the Windows 3.1 calculator couldn't subtract properly.
While I bring that example up in jest, there are real limitations to how computers do math. The calculator app may produce correct results for everyday problems, yet there are many domains where you must know how floating point numbers are handled, how the computer handles trigonometric functions, etc. It's not that the computer is wrong. There are simply limitations due to how floating point numbers are represented. Even integers can be problematic due to their own limitations.
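Those representation limits are easy to see for yourself. A minimal Python sketch (the exact trailing digits assume IEEE 754 double precision, which is what CPython uses for `float`):

```python
from decimal import Decimal

# Most decimal fractions have no exact binary representation,
# so "obviously correct" arithmetic drifts in the last digits.
print(0.1 + 0.2)         # prints 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)  # prints False

# Subtraction is no safer: the result is only approximately 0.01.
print(3.11 - 3.1 == 0.01)  # prints False

# Decimal arithmetic avoids this for money-like calculations,
# at the cost of speed.
print(Decimal("3.11") - Decimal("3.1"))  # prints 0.01
```

The point isn't that the hardware is buggy; binary floating point is behaving exactly as specified. A calculator app that wants decimal-exact answers has to use decimal arithmetic (or careful rounding) on top of it.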
> Yet the Windows 3.1 calculator couldn't subtract properly.
OK, but I'm sure that Microsoft treated that as a bug to be fixed, rather than as an inherent limitation of computers that we just need to understand and deal with.
This would be a useful analogy if Microsoft promoted the calculator as the next epoch-making trillion dollar revolution in computing and then swept aside the numerous mistakes it made on simple inputs as no worse than the average human.
Also remember all the user-account leaks. If you were part of the leak then it is trivial for bad actors to craft the perfect email, when they know what sites you have accounts on.