People are fooled by phishing emails all of the time. It is arrogant to suggest that anyone, including ourselves, is immune to falling prey to phishing. One of the reasons so many phishing emails look like phishing emails is that the people creating them do not do due diligence when reaching out to their targets (e.g. ensuring that the email looks like it originates from a legitimate source). Another is that few of us are targeted directly (e.g. the phishers do not know whether we deal with the organization they are posing as). Yet the right combination of factors will leave anyone vulnerable.
If we can be fooled, shouldn't we expect the same of our filters? Sure, the filters may be better set up to identify certain forms of phishing and we may be in a better position to identify other types of phishing. Yet neither party is foolproof.
(Then there are things to consider like avoiding false positives, which will weaken the filters. It doesn't matter if those filters are automated or human.)
The “Humans on average are bad at X, therefore it isn’t a problem that AI makes mistakes” argument is getting tired. No, I’ve never personally fallen for a phishing scam. But if my email client marked a phishing email as “Priority”, I would be much more likely to fall for it.
I wasn’t sure whether they intended that as exoneration or explanation but I think it should be the latter. The right way to think about LLMs is as a credulous hire with no job history: McDonald’s will hire a random 16 year old but only to work in a controlled environment with oversight, not to open the CEO’s mail and tell them what’s important.
I would shy away from any comparisons to a human. The right way to think about LLMs is like generative fill for text or reverse text summarization imho
I agree. I think as soon as you refer to AIs as what they are, computer programs, a lot of problems and solutions present themselves.
For example, why are some people trying to give rights to computer programs? Since when have computer programs had rights? Fair use doctrine, for example, is a right for human beings.
Fair, I’m really thinking about it in response to people pushing products with terminology we normally use for people. But the comparison really is tricky, since this is our first collective experience with something that can sound authoritative without any deeper understanding, and historically many people have used the one as a proxy for the other.
It is bad. Like it or not, the general public considers computers to be deterministic calculating machines that produce the correct result. And when this doesn't happen, we expect the software developer to treat the case as a defect and work to correct it. Now, we have people telling us, no, computers are not calculation tools, they are more like an overconfident 14-year-old Redditor, and any mistakes they make are not defects, but unavoidable limitations of AI and you should expect them.
My problem is that this conflicts with how it is being deployed and trusted. People trust computer systems far beyond the trust they deserve, because they are used to some critical systems being made significantly resistant to failure, and to most others not having any significant problems. That logic is already a threat when applied to standard applications, where a programmer, in theory, should have understood each part, at least while building it. It works much worse when applied to AI. Yet AI is being sold on the common faith people have in computer systems, giving it more responsibility than it can rightly claim given its error rates.
I think the solution is to teach people to doubt expert systems, which will greatly harm their usefulness. But trust should be earned by these systems on a system-by-system basis, and they don't deserve the level of trust they currently enjoy.
I don't think I made a false equivalency. What I'm trying to say is that AI can't be perfect or always produce state-of-the-art results. Bad outcomes can always occur, so we should remain vigilant, but not let perfection be the enemy of the useful.
It's not so much that it is acceptable for filters to misclassify email as that we should keep our expectations realistic.
Then again, I've never been one to treat the priority email folder with credulity. What it classifies as priority is often quite different from what I would. Never mind treating those emails as inherently legitimate.
> If we can be fooled, shouldn't we expect the same of our filters?
Sure, makes sense. Then again, if a new kind of filter wastes more resources to do such a monumentally worse job that it not only fails to protect you but actively helps the bad actors trying to harm you, that is worth criticising and bringing to light.
There is a world of difference between being fallible and doing a monumentally worse job. While the article is playing up the incident, it is better to say that the author discovered the filter is fallible. Sure, file a bug report. Sure, point out that we should be applying our own judgment when the machine tells us something (or anyone tells us something, for that matter). Yet I am not seeing any evidence here that this is a systematic problem. I also have doubts that it is a truly solvable problem. We can make the technology progressively better, but it will always be imperfect.
The problem should have been presented as a reminder to use our own brains. Nothing more and nothing less.
Hard disagree. Saying “This seems… bad” is as mild as can be.
> Yet I am not seeing any evidence here that this is a systematic problem.
That was not the argument. How could this be systematic when the system isn’t even out for everyone?
> We can make the technology progressively better, but it will always be imperfect.
No one claimed it had to be perfect. But this is not better, or even equal, either.
> There is a world of difference between being fallible and doing a monumentally worse job.
This didn’t simply “fail”, it actively pushed the user to something that would have been harmful to them. There is also a world of difference between “failed to detect message as phishing and treated as any other” and “pushed phishing message to the top of your inbox and marked it as priority”.
I don't get the "wastes more resources" thing, it's just code running on your device and isn't a security product. GMail uses the same tricks for their "Important and Unread" section. I doubt Apple's little classifier even uses an LLM or whatever people are calling "AI."
Apple like everyone else is using the "AI as a marketing term" to push their existing, and generally very good, ML.
The problem is that in the absence of an LLM-powered "Priority" section this email would have ended up in the main mailbox with the rest of the emails with no special status, allowing human-level spam filters to kick in as normal and hopefully catch it in most cases. Instead, this "Priority" section now emphasizes that email as important and for a lot of people (though obviously not the author) will disable their natural suspicions.
This bug doesn't just return the user to the old status quo, it makes it more likely that they fall for a scam than they were before. This is a beta, but Apple Intelligence can't roll out like this—it has to have a spam filter of its own as a first pass, and there's no way the metadata in this email makes it past an LLM spam filter.
I think it’s fair to say that the entire operating system cannot be at release quality in beta 1. CarPlay loses the ability to accept touch inputs, alarms sometimes go off an hour early, the entire screen sometimes fails to wake when tapped, and there are many, many more fundamental problems.
Given the PR sensitivity around AI, Apple should never have included these features until they were much more polished, even in a beta, even if it meant waiting months.
This feels like the whole "Just wait for AGI" argument all over again, with a different audience. There is no promise or rule that suggests Apple will ever be able to fix this feature. By giving even a tiny bit of control to an AI, you're risking the chance that it statistically generates a token you didn't want. That's the random element that will rear its ugly head at the least convenient time. If Apple wants to avoid that (and rightfully so), then they shouldn't have tried building with AI in the first place.
Sure but AI boosters alternately ignore these issues or dismiss them every time they come up. If it’s an extremely error prone tool with limited use cases let’s be honest and call it what it is.
“People also make mistakes” isn’t a good enough defense for a technology with this much hype and funding.
Yea, people make math mistakes all the time, but I’d expect my calculator app to multiply correctly every time I use it. We should hold computers to a higher standard if we are going to rely on them.
I think this is the heart of most of the angst around AI: it runs on computers, computers are precise and deterministic, therefore AI must be precise and deterministic.
But… it just doesn’t work that way. There is tons of room for improvement in safety and reliability, but expecting a multi-billion parameter neural network to have the same accuracy properties as a software calculator is always going to lead to frustration.
Complex systems have complex failure modes. There is a reason we use hammers and not CNC machine presses for nails.
Right but this is way too often glossed over in the rush to hype the new models. I see even many people that should know better failing to treat their output with appropriate skepticism.
So much money is being dumped into this stuff now there's a huge incentive to sweep the shortcomings under the rug.
Perhaps? But I'm not sure I see the value in saying that other people aren't doing a good job of setting expectations with yet other people. Presumably we around here know, right? And it's always fraught to imagine problems third hand.
Yet the Windows 3.1 calculator couldn't subtract properly.
While I bring that example up in jest, there are real limitations to how computers do math. The calculator app may produce correct results for everyday problems, yet there are many domains where you must know how floating point numbers are handled, how the computer handles trigonometric functions, etc. It's not that the computer is wrong. There are simply limitations due to how floating point numbers are represented. Even integers can be problematic due to their own limitations.
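Those representation limits are easy to see for yourself. A minimal Python sketch (the exact trailing digits assume IEEE 754 double precision, which is what CPython uses for `float`):

```python
from decimal import Decimal

# Most decimal fractions have no exact binary representation,
# so "obviously correct" arithmetic drifts in the last digits.
print(0.1 + 0.2)         # prints 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)  # prints False

# Subtraction is no safer: the result is only approximately 0.01.
print(3.11 - 3.1 == 0.01)  # prints False

# Decimal arithmetic avoids this for money-like calculations,
# at the cost of speed.
print(Decimal("3.11") - Decimal("3.1"))  # prints 0.01
```

The point isn't that the hardware is buggy; binary floating point is behaving exactly as specified. A calculator app that wants decimal-exact answers has to use decimal arithmetic (or careful rounding) on top of it.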
> Yet the Windows 3.1 calculator couldn't subtract properly.
OK, but I'm sure that Microsoft treated that as a bug to be fixed, rather than as an inherent limitation of computers that we just need to understand and deal with.
This would be a useful analogy if Microsoft promoted the calculator as the next epoch-making trillion dollar revolution in computing and then swept aside the numerous mistakes it made on simple inputs as no worse than the average human.
Also remember all the user-account leaks. If you were part of the leak then it is trivial for bad actors to craft the perfect email, when they know what sites you have accounts on.