
The argument about LLMs is wrong, not for the reasons stated but because semantic meaning shouldn't solely be defined by the publisher.

The real question is whether the average publisher is better than an LLM at accurately classifying their content. My guess is, when it comes to categorization and summarization, an LLM is going to handily win. An easy test is: are publishers experts on topics they talk about? The truth of the internet is no, they're not usually.

The entire world of SEO hacks, blogspam, etc exists because publishers were the only source of truth that the search engine used to determine meaning and quality, which created all sorts of misaligned incentives that we've lived with for the past 25 years. At best, publishers can provide some things as guidance for an LLM (a social card, etc.), but that can't be the sole source of truth about the content.

Perhaps we will only really reach the promise of 'the semantic web' when we've adequately overcome the principal-agent problem of who gets to define the meaning of things on the web. My sense is that requires classifiers that are controlled by users.
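To make that concrete: here's a minimal sketch of a reader-side classifier (all names hypothetical), in which the publisher's declared metadata is just a weak prior and a model the user controls gets the final say.

    # Hypothetical sketch of a reader-controlled classifier. The publisher's
    # declared metadata is treated as a weak prior, not ground truth; a model
    # the *user* controls does the actual classification.
    from dataclasses import dataclass

    @dataclass
    class Classification:
        topic: str
        confidence: float

    def publisher_declared(meta: dict) -> Classification:
        # e.g. a <meta name="keywords"> or Open Graph tag: a hint, weakly weighted
        return Classification(meta.get("keywords", "unknown"), confidence=0.3)

    def reader_model(body_text: str) -> Classification:
        # Stand-in for a locally run model; a trivial keyword heuristic here.
        if "court" in body_text.lower():
            return Classification("legal reporting", confidence=0.8)
        return Classification("unknown", confidence=0.1)

    def classify(meta: dict, body_text: str) -> Classification:
        declared, inferred = publisher_declared(meta), reader_model(body_text)
        # The reader's classifier wins unless it is genuinely unsure.
        return inferred if inferred.confidence >= declared.confidence else declared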



Yet LLMs fail to make these simple but sometimes meaningful distinctions. See, for example, this case in which Copilot described a court reporter as being all the things he had reported on: a child molester, a psychiatric escapee, a con man who cheated widows. Presumably his name appeared in a lot of articles about those things, and LLMs simply associate his name with the crimes without making the connection that he could in fact be merely the messenger and not the criminal. If LLMs had the semantic understanding that the name at the top or bottom of a news article is the author's, they would not have made that mistake.

https://www.heise.de/en/news/Copilot-turns-a-court-reporter-...
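The byline is usually machine-readable already; a pipeline that extracted it before doing entity association could keep the reporter's name apart from the crimes the articles describe. A rough sketch, assuming schema.org JSON-LD markup (purely illustrative; nothing here reflects Copilot's actual pipeline):

    # Pull the author out of <script type="application/ld+json"> blocks so a
    # downstream step can exclude the byline from the article's subject entities.
    import json
    from html.parser import HTMLParser

    class JSONLDExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self._buf = None   # accumulates data inside a JSON-LD script tag
            self.blobs = []
        def handle_starttag(self, tag, attrs):
            if tag == "script" and ("type", "application/ld+json") in attrs:
                self._buf = []
        def handle_data(self, data):
            if self._buf is not None:
                self._buf.append(data)
        def handle_endtag(self, tag):
            if tag == "script" and self._buf is not None:
                self.blobs.append("".join(self._buf))
                self._buf = None

    def article_author(html: str) -> str | None:
        parser = JSONLDExtractor()
        parser.feed(html)
        for blob in parser.blobs:
            try:
                doc = json.loads(blob)
            except json.JSONDecodeError:
                continue
            if isinstance(doc, dict) and doc.get("@type") in ("Article", "NewsArticle"):
                author = doc.get("author")
                if isinstance(author, dict):
                    return author.get("name")
                if isinstance(author, str):
                    return author
        return None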


Absolutely! Today's LLMs can sometimes (often?) suck enormously and should not be relied upon for critical information. There's a long way to go to make them better, and I'm happy that a lot of people are working on that. Finding meaning in a sea of information is a highly imperfect enterprise regardless of the tech we use.

My point, though, was that the core problem we should be trying to solve is the fundamental misalignment of incentives between publisher and reader, not whether we can put together a better schema and hope people adopt it intelligently and non-adversarially, because we know that won't happen in practice. I liked what the author wrote, but they didn't really consider this perspective, and as such I don't think they've hit upon a fundamental understanding of the problem.


Humans do something very similar, fwiw. It's called spontaneous trait transference: https://www.sciencedirect.com/science/article/abs/pii/S00221...


> fwiw

What do you think this sort of observation is worth?


Really depends on what sort of person you are, I guess.

Some people appreciate being shown fascinating aspects of human nature. Some people don't, and I wonder why they're on a forum dedicated to curiosity and discussion. And then, some people get weirdly aggressive if they're shown something that doesn't quite fit in their worldview. This topic in particular seems to draw those out, and it's fascinating to me.

Myself, I thought it was great to learn about spontaneous trait transference, because it explains so much weird human behavior. The fact that LLMs do something so similar is, at the very least, an interesting parallel.


>My guess is, when it comes to categorization and summarization, an LLM is going to handily win. An easy test is: are publishers experts on topics they talk about? The truth of the internet is no, they're not usually.

LLMs are not experts either. Furthermore, from what I gather, LLMs are trained on:

>The entire world of SEO hacks, blogspam, etc


This is an excellent rebuttal. I think it is an issue that can be overcome but I appreciate the irony of what you point out :)


> because semantic meaning shouldn't solely be defined by the publisher

LLMs are not that great at understanding semantics, though.



