
Just to be clear I'm understanding correctly:

This is pulling the content of the RSS feeds of several news sites into the context window of an LLM and then asking it to summarize news items into articles and fill in the blanks?

I'm asking because that is what it looks like, but AI / LLMs are not specifically mentioned in this blog post; they just say the news is 'generated' under the 'News in your language' heading, which seems to imply that is what they are doing.

I'm a little skeptical of the approach: when you ask an LLM to point to 'sources' for the information it outputs, as far as I know there is no guarantee that those are correct – and it does seem like sometimes they just use pure LLM output, as no sources are cited, or it's attributed to 'common knowledge'.



Just for concrete confirmation that LLM(s) are being used, there's an open issue on the GitHub repository, on hallucinations with made up information, where a Kagi employee specifically mentions "an LLM hallucination problem":

https://github.com/kagisearch/kite-public/issues/97

There's also a line at the bottom of the about page at https://kite.kagi.com/about that says "Summaries may contain errors. Please verify important information."


Love how it only took 8 years to go from "Fake News!" to "News May Be Fake"


FWIW, as someone who has chosen to pay for Kagi for three years now:

- I agree fake news is a real problem

- I pay for Kagi because I get much more precise results[1]

- They have a public feedback forum, and I think every time I have pointed out a problem they have come back with an answer and, most of the time, also a fix

- When Kagi introduced AI summaries in search they made it opt-in, and unlike every other AI summary provider I had seen at that point, they have always pointed to the sources. The AI might still hallucinate[2], but if it does I am confident that if I pointed it out to them my bug report would be looked into and I would get a good answer and probably even a fix.

[1]: I hear others say they get more precise Google results, and if so, more power to them. I have used Google enthusiastically since 2005, as the only real option from 2012, as a fallback for DDG since somewhere between 2012 and 2022, and basically only when I am on other people's devices or to prove a point since I started using Kagi in 2022.

[2]: I haven't seen much of that, but that might be because of the kind of questions I ask and the fact that I mostly use ordinary search.


Too late to edit, but I probably started using Google somewhere between February 2001 and July 2003, not in 2005.


There's too much demand for fake news, plenty of subsidy for it, and it's far easier to make.

Non-fake news is going to be restricted to pay services like Bloomberg terminals.


It is getting easier and easier to fake stuff, and there are fewer and fewer fully trusted institutions. So sadly I think you are right. It's scary, but we are likely heading towards a future where you need to pay to get verified information, and that itself will likely be segmented into different subscriptions for what information you want.


> It's scary, but we are likely heading towards a future where you need to pay to get verified information

…are you describing a newspaper?


> It is getting easier and easier to fake stuff

This is why the moon landing hoax was revolutionary in the 60's. The size of this project was enormous.


As an American, confirmation that the landing was a hoax would make me even prouder than my current belief that it was real.


The moon landing was filmed. Problem is, Stanley Kubrick was such a perfectionist that he _demanded_ they film on location.


On the other hand, he shot Full Metal Jacket in England and not Vietnam, so maybe he was able to compromise ..


Yeah, they filmed it on Ganymede instead.


Haha, perhaps.


Well, the thing is that technically information is free, but creating it is definitely not. So, if ads are not paying for it, and people won't pay for it either, who does?

Fake news exists because of the perverse incentives of the system, where getting as many clicks as possible is what matters. This is very much a result of social networks and view-based remuneration.

I don't think it's that bad if people need to pay for real information...


Fake news mainly exists because the need to disguise, manipulate and lie for power gains is probably as old as humanity.


It seems a challenging situation: if you pay fact checkers you get accused of censorship by “weaponised free speech”, and if you leave it to the community you get inconsistent results.


The first one sounds like it's an argument made by someone who never wanted the facts to begin with. Correcting misinformation is not stifling free speech.

I'm all for more proper fact checkers, backed by reputable sources.


Yes, this is the case. Fact checkers were kept alive by efforts of the big tech firms. Post Trump’s election, funding has been pulled globally.


And paid newspapers, hopefully.


At least we're going from Fake News from certain MAGA leaning sources at 75-90% fake to 99% actual news and 1% hallucinations?


Man am I tired of this stuff.


My LLM Investor Agent says we must keep investing in AI, tulips will always be worth more at a later date


To take a moment to be a hopeless Stan for one of my all-time favorite companies: I don't think the summary above yours is fair, and I see why they don't center the summary part of it.

Unlike the disastrous Apple feature from earlier this year (which is still available, somehow!), this isn't trying to transform individual articles. Rather, it's focused on capturing broader trends and giving just enough info to decide whether to click into any of the source articles. That seems like a much smaller, more achievable scope than Apple's feature, and as always, open-source helps work like this a ton.

I, for one, like it! I'll try it out. Seems better than my current sources for a quick list of daily links, that's for sure (namely Reddit News, Apple News, Bluesky in general, and a few industry newsletters).


> giving just enough info to decide whether to click into any of the source articles.

If that info is hallucinated, then it's worse than useless. Clickbait still attempts to represent the article; a hallucination isn't guaranteed to do that.

Why not have someone properly vet out interesting and curious news and articles and provide traffic to their site? In this age of sincerity, proper citation is more vital than ever.


Yeah. I really like Kagi. This is a terrible idea.

1. It seems to omit key facts from most stories.

2. No economic value is returned to the sources doing the original reporting. This is not okay.

3. If your summary device makes a mistake, and it will, you are absolutely on the hook for libel.

There seem to be some misunderstandings about what news is and what makes it well-executed. It’s not the average, it’s the deepest and most accurate reporting. If anyone from the Kagi team wants to discuss, I’m a paying member and I know this field really, really well.


Thank you. Also a paying Kagi user because I like the idea that it’s worth it to pay for a good service. Ripping off journalists/newspapers content goes against that.


> It’s not the average, it’s the deepest and most accurate reporting.

Yes! I'm also a paying member but I'm deeply suspicious of this feature.

The website claims "we expose readers to the full spectrum of global perspectives", but not all perspectives are equal. It smacks of "all sides" framing which is just not what news ought to be about.


Yes, that's what it is. Kagi as a brand is LLM-optimist, so you may be fundamentally at odds with them here... If it lessens the issue for you, the sources of each item are cited properly in every example I tried, so maybe you could treat it as a fancy link aggregator


> Kagi as a brand is LLM-optimist

Kagi founder here. I am personally not an LLM-optimist. The thing is that I do not think LLMs will bring us to "Star Trek" level of useful computers (which I see humans eventually getting to) due to LLM's fundamentally broken auto-regressive nature. A different approach will be needed. Slight nuance but an important one.

Kagi as a brand is building tools in service of its users, no particular affinity towards any technologies.


You claimed reading LLM summaries will provide complete understanding. Optimistic would be a charitable description of this claim. And optimism is not limited to the most optimistic.


Another LLM-pragmatist here. I don't see why we should treat LLMs differently than any other tool in the box. Except maybe that it's currently the newest and most shiny, albeit still a bit clunky and overpriced.


Fwiw, I love your approach to AI. It's been very useful to me. Quick answers especially has been amazingly accurate and I've used it hundreds of times, if not thousands, and routinely check the links it gives


Happy Kagi Ultimate user here, so thank you!


I'm about as AI-pessimist as it gets, but Kagi's use of LLMs is the most tasteful and practical I've seen. It's always completely opt-in (e.g. "append a ? to your search query if you want an AI summary", as opposed to Google's "append a swear word to your search query if you don't want one"), it's not pushy, and it's focused on summarizing and aggregating content rather than trying to make it up.


FYI, you can append &udm=14 to Google searches to remove AI results and a bunch of the other clutter they've added.
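
For anyone curious what that looks like concretely, here's a tiny sketch (Python, query string made up) of building such a URL:

    from urllib.parse import urlencode

    query = "llm hallucination examples"  # any search terms
    url = "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})
    # -> https://www.google.com/search?q=llm+hallucination+examples&udm=14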


I did that, and started getting flagged as a bot. Had to search elsewhere (Kagi) full-time, or else suffer endless "find a bike" nonsense.

I think Google hates losing its ads and lame suggestions.


Google thinks the same of me and I don't even edit the URL. I can have a session working just fine one night and come back the next day, open a new tab to search for something, and get captcha'd to hell. I'm fairly sure they just mess with Firefox on purpose. I won't install Brave, Chrome, or Edge out of principle either. Safari works fine, but I don't like it.


Google will captcha me on the second or third search if I try to use the "site:" advanced keyword to narrow down a search.

I'm sorry, I know how to use your tool?? Didn't you put these keywords in to be used?


Google has gotten amazingly hostile toward power users. I don't even try to use it anymore. It almost feels like they actively hate people that learned how to use their tools


Neat trick, any other params folks might want to know about?


I found this page that describes a variety of search parameters: https://susodigital.com/thoughts/the-mystery-of-the-google-u...

Then I got the machine to write a front-end that visualises them and builds a search query for you: https://pastebin.com/HNwytYr9

enjoy


I consider myself a major LLM optimist in many ways, but if I'm receiving a once per day curated news aggregation feed I feel I'd want a human eye. I guess an LLM in theory might have less of the biases found in humans, but you're trading one kind of bias for another.


Indeed! A once per day human-curated news aggregation feed used to be called a "newspaper". You can still get them in some places, I believe.


This isn't really comparable. A newspaper is a single source: the New York Times is a newspaper, CNN (or at least part of it) is a newspaper. Services like Kagi News, whether AI- or human-curated, try to do aggregation and meta-analysis of many newspapers.


Newspapers routinely report what other newspapers said. The original(s) is the one "breaking" the story, the others are "covering" the story.


Yeah, I agree. The entire value/fact dichotomy that the announcement bases itself on is a pretty hot philosophical topic I lean against Kagi on. It's just impossible to summarize any text without imparting some sort of value judgement on it, therefore "biasing" the text


> It's just impossible to summarize any text without imparting some sort of value judgement on it, therefore "biasing" the text

Unfortunately, the above is nearly a cliché at this point. The phrase "value judgment" is insufficient because it occludes some important differences. To name just two that matter: there is a key difference between (1) a moral value judgment and (2) selection & summarization (often intended to improve information density for the intended audience).

For instance, imagine two non-partisan medical newsletters. Even if they have the same moral values (e.g. rooted in the Hippocratic Oath), they might have different assessments of what is more relevant for their audience. One could say both are "biased", but does doing so impart any functional information? I would rather say something like "Newsletter A is run by Editorial Board X with such-and-such a track record and is known for careful, long-form articles" or "Newsletter B is a one-person operation known for a prolific stream of hourly coverage." In this example, saying the newsletters differ in framing and intended audience is useful, but calling each "biased in different ways" is a throwaway comment (having low informational content in the Shannonian sense).

Personally, instead of saying "biased" I tend to ask questions like: (a) Who is their intended audience; (b) What attributes and qualities consistently shine through?; (c) How do they make money? (d) Is the publication/source transparent about their approach? (e) What is their track record about accuracy, separating commentary from factual claims, professional integrity, disclosure of conflicts of interest, level of intellectual honesty, epistemic standards, and corrections?


> The entire value/fact dichotomy that the announcement bases itself on

Hmmm. Here I will quote some representative sections from the announcement [1]:

>> News is broken. We all know it, but we’ve somehow accepted it as inevitable. The endless notifications. The clickbait headlines designed to trigger rather than inform, driven by relentless ad monetization. The exhausting cycle of checking multiple apps throughout the day, only to feel more anxious and less informed than when we started. This isn’t what news was supposed to be. We can do better, and create what news should have been all along: pure, essential information that respects your intelligence and time.

>> .. Kagi News operates on a simple principle: understanding the world requires hearing from the world. Every day, our system reads thousands of community curated RSS feeds from publications across different viewpoints and perspectives. We then distill this massive information into one comprehensive daily briefing, while clearly citing sources.

>> .. We strive for diversity and transparency of resources and welcome your contributions to widen perspectives. This multi-source approach helps reveal the full picture beyond any single viewpoint.

>> .. If you’re tired of news that makes you feel worse about the world while teaching you less about it, we invite you to try a different approach with Kagi News, so download it today ...

I don't see any evidence from these selections (nor the announcement as a whole) that their approach states, assumes, or requires a value/fact dichotomy. Additionally, I read various example articles to look for evidence that their information architecture groups information along such a dichotomy.

Lastly, to be transparent, I'll state a claim that I find to be true: for many/most statements, it isn't that difficult nor contentious to separate out factual claims from value claims. We don't need to debate the exact percentages or get into the weeds on this unless you think it will be useful.

I will grant this -- which is a different point than the one the commenter above made -- when reading various articles from a particular source, it can take effort and analysis to suss out the source's level of intellectual honesty, ulterior motives, and other questions I mention in my sibling comment.

[1]: https://blog.kagi.com/kagi-news


Don't worry, all those news articles are of course human curated.

(I say this sarcastically and unhappily)


Hard pass then. I’m a happy Kagi search subscriber, but I certainly don’t want more AI slop in my life.

I use RSS with newsboat and I get mainstream news by visiting individual sites (nytimes.com, etc.) and using the Newshound aggregator. Also, of course, HN with https://hn-ai.org/


> Also, of course, HN with https://hn-ai.org/

Ironically, this submission is at the top of that website :)


You can also convert regular newspapers into RSS feeds! NYTimes and Seattle Times have official RSS feeds, and with some scripting you can also get their article contents.
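
For example, a rough sketch using the feedparser and requests packages (the NYT feed URL below is the publicly listed homepage feed, but verify it; proper full-text extraction is the "some scripting" part and would need something like readability on top):

    import feedparser
    import requests

    FEED = "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"

    for entry in feedparser.parse(FEED).entries[:5]:
        print(entry.title, "->", entry.link)
        # Feeds usually carry only a teaser, so fetch the article page itself;
        # the body text still has to be extracted from this HTML.
        html = requests.get(entry.link, timeout=10).text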


> when you ask an LLM to point to 'sources' for the information it outputs, as far as I know there is no guarantee that those are correct

A lot of times when I ask for a source, I get broken links. I'm not sure if the links existed at one point, or if the LLM is just hallucinating where it thinks a link should exist. CDN libraries, for example. Or sources to specific laws.


I monitor 404 errors on my website. ChatGPT frequently sends traffic to pages that never existed. Sometimes the information they refer to has never existed on my website.

For example: "/glossary/love-parade" - There is no mention of this on my website. "/guides/blue-card-germany" has always been at "/guides/blue-card". I don't know what "/guides/cost-of-beer-distribution" even refers to.


Definitely need an LLM to just generate it automatically on the fly! Welcome to the future! (Just kidding please don't (generate automatically))


Not quite this, but still relevant: https://www.ty-penguin.org.uk/~auj/spigot/


A great idea if you're looking to intentionally sabotage AI.


> A lot of times when I ask for a source,

They'll do pretty much everything you ask of them, so unless the text actually comes from some source (via tool calls, injecting content into the context or some other way), they'll make up a source rather than doing nothing, unless prompted otherwise.


On my LLM, I have a prompt that condenses down to:

For every line of text output, give me a full MLA-annotated source. If you cannot, then say your source does not exist, or if you are generating information based on multiple sources, give me those sources. If you cannot do that, print that you need more information to respond properly.

Every new model I mess with needs a slightly different prompt due to safeguards or source protections. It is interesting when it lists a source that I physically own and the training data has clearly deteriorated.


They could make up a source, but ChatGPT is an actual app with a complicated backend, not a dumb pipe between a text editor and a GPU. Surely they could verify on the server side every link they output before including it in the answer. I'm sure Codex will implement it in no time!
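
Something like this naive sketch (no idea what their backend actually does) would already catch dead links before they reach the user:

    import requests

    def link_exists(url: str, timeout: float = 5.0) -> bool:
        """Return True if the URL resolves to a non-error response."""
        try:
            # HEAD is cheap; some sites reject it, so fall back to GET.
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code in (403, 405):
                resp = requests.get(url, stream=True, timeout=timeout)
            return resp.status_code < 400
        except requests.RequestException:
            return False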


They surely can detect it, but what are they going to do after detecting it? Loop the last job with a different seed and hope that the model doesn't lie through its teeth? They won't be doing it because the model will gladly generate you a fake source on the next retry too.


This is actually harder than most think. The chances of your app doing this check being bot-detected/blocked are very high.

(unless you are Google etc which are specifically let in to get the article indexed into search)


Maybe they should be trained on the understanding that making up a source is not "doing what you ask of them" when you ask for a source. It's actually the exact opposite of the "doing what you asked, not what you wanted" trope-- it's providing something it thinks you want instead of providing what you asked for (or being honest/erroring out that it can't).


Think for a second about what that means... this is a very easy thing to do IFF we already had a general purpose intelligence.

How do you make an LLM understand that it must only give factual sources? Just some naive RL with positive reward on the correct sources and negative reward on incorrect sources is not enough -- there are obscenely many more hallucinated sources possible, and the set of correct sources is a set of insanely tiny measure.


"Easy". You make the model distinguish between information and references to information. Information may be fabricated (for example, a fictional book is mostly composed of lies) but references are assumed to be factual (a link does point to something and is related to something). Factual information is true only to the degree that it is conveyed exactly, so the model needs to be able to store and reproduce references verbatim.

Of course, "easy" is in quotes because none of this is easy. It's just easier than AGI.


Wrong, just ask it about some non-existent famous historical person and it will most likely tell you it didn't exist.


If you need to ask for a source in the first place, chances are very high that the LLM's response is not based on summarizing existing sources but rather exclusively quoting from memory. That usually goes poorly, in my experience.

The loop "create a research plan, load a few promising search results into context, summarize them with the original question in mind" is vastly superior to "freely associate tokens based on the user's question, and only think about sources once they dig deeper".


I'm genuinely asking, but have you tried it? https://kite.kagi.com

It actually seems more like an aggregator (like ground.news) to me. And pretty much every single sentence cites the original article(s).

There are nice summaries within an article. I think what they mean is that they generate a meta-article after combining the rest of them. There's nothing novel here.

But the presentation of the meta-article and publishing once a day feel like great features.


I have, yeah. To me it looks like what I described in my comment above – it's LLM-generated text, is it not?

> And pretty much every single sentence cites the original article(s).

Yeah, but again, correct me if I'm wrong: I don't think asking an LLM to provide a source / citation yields any guarantee that the text it generates alongside it is accurate.

I also see a lot of text without any citations at all, here are three sections (Historical background, Technical details and Scientific significance) that don't cite any sources: https://kite.kagi.com/s/5e6qq2


Oof, that one is particularly bad because it cites 3 sources which are all the same article.

Google points to phys and phys is a republish of the MIT article.


There's even a 'perspectives' section that tries to contrast two of the sources, but they're the same article.


I can envision the day where an LLM article generator starts consuming LLM generated articles which were sourced from single articles (co-written by an LLM).


They are LLM generated summaries, sure.

I guess I'm trying to understand your comment. Is there a distinction you're making between LLM summaries and LLM-generated text, or are you stating that they aren't being transparent about the summaries being generated by LLMs (as opposed to what? human editors?).

Because at some point when I launched the app, it did say summaries might be inaccurate.

Looks like you found an example where it isn't properly citing the summaries. My guess is that they will tighten this up, because I looked mostly at the first and second page and most of those articles seemed to have citations in the summaries.

Like most people, I would want those everywhere to guard against potential hallucinations. No, the citations don't guarantee that there weren't any hallucinations, but if you read something that makes you go "huh" – the citations give you a low-friction opportunity to read more.

But another sibling commenter talked about the phys.org and google both pointing to the same thing. I agree, and this is exactly an issue I have with other aggregators like Ground.news.

They need to build some sort of graph that distills down duplicates. Like I don't need the article to say "30 sources" when 26 of them are just reprints of an AP/Reuters wire story. That shouldn't count as 30 sources.
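
As a toy sketch of what that distillation could look like (pure word-overlap; a real system would presumably use MinHash or embeddings):

    def jaccard(a: str, b: str) -> float:
        # Word-level overlap between two article texts.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, len(wa | wb))

    def dedupe(articles: list[str], threshold: float = 0.8) -> list[str]:
        unique: list[str] = []
        for text in articles:
            # Keep an article only if it isn't a near-copy of one already kept,
            # so 26 reprints of the same wire story collapse into one source.
            if all(jaccard(text, kept) < threshold for kept in unique):
                unique.append(text)
        return unique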


> I guess I'm trying to understand your comment. Is there a distinction you're making between LLM summaries and LLM-generated text, or are you stating that they aren't being transparent about the summaries being generated by LLMs (as opposed to what? human editors?).

The main point of my original comment was that I wanted to understand what this is, how it works and whether I can trust the information on there, because it wasn't completely clear to me.

I'm not super up to date with AI stuff, but my working knowledge is that I should never trust the output of an LLM and always verify it myself. So I was wondering if this is just LLM output, or if there is some human review process or a mechanism related to the citation functions that makes it output of a different, more trusted category.

I did catch the message on the loading screen now as well. I do still think the individual articles could be a little clearer about being LLM-generated text, but apart from that I think I understand somewhat better what it is now.


> No, the citations don't guarantee that there weren't any hallucinations, but if you read something that makes you go "huh" – the citations give you a low-friction opportunity to read more.

Either you mean every time you read something interesting (“huh”) you should check it. But in that case, why bother with reading the AI summary in the first place…

Or you mean that any time you read something that sounds wrong, you should check it. But in that case, everything false in the summaries that happens to sound true to you will be confirmed in your mind without you ever checking it.


> as opposed to what? human editors?

...yes? If I go to a website called "_ News" (present company included), I expect to see either news stories aggregated by humans or news stories written and fact checked by humans. That's why newspapers have fact checking departments, but they're being replaced by something with almost none of the utility and its proponents are framing the benefits of the old system as impossible or impractical.


I think you misunderstood my comment. I wasn't challenging the concept of human editors and fact checkers. I was asking a parent for a clarification of what the parent post meant by outlining that they were LLM generated summaries.

Like, I was asking whether they were expecting the curation/summarization to be done by humans at Kagi News.


Publishing once a day to remove the "slot machine dopamine hit" is worth it for that alone. I have forever been looking for a peer/replacement to Google News, I was about to pony up for a Ground News subscription but I'll probably hold off for a couple more months. Alternatives to google news have been sorely lacking for over a decade, especially since google news got their mobile-first redesign which significantly and permanently weakened the product to meet some product manager's bonus-linked KPI. One more product to wean off the google mothership. Gmail is gonna be real hard though.


> Gmail is gonna be real hard though.

Gmail seems like the easiest piece of the Google puzzle to replace. Different calendar systems have different quirks around repeating events, you sometimes need to try a variety of search engines to find what you're looking for, Docs aren't bug-for-bug equivalent to the Office or iCloud competitors, YouTube has audience, monetization, and hosting scale... Gmail is just "make an email account with a different provider and switch all of your accounts to use the new address." They don't even give you that much storage for free Gmail; it's 15GB, which lots of other email providers can match (especially paid ones). You can import your old emails to your new provider or just store them offline with a variety of email clients.

Is updating all of your accounts (and telling your contacts about the new address) what you consider to be the hard part, or do you actually use any Gmail-specific features? Genuinely curious, as I tend to disregard almost all mail-provider-specific features that any of my mail providers try to get me excited about (Gmail occasionally adds some new trick, but Zoho Mail is especially bad about making me roll my eyes with their new feature notifications).


I am sticking with this reprehensible company for email because their spam detection is awesome and I have found no clear measurements of detection to reasonably compare. I’d love to be proven wrong!


Switched from Gmail to Fastmail about 10 years ago.

2-3 spam emails slip through every week, and sometimes a false positive happens when I sign up for something new. I don't see this as a huge problem, and I doubt Gmail is significantly better.


Gmail significantly improved the email spam situation for everyone by aggressively pushing email security standards like DMARC/DKIM/SPF [1]. This came at the cost of basically no longer being able to self-host your own email server, though.

I agree with the other commenter, I use Fastmail and I get very few spam emails, most of which wouldn't have been detected by gmail either because they're basically legitimate looking emails advertising scams. I have a Gmail account I don't use and it seems like it receives about the same amount of spam, if not more.

1: https://www.cloudflare.com/en-gb/learning/email-security/dma...


I don't understand how this once-per-day thing, very obviously a cost-cutting measure, can be taken seriously as a "feature". Stories evolve throughout the day. If this is truly important to you, just screenshot Google News, then look at the screenshot all day.


I am fine with it using AI, but it makes me feel pretty icky that they didn’t mention that this was AI/LLM-generated at any point in this article. That’s a no-no IMO, and has turned me off this pretty strongly.


Why do you care what technology was used to generate the summaries? What if they had used their old NLP summarizer?


They don't explicitly say they generate summaries at any point in the article. In fact I read it and thought this was just some fancy RSS aggregator. The way they describe the "daily briefing" is extremely ambiguous.


OK, but I'd like to repeat my question here: Why do you care how the summary was generated?


I'm not the person you asked, but it's useful to know if the summary was generated using a method prone to inaccuracy.


That's all methods, though. Have you seen humans?


In this situation, humans are more accurate, for now, so it's good information to have.

Same as I would like to know if a study about how well humans drive relied on self-assessment versus empirical evidence. Humans just aren't that good at that task, so it would be good to know coming in.

Just call it Kagi Vibes instead of Kagi News as news has a higher bar (at least for me)


I'm not sure I agree that humans are more accurate at summarizing, but I don't have data, so I'll take your word for it.


I'd point to Wikipedia. You can say the content is "wrong". But the links go to the right place.


In my experience with Claude research, the links ~always go to the right place.


different kinds of inaccuracy


I find it more distasteful that they weren’t transparent about their method than the method being AI.


But they have updated the text now, which is nice!


Sure, but it's useful to know what kind of inaccuracies to look for.


Someone needs to coin a name for the fallacy where, whenever anyone criticises LLMs, the speaker retorts with "but how are humans any better?"

I've seen it so many times it definitely needs a name. As an entity of human intelligence, I am offended by these silly thought-terminating arguments.


It's called "the perfect world fallacy".


At least I want to know that it’s a summary and not the actual content of any article.


Many reasons. If this news were summarised by humans I would prefer it; one example reason is that with AI summaries I know to look out for hallucinations.

To be honest though that’s not the point. I’m more annoyed they weren’t transparent about their methods than I am about them using AI.


Kagi team member here who helped author that blog post announcement. That's a fair point, and we've updated the post to now clearly mention that we use AI for the daily briefing. Thank you for your feedback!


Thank you!


I'm firmly on the side of "AI" skepticism, but even I have to admit that this is a very good use of the tech. LLMs generally do a great job at summarizing text, which is essentially what this is. The sources could be statically defined in advance, given that they know where they pull the information from, so I don't think the LLM generates that content.

So if this automates the process of fetching the top news from a static list of news sites and summarizing the content in a specific structure, there's not much that can go wrong there. There's a very small chance that the LLM would hallucinate when asked to summarize a relatively short amount of text.
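
As a rough sketch of that structure (assuming the feed items are already fetched; llm is a placeholder for whatever model call is used – this is a guess at the shape, not Kagi's actual pipeline):

    def daily_briefing(items: list[dict], llm) -> str:
        # items: [{"title": ..., "link": ..., "summary": ...}] from the static feed list
        sources = "\n".join(
            f"- {it['title']} ({it['link']}): {it['summary']}" for it in items
        )
        prompt = (
            "Group the items below by story. For each story, write 2-3 neutral "
            "sentences and cite the links you used. Do not add facts that are "
            "not present in the items.\n\n" + sources
        )
        return llm(prompt)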


It's useful for the users, but tragically bad for anyone involved with journalism. Not that they're not used to getting fucked by search engines at this point, be it via AMP, instant answers, or AI overviews.

Not that the userbase of 50k is big enough to matter right now, but still...


What journalism? Most of these sites copy their content from each other or social media, and give it their own spin. Nowadays most of them use AI anyway.

Actual journalism doesn't rely on advertising, and is subscription based. Anyone interested in that is already subscribed to those sources, but that is not the audience this service is aiming for. Some people only want to spend a few minutes a day catching up with major events, and this service can do that for them. They're not the same people who would spend hours on news sites, so these sites are not missing any traffic.


Broadly agreed, I don't consider the CBS (national) news website to be a source of hard hitting journalism; Reuters, however, is. Reuters and the AP are often the source of these news stations.

I continue to subscribe to Reuters because of the quality of journalism and reporting. I have also started using Kagi News. They are not incompatible.


All this is doing is aggregating RSS feeds and linking to the original articles.

So this might result in lower traffic for "anyone involved in journalism" – but the constant doomscrolling is worse for society. So I think we can all agree that the industry needs to veer towards less quantity and more quality.


RSS feeds are meant to be used by actual users, not regurgitated publicly. RSS readers at the very least have the author info visible, and their users tend to be reported to the website's analytics with a special user agent.


I see! One thing I'm wondering: They say they are fetching the content from the RSS feeds of news outlets rather than scraping them, I haven't used RSS in a bit, but I recall most news outlets would usually not include the full article in their feed but just the headline or a small summary. I'd be worried that articles with misleading headlines (which are not uncommon) might cause this tool to generate incorrect news items, is that not a concern?


That's a fair concern, and I would prefer it if they scraped the sites instead. They could balance this out by favoring content from sites that do provide the entire article in their feeds, but that could lead to bias problems. Maybe this is why their own summaries are short. We can't know for sure unless they explain how it works.


We used to do this with a human created meta tag but I guess this is better?


If the parent commenter is correct, the concern I'd have would be about transparency. Even if it's good at what it does, I don't think we're anywhere close to a place as a society where it shouldn't be explicit when it's being used for something like this.


This also is a really ignorant approach to data poisoning issues in the LLM space. LLMs can easily be misused as propaganda machines...


It's not binary - it's a continuum.

When you go to Google News, the way they group together stories is AI (pre-LLM technology). Kagi is merely taking it one step further.

I agree with your concern. I see this as a convenient grouping, and if any interests me I can skip reading the LLM summary and just click on the sources they provide (making it similar to Google News).


> Kagi is merely taking it one step further.

I would argue creating your own summary is several steps beyond an ordering algorithm.


It cannot be "one step further", because there's a clear break in reality between what Google News provides and what Kagi provides. Google News links to an article that exists in our world, 100%, no chance involved. Kagi uses an LLM to generate text and thus is entirely up to chance.


Devil's advocate here.

Do you know that's what they're doing? They are a search engine after all. They do run their own indexer, as well as cache results from other sources.

If they're feeding URLs to an AI, why can't they validate that the URLs the AI outputs are real? Maybe they do.


I don't care.


> When you go to Google News

You don't and you should not use this one either.


It also publishes a normal, unabridged RSS feed that you can read with the RSS reader of your choice; it looks like a great news source.


I just don’t understand what this brings into the picture. Presumably your newspaper of choice already has

A) edited the news into a reader-friendly format

B) set up a page with prioritized news

Because _that’s what a newspaper is_.

What extra value is gained from an AI rewrite? At best it is a borderline no-op, at worst a lossy transformation (?)


> when you ask an LLM to point to 'sources' for the information it outputs,

Services listing sources, like Kagi News, Perplexity and others, don't do that. They start with known links and run LLMs on that content. They don't ask LLMs to come up with links based on the question.


That is what I mean, yeah. I’m not saying it’s fabricating sources from training data – that would obviously be impossible for news articles. I’m saying that if you give it a list of articles A, B and C, including their content in the context, and ask ‘what is the foo of bar?’, and it responds ‘the foo of bar is baz, source: article B paragraph 2’, that does not tell you whether the output is actually correct, or contained in the cited source at all, unless you manually verify it.


This seems like the opposite of "privacy by design"

> Privacy by design: Your reading habits belong to you. We don’t track, profile, or monetize your attention. You remain the customer and not the product.

But the person running the LLM surely does.


How would the LLM provider get any information about your reading habits from the app? The LLM is used _before_ the news content is served to you, the reader.


The line “news stories will be generated” throws up red flags across the horizon for me.

That’s not news. That’s news-adjacent random slop.


It's also a way around copyright: news sites would be (rightfully) pissed if you publicly posted their articles in full, and would argue that you're stealing their viewership. But if you're essentially doing an automatic mash-up of five stories on the same topic from different sources, all of a sudden you're not doing anything wrong!

As an example from one of their sources, you can only re-publish a certain number of words from an article in The Guardian (100 commercially, 500 non-commercially) without paying them.


TBH, I would take the headline and first hundred words in a news aggregator. That seems fine?


Yes, that is fine! That's how RSS feeds usually work when you follow more "mainstream" news sources. At the very least, you see the name of the author and you actually make a connection to their server that can be measured in the analytics.

But instead, Kagi "helpfully" regurgitates the whole story, visits the article once, delivers it to presumably thousands, and it can't even be bothered to display all of the sources it regurgitates unless you click to expand the dropdown. And even then the headline itself is one additional click away, and they straight up don't even display the name of the journalist in the pop-up, just the headline.

Incredibly shitty behaviour from them. And then they have the balls to start their about page with this:

> Why Kagi News? Because news is broken.


And yet, after trying it, I have to admit it's more informative and less provocative than any other news source I've seen since at least 2005.

I don't know how they do it, and I'm not sure I care; the result is they've eliminated both clickbait and ragebait, and the news is indeed better off for it!


Soulless, uncreative, not fact-checked (or read by anyone before clicking publish), not contributing anything back to the original journalists, and all of the editorial decisions are made by a nondeterministic AI filter.

Not gonna call it the worst insult to journalism I've ever seen because I've seen factually(.)so which does essentially the same thing but calls it an "AI fact check", but it's not much better.

It's like, instead of borrowing a book from the library, there's a spokesperson at the entrance whom you ask a question and then blindly believe whatever they say.


> soulless, uncreative

This is exactly how I want my news to be. Nothing worse than a headline about a new vaccine breakthrough, followed by a first paragraph that starts with "it was a cold November morning as I arrived in..."

I guess it's a matter of taste, but I prefer it short and to the point


Yes, they are not the only player here. Quite a few companies are doing this; if you use Perplexity, they also have a news tab with the exact feature set.


> if you use Perplexity, they also have a news tab with the exact feature set

"Exact" is far from accurate. I just did a side-by-side comparison. To name only two obvious differences:

A. At the top level, Perplexity has a "Discover" tab [1] -- not titled "News". That leads to a AAF page with the endless-scroll anti-pattern (see [2] [3] for other examples). Kagi News [4] presents a short list of ~7ish items without images.

B. At the detail-page level, Kagi organizes their content differently (with more detail, including "sources", "highlights", "perspectives", "historical background", and "quick questions"). Perplexity only has content with sources and "discover more". You can verify for yourself.

[1]: https://www.perplexity.ai/discover

[2]: https://www.reddit.com/r/rant/comments/e0a99k/cnn_app_is_ann...

[3]: https://www.tumblr.com/make-me-imagine/614701109842444288/an...

[4]: https://kite.kagi.com


After using Perplexity, its news tab is US-centric, without many options to get regional content from what I can see.

Kagi seems to offer regional news, and the sources appear to be from the respective area too. I do appreciate the public access (for now?) with RSS feeds (ironic, but handy).


ML did not figure out every solution on planet earth. It figured out the LLM, and that is the giant's shoulder most apps will stand on.


Maybe you should read the article before you assume how it works. It’s pretty clear and AI is specifically mentioned.


You’re being presumptuous. I read the article yesterday and there was no mention of AI or LLMs; they have since changed it, which is good.

https://web.archive.org/web/20250930154005/https://blog.kagi...


Thanks for pointing out that this is yet more AI slop. Very disappointing for Kagi to do this. I get my money's worth from searches, but if I was looking for more features I would want them to be not AI-based.


I guess they embed the news of the day and let it summarize it. You can add metadata to the training set, which you should technically be able to query reliably. You don't have to let the model do the summarization of the source, which can be erroneous.

Far more interesting is how they aggregate the data. I thought many sources moved behind paywalls already.


Kagi is probably the only pro-LLM company praised on HN. Perhaps people's hatred towards Google outweighs that of LLMs.

Imagine if Google News used an LLM to show summaries to users without explicitly saying it's AI in the UI.

Ironically, one of the first LLM-induced mistakes experienced by average people was a news summary: https://www.bbc.com/news/articles/cge93de21n0o.amp


> Kagi is probably the only pro-LLM company praised on HN.

Kagi made search useful again, and their genAI stuff can be easily ignored. Best of both worlds -- it remains useful for people like myself who don't want genAI involved, but there's genAI stuff for people who like that sort of thing.

That said, if their genAI stuff gets to be too hard to ignore, then I'd stop using or praising Kagi.

That this is about news also makes it less problematic for me. I just won't see it at all, since I don't go to Kagi for news in the first place.


I'm not against AI summaries if they are marked as such. Sneakily sliding an LLM in under the table is a dark pattern no matter how I interpret their intentions.

Even Google calls the overview box AI Overview (not saying it doesn't hurt content hosting sites.)


Disappointing. Non-LLM NLP summarization is actually rather good these days. It works by finding the key sentences in the text and extracting the relevant sections, with no possibility of hallucination. No need to go full AI for this feature.
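
For the curious, the classic extractive approach boils down to something like this toy frequency-scoring sketch (real systems use TextRank/LexRank-style methods, but the key property is the same: every output sentence exists verbatim in the input):

    import re
    from collections import Counter

    def extractive_summary(text: str, k: int = 3) -> str:
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        freq = Counter(re.findall(r"\w+", text.lower()))

        def score(sentence: str) -> float:
            words = re.findall(r"\w+", sentence.lower())
            return sum(freq[w] for w in words) / (len(words) or 1)

        top = set(sorted(sentences, key=score, reverse=True)[:k])
        # Nothing is generated, only selected; original order is preserved.
        return " ".join(s for s in sentences if s in top)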


That’s interesting. Could you share an example or a resource about this?


I believe LLM output is fine for giving an overview if it's provided the articles; if you want a detailed overview you should be reading the articles anyway.





