History of Hacker News Search from 2007 to 2024 (trieve.ai)
120 points by skeptrune 11 months ago | 72 comments



On the topic of Hacker News search… a useful trick that not enough people know is that you can take the ID of a story and use search to return all comments ordered by most recent first - great for keeping up with what’s new in a specific conversation.

Eg for this thread the most recent comments can be found here: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

I built an Observable notebook to save me from having to manually construct those searches here: https://observablehq.com/@simonw/hacker-news-homepage
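
For anyone who would rather script this than use the web UI, here is a minimal sketch against the public HN Search API. It assumes the documented `search_by_date` endpoint and the `story_<id>` tag filter; the helper name `recent_comments_url` is made up for illustration, and the code only builds the URL rather than fetching it.

```python
from urllib.parse import urlencode

API_BASE = "https://hn.algolia.com/api/v1"


def recent_comments_url(story_id: int, page: int = 0) -> str:
    """Build a URL that lists a story's comments, newest first."""
    params = {
        # A comma between tags means AND: comments that belong to this story.
        "tags": f"comment,story_{story_id}",
        "page": page,
    }
    # search_by_date sorts by creation time, most recent first.
    return f"{API_BASE}/search_by_date?{urlencode(params)}"


print(recent_comments_url(41228935))
```

The returned JSON should carry the comments in a `hits` array, so polling page 0 is usually enough to see what is new in a conversation.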


Since it's mention-undocumented-endpoints day, https://news.ycombinator.com/latest?id=$ID does that, where $ID can either be a story (in which case you get the entire thread sorted reverse-chronologically) or a comment (in which case you get the comments in that subthread).

So, for the current thread: https://news.ycombinator.com/latest?id=41228935

or for a subthread: https://news.ycombinator.com/latest?id=41229914.
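
If you already have an ordinary item link, converting it is mechanical. A small sketch, assuming the link has the usual `item?id=` form; the function name is hypothetical, and the endpoint itself is undocumented, so it may change.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def latest_url_from_item(item_url: str) -> str:
    """Turn an item link into the reverse-chronological /latest view."""
    query = parse_qs(urlparse(item_url).query)
    item_id = int(query["id"][0])  # KeyError if the link has no id parameter
    return "https://news.ycombinator.com/latest?" + urlencode({"id": item_id})


print(latest_url_from_item("https://news.ycombinator.com/item?id=41228935"))
```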


Wow, I had no idea! Thanks for that.


HN could really use some client software. If nothing else for choosing how to read thread replies (ie: most recent, most upvoted/popular, most downvoted, by replies from the OP, etc) + a more advanced built in search.


> more advanced built-in search

Do you have any specific feature requests? I would love more suggestions and ideas!


Please add "OR" operator, so we could search for e.g. "webkit OR gecko OR blink" in a single query.
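
Until something like that exists, one possible client-side workaround is to run one search per term and merge the results. The sketch below assumes each hit is a dict with the `objectID` and `created_at_i` fields that the HN Search API returns; the sample hits are made up and stand in for three separate searches.

```python
from typing import Iterable


def merge_or_results(result_sets: Iterable[list[dict]]) -> list[dict]:
    """Union several per-term hit lists, dropping duplicates by objectID."""
    seen: set[str] = set()
    merged: list[dict] = []
    for hits in result_sets:
        for hit in hits:
            if hit["objectID"] not in seen:
                seen.add(hit["objectID"])
                merged.append(hit)
    # Newest first, using the Unix timestamp on each hit.
    merged.sort(key=lambda h: h["created_at_i"], reverse=True)
    return merged


# Made-up sample hits, one list per search term.
webkit = [{"objectID": "1", "created_at_i": 100, "title": "WebKit news"}]
gecko = [{"objectID": "2", "created_at_i": 300, "title": "Gecko vs WebKit"}]
blink = [
    {"objectID": "2", "created_at_i": 300, "title": "Gecko vs WebKit"},
    {"objectID": "3", "created_at_i": 200, "title": "Blink internals"},
]
print([h["objectID"] for h in merge_or_results([webkit, gecko, blink])])
```

This costs one request per term and loses any cross-term relevance ranking, so it is a stopgap rather than a real OR operator.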


hn.algolia.com can't do "or" so just the normal stuff for search would be a good start. https://hn.algolia.com/help

The date chooser on hn.algolia.com is also broken but a very handy setting.

And I just saw why search always brings up wrong stuff: "typo-tolerance" is on by default - https://hn.algolia.com/settings

I have been to the settings, but it's a good example of why defaults matter and on-screen really helps people.

And extra crazy with "typo-tolerance" ON, "optimized" and "optimised" are very different results. Which is technically true, that is not a typo, that's grammatical.

Filter by author is important as people use that to search their own stuff.


i want to publicly thank Algolia for providing an excellent HN search for so long. i was on a call with Linus Lee and we both were referencing something on HN and i started pulling it up and he said "i know exactly what website you're on" without seeing my screen and it was of course the Algolia HN search. unbelievable mind sync.

Idk if it can be replaced (i guess i could do with semantic search + content crawling to start?), but even if it is replaced, Algolia will always have a special place in my heart for doing such a great job for free. thank you whoever worked on it (Algolians - is there a behind the scenes writeup somewhere?)


It is free, but Algolia is a YCombinator backed company (YC W14) so for them it's probably very useful as a sort of low-stakes phase-1-prod environment. Basically a win-win-win.


Referencing something on HN and then being amazed you're both thinking about the HN search engine... Unbelievable mind sync.


Now, if only it could be used without necessitating JavaScript...


Yeah, I always have to turn it on to search for something. It's a pain. It's friction.

Javascript is not required for search. It has never been required for search. Google. Yahoo. Alta Vista. They worked fine without it back in the day.


Google still works without javascript.


There are a few capabilities lacking from Algolia which I'd really like to see in a replacement:

- Negative search / exclusion: the ability to exclude terms from a search, as in "procfs -linux", which would look for any references to "procfs" which did not also reference "linux".

Edit: This exists, see dang's reply below.

- Replies to a specific user, e.g., "by:dredmorbius inreplyto:skeptrune <search terms>". I'm often looking for a specific context of my own previous comments.

- An improved date-bounding interface. If there's one thing that frustrates me about Algolia's interface, it's the GUI (and syntax) for defining dates. It's cumbersome, and at least on my browser, the dates are generally hard to read or invisible. Going back years is especially cumbersome.

I'll add: Algolia has been massively useful, and the fact that I can search HN, especially for my own content, has been a huge part of the value of the site, and is worlds ahead of other online platforms. (Mastodon / the Fediverse is catching up here, Diaspora*'s lack of search was among my main frustrations with the site and explains my absence there after more than a decade of participation.)
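
On the date-bounding point, the API side is less painful than the GUI. A rough sketch, assuming the HN Search API's documented `numericFilters` parameter on the `created_at_i` (Unix timestamp) field; the helper name and the choice to treat dates as UTC midnight are my own.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def date_bounded_url(query: str, start: str, end: str) -> str:
    """Build a search URL limited to start <= created_at < end (ISO dates, UTC)."""

    def to_epoch(day: str) -> int:
        moment = datetime.fromisoformat(day).replace(tzinfo=timezone.utc)
        return int(moment.timestamp())

    # A comma between numeric filters means both conditions must hold.
    filters = f"created_at_i>={to_epoch(start)},created_at_i<{to_epoch(end)}"
    params = {"query": query, "numericFilters": filters}
    return "https://hn.algolia.com/api/v1/search_by_date?" + urlencode(params)


# Everything mentioning "procfs" posted during 2014.
print(date_bounded_url("procfs", "2014-01-01", "2015-01-01"))
```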


Algolia does do search term exclusion. Compare https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... and https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu....

On the second point - HN has an undocumented endpoint https://news.ycombinator.com/replies?id=skeptrune&by=dredmor... but that doesn't give you search of course.


Search term exclusion: TIL! I swear I've tried that w/o luck before. Possibly was confounding that with OR pairing, usually given as "(termA|termB)". I'm pretty sure that doesn't work as expected (and just tested it). It's also annoyingly absent from DDG search, which is out of scope for here.

I'd been made aware of the undocumented endpoint but didn't want to spill the beans ;-) It's apparently expensive. OK to run manually, but don't script it.


Oh yes, OR clauses would be great.

Maybe Trieve can give us propositional logic over search terms!


Can do!


Is it possible to add the alternative ("OR") operator? https://github.com/algolia/hn-search/issues/169


whoah


Just guessed another two undocumented endpoints "deadcomments" and "deadstories", these take you to a special admin login page if you're logged out, or say "Unknown." if logged in:

https://news.ycombinator.com/deadcomments

https://news.ycombinator.com/deadstories

And the endpoint "flagged", which is empty for me:

https://news.ycombinator.com/flagged

But if you add a username, e.g. "?id=dang", it says "Can't display that." instead:

https://news.ycombinator.com/flagged?id=dang

Interesting to stumble upon these even if they do nothing for a non-admin user!


`/flagged` is the list of stories you've flagged.


That makes sense, thank you.


/flagged, /upvoted, and /favorites, with an "?id=<your_userID>" are how your own flagged, upvoted, and favourited posts/comments are displayed. "&kind=comments" specifically returns flagged comments, otherwise posts. For upvotes and faves, the argument is "&comments=t", for consistency one presumes...

Favourites can be displayed for other users, upvoted and flagged cannot (for mere mortals).

I suspect downvotes can also be displayed, though I don't know that syntax.
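
The parameter inconsistency is easy to trip over, so here is a small URL builder that encodes the rules exactly as described above (`kind=comments` for flagged, `comments=t` for upvoted and favorites). The function name is hypothetical and none of this is documented, so treat it as a sketch that may break.

```python
from urllib.parse import urlencode

LIST_PAGES = {"flagged", "upvoted", "favorites"}


def user_list_url(page: str, user_id: str, comments: bool = False) -> str:
    """Build the URL for a user's flagged, upvoted, or favorites list."""
    if page not in LIST_PAGES:
        raise ValueError(f"unknown list page: {page}")
    params = {"id": user_id}
    if comments:
        if page == "flagged":
            params["kind"] = "comments"  # flagged uses kind=comments
        else:
            params["comments"] = "t"  # upvoted and favorites use comments=t
    return f"https://news.ycombinator.com/{page}?{urlencode(params)}"


print(user_list_url("favorites", "example_user", comments=True))
```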


I recently found out that "/vouched?id=<your_userID>" also works.


The replies/by is particularly handy in sorting out why a dead account has been banned, or if there've been previous moderator interactions. That's my own principal use.


>Replies to a specific user

Will definitely make sure to add that to our search before launch! That is a really good idea.

>Improved date-bounding interface

Note taken :)

We also have negated terms which will work the same way.


I would like to sort comments by the level of the author’s expertise in whatever they are discussing. HN is a goldmine, but finding valuable knowledge within heated or elaborate discussions requires too much commitment to read through everything.

A weighted number of a comment’s upvotes is one signal. However, I can often tell when an author has deep knowledge or comprehensive experience with a subject just by reading their comment.

Do you think it might be possible to automate that kind of judgment?


I would really love that to be possible. It is ultimately, I suspect, one of the Hard Problems of epistemology / epistemic systems.

Diverging slightly: truth is not a popularity contest. The "wisdom of crowds" concept argues that crowds are, on average, more intelligent than individuals, even expert individuals. In practice ... crowds are subject to their own biases and failures. While uninformed (or lightly-informed) opinion may be better than no opinion, expert opinion tends to be superior to both ... though of course it is also subject to biases (co-option of motives, ideological and academic conservatism, etc.). Still, there are times when the popular winner is quite evidently not the most informative or relevant winner. Reddit is especially subject to this (and more so in the past couple of years than previously based on my very rare sojourns there).

Ultimately the question of a rating / moderation / ranking system is what do you want to optimise for? I'd written on this about a decade back now:

<https://web.archive.org/web/20200629055317/https://www.reddi...>

LLM AI seems like it might offer either a way of weighting individual votes in their appropriate areas of expertise, or offering its own assessment of relevance based on specific criteria (say: truth valence, significance, novelty). I still suspect it's not the sort of thing that's easily obtained. And is probably beyond the scope of an HN search tool.

But I love the suggestion.


And so long as we're all divulging secrets here ...

I've hacked the HN CSS to my own liking, links in my profile. Most of that's styling and such.

What's not included there is something I find useful: some visual tweaks to note specific contexts (users/sites) of interest.

As examples, it might be handy to recognise admin comments and posts immediately. Or YC hiring notices. Or people or sites you find particularly clueful. Or perhaps not.

I've found it useful, and a little classification goes a long way (long tails, Zipf functions, etc., etc.).


As long as people are throwing suggestions your way, it might be cool if we could sort by either Votes or Comments rather than just "popularity" which is seemingly just sorting by votes. A lot of posts will only have 30-50 votes but over a hundred comments which makes them hard to find even by searching.
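
In the meantime, re-ranking can be done client-side once you have a page of results. A minimal sketch, assuming story hits carry the `points` and `num_comments` fields that the HN Search API returns; the sample data is invented.

```python
def sort_hits(hits: list[dict], key: str = "num_comments") -> list[dict]:
    """Sort hits by a numeric field, largest first; missing values count as 0."""
    return sorted(hits, key=lambda h: h.get(key) or 0, reverse=True)


# Invented sample: A is a low-vote story with a long discussion.
hits = [
    {"title": "A", "points": 45, "num_comments": 130},
    {"title": "B", "points": 300, "num_comments": 20},
    {"title": "C", "points": 50, "num_comments": None},
]
print([h["title"] for h in sort_hits(hits, "num_comments")])
print([h["title"] for h in sort_hits(hits, "points")])
```

This only reorders what a search already returned, so a heavily discussed story that never made it into the result page stays hidden.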


While you're taking notes ...

Comments on posts from a specific domain might also be useful. HN exposes that in its story syntax and I'm pretty sure that's in the API as well.


Yep, already did that one! Making it performant was a pain, but it should be useful.

Other feature requests/ideas would definitely be appreciated.


The one other: a URL-based selector for comments vs. posts. That's one thing Algolia seems to be missing. It defaults to post search, I'm usually looking for comments. That's one additional step, which isn't fatal but is vaguely annoying.

I think those are my main gripes / wants.

And I'm a heavy user of Algolia and other search tools, so I think that's a reasonably complete set.


Updated the blog at the link to include PG's HN post and the archive-available ycombinator.com post documenting the Octopart/ThriftDB search launch.

Commit here - https://github.com/devflowinc/trieve-website/commit/ab563475...

Links: - https://news.ycombinator.com/item?id=2619736

- https://web.archive.org/web/20110618105517/http://ycombinato...


I'm a little confused by the context and comments here. Is Trieve associated with HN at all or is this an independent/third-party offering? Is the Algolia search going anywhere? Will the search field on HN still take me to Algolia search or is that changing?


It's independent/third-party, not officially associated with HN, though Trieve is YC-funded (https://www.ycombinator.com/companies/trieve) and in that sense there's an affiliation.

There's no current plan to change HN Search though I wouldn't rule it out. PG often used to integrate recent YC startups into HN in various ways (search as detailed by the OP, but also an SSO startup at one point, a carbon-reduction startup, I forget what else) as a way of giving them a boost and I could imagine us doing that again. (side note: I guess the mental model in my head is that earlier-stage startups are more closely bonded to YC and that as they succeed and expand, there's certainly still a friendly connection but the attachment becomes a bit weaker. For startups that have been around 10+ years, for example, I'm not sure it still makes sense to have frontpage job ads on HN.)

The change I'd really like to make to HN Search is to bring the front-end part of it into the HN codebase so the search results can be 'real' HN pages*. It's never been a priority to implement that though, and the Algolia system has been fabulous for a long time.

* Funnily enough, I made almost the same point using my pre-dang account back when Algolia was just getting started: https://news.ycombinator.com/item?id=7126635. Credit to pvg, who misses nothing, for spotting this. The reply was from Algolia cofounder ndessaigne who is now a group partner at YC! He was good on his word, btw.


It's YC-aside, anyway. Just from a general founder perspective: there comes a point where repeatability far outpaces any growth hacks you put in place previously. In that vein, I think it shows exceedingly good sportsmanship towards founders to switch from someone who has a long-established repeatable business to someone who has just found it. If you give someone a chance right as they hit escape velocity PMF, I suspect you provide a considerable tailwind, certainly a very kind thing to do for any founder if you're able.


Putting this here with zero expectations but kinda hopeful there’s a workaround.

I like to keep my hands on my keyboard and I can `command-L` `hn` `return` my way to Algolia quickly from an open browser.

But why oh why doesn’t the search input have focus by default. And since it doesn’t why can’t I type `/` to get focus on the search input. I guess by now the three tab presses should be muscle memory for that but I’m so annoyed by that fact I refuse to internalize it.

Apologies for the random rant


In Firefox, just add a bookmark for https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so..., edit it and add a keyword like "hn". Then type in the address bar: "hn whatever" and it will show you SERP for whatever.

Another way is to add the OpenSearch engine (https://mycroftproject.com/install.html?id=84751&basename=ha...) to Firefox and assign a shortcut key to it: https://support.mozilla.org/en-US/kb/assign-shortcuts-search...


Create a greasemonkey/userscript for an existing browser extension?

Possibly useful example: https://github.com/imdj/HNRelevant


I wonder if they'll let us search flagged and dead posts and comments.

I remember reading some insightful exchanges back in the day that got flagged because of being a controversial topic that other users didn't like.

No way to find them now, even knowing some keywords and approximate month and year.


We had to exclude [dead] and eventually even just [flagged] posts from the public API because many third-party clients and sites were displaying them as if they were regular posts. For the ever-fragile HN ecosystem, that is catastrophic. We would get angry emails saying "how can your expletive site possibly condone such expletive expletive comments as <link>" ... and then it would turn out that <link> was a post by some account that had been banned for years.

It's fine if users turn 'showdead' on in their profile to read everything—just please remember that by doing that, you're subscribing to various bottoms of various barrels. But it's definitely not ok when people browse HN with some app we have nothing to do with, run into horrible things, understandably are outraged and then forever have their view of HN imprinted.

IMO this issue is existential for HN. We've spent years and so much energy trying to find a balance between internet openness and human decency, a task which oscillates between barely-possible and simply-doomed, so the idea that anybody anywhere sees anything labeled "Hacker News" that pours all the toxic waste back into the commons is physically painful to me. Much as I dislike the idea of restricting anyone's curiosity about the entire corpus of what gets posted, I don't see what choice we have.
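
For third-party client authors, the defensive version of this is small. The sketch below assumes items are dicts that may carry the `dead` and `deleted` flags listed among the item fields in the official HN API documentation; whether a given data source actually includes those flags is an assumption, and the function name is made up.

```python
from typing import Optional


def visible_items(items: list[Optional[dict]], showdead: bool = False) -> list[dict]:
    """Keep only items a default (showdead off) reader would see."""
    visible = []
    for item in items:
        if item is None or item.get("deleted"):
            continue  # missing or deleted items are never shown
        if item.get("dead") and not showdead:
            continue  # dead items are shown only on explicit opt-in
        visible.append(item)
    return visible


# Invented sample data.
items = [
    {"id": 1, "text": "fine"},
    {"id": 2, "text": "killed", "dead": True},
    {"id": 3, "deleted": True},
    None,
]
print([i["id"] for i in visible_items(items)])
```

Keeping `showdead` off by default mirrors the site's own gate, so nothing killed gets presented as a regular post.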


I figured as much. Can I ask that you change the website itself to make them visible to logged-out users? I understand exactly why you did this, but I feel like if they were shown collapsed by default on the single-comment page, and you had to actively click on a "banned" label to expand them, then you'd really be out of line complaining about how Hacker News hosts horrible content or whatever.


You mean for [dead] comments? Sorry, but strong no. Logged-in-with-showdead-turned-on has proven to be the correct height for that gate. Anyone who wants to can easily clear it, but the small amount of effort and information required means that most people become core community members before turning it on. If we lowered it, naive-casual readers would (through no fault of their own) misunderstand what they were looking at and the dynamic I just described would kick in.

The longer I've worked on HN the more I've come to appreciate PG's design of this critical aspect of the site. No content is hidden from users who want to see it*, but the worst is (mostly) cordoned off so it doesn't destroy the community. Banned users can continue to post, but their comments are autokilled, so they're cordoned off by default.

We're often asked: why allow banned users to continue to post? The answer is that if we didn't, they'd just create new accounts, and then they'd be posting with unbanned accounts until we caught them and banned them again: a strictly worse situation. This is one aspect of PG's design that took me years to appreciate and got me thinking it might even be optimal.

The one major change we made to the original design was adding 'vouching' (https://news.ycombinator.com/item?id=10298512), which lets the community transfer cordoned-off posts back to the commons if there's nothing wrong with them. That bit has worked out really well.

* (Except for [deleted] posts. If you see [deleted] it always means that either the author deleted it or asked us to do that for them.)


Thank you dang, I very much appreciate you taking the time to explain.

I was going to suggest maybe the public API could have a "showdead" flag too but I guess that too easily enables the problem you're trying to prevent? As in an enterprising app developer could turn the "showdead" tap to "yes" with every request and then the waste gushes out once more.


I can appreciate that concern and see it even with flagging / dead / killed posts and submissions.

I've had my own concerns about HN's moderation, both excesses and insufficiencies. When I've done occasional polls about what people's issues are about HN I'm very often pointed to comments which now show as flagged. I'd found a few which hadn't been flagged and forwarded those to dang, who (admittedly long after the fact) flagged them. As dang's said many times, moderators don't see everything, most moderation is by members, and mods step in relatively rarely.

Based on Whaly's 2021 analysis and looking at dang's own comment post history (via Algolia), HN nets roughly 4 million comments/year and 400k submissions, with about 150k active members. Over his ten years as moderator dang's averaged about 20 comments per day, though there's a great deal more moderation occurring (some automated, some member-based, some manual but not noted with comments, which tend to be reserved for established accounts).

My read is that HN mostly tends toward its stated goals and, frankly, good-netizen behaviour. It does have a pronounced status-quo bias, though it seems to be self-aware on this point. I've a few further concerns I'm still thinking through.

The problem with an overly-open archive is that this makes possible misconstrued assertions about what HN does or doesn't tolerate. An open-access archive and third-party apps which don't reflect moderation actions, say, a third-party app which explicitly only showed flagged, killed, and/or dead posts, comments, and users, would paint a distinctly different picture of HN, and one which would greatly harm the reputation of the site.

There are some ... possible ways around this. HN uses sequentially-numbered IDs for posts and comments (both are treated the same so far as I can tell). UserIDs seem to have an internal representation which is similar (I've seen, for example, names which change over time), but the internal representation doesn't seem to be publicly exposed. If you want to find my own content you'd do it with "UserID=dredmorbius" and not by some numeric identifier.

But the numeric content ID means that a determined scraper could walk (sequentially or randomly) through the entire database, pull out every post and comment, and then glue those back together. That's somewhat north of 40 million items presently.... (There are benefits to using sparse, random / arbitrary UUIDs for systems.)
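
To make the sequential-ID point concrete: with the official Firebase API, enumerating the item space is just a loop over integers. A sketch that only generates URLs, assuming the documented `/v0/item/<id>.json` endpoint (the current upper bound is published at `/v0/maxitem.json`); the function name is hypothetical.

```python
from typing import Iterator

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def item_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield the API URL for every item ID in [first_id, last_id]."""
    for item_id in range(first_id, last_id + 1):
        yield ITEM_URL.format(id=item_id)


for url in item_urls(41228935, 41228937):
    print(url)
```

With sparse random identifiers there would be no `range()` to iterate over, which is the benefit alluded to above.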



Here is the list of recent flagged/dead stories, ordered by rating, that they managed to accumulate: https://play.clickhouse.com/play?user=play#U0VMRUNUIGlkLCB0a...

Here is the list of recent flagged/dead stories: https://play.clickhouse.com/play?user=play#U0VMRUNUIGlkLCB0a...


Is that playground open source?


Yes, the backend is https://github.com/ClickHouse/ClickHouse/ It has no front-end (everything is served by the database itself).

The HTML page: https://github.com/ClickHouse/ClickHouse/blob/master/program... (or press "view source" in the browser).

The dataset: https://github.com/ClickHouse/ClickHouse/issues/29693


It's possible to favourite such items. They'll not be visible unless you're logged in, but at least you can go through your fave list and find them.

Or, of course, bookmark them yourself for later reference.

I've run into this issue myself, though dang's reply (still being edited as I write this) does hit on some valid points.


They're using the firebase API, https://github.com/devflowinc/trieve-hn-discovery/tree/main/... so no showdead, I think.


contributes to the echo chamber


Related:

Vote on Algolia vs. Trieve HN Dataset Blind Search Relevance Poll?

https://news.ycombinator.com/item?id=41172033


What I am missing from Algolia's Hacker News Search is the "OR" (alternative) operator, so that I can search for e.g. "WebKit OR Gecko OR Blink" in a single query. I hope Trieve will have such functionality!


Can we get an option to search for users/usernames? Or even better, searching for users based on their karma? ;-)


There's also the Leaders list which shows the 100 highest-karma members:

<https://news.ycombinator.com/leaders>

That's ... pretty constant over time, more so amongst the highest-ranked members than, say, positions 75--100.

Whaly did an analysis of the most-active HN participants in 2022, looking at 2021 data. That's also fairly constant.

<https://whaly.io/posts/top-10k-commenters-of-hacker-news-in-...>

Discussion: <https://news.ycombinator.com/item?id=29778994>


I know, I’ve seen them both :-) As you correctly mentioned, the top100 rarely changes and thus has limited practical usage. I’d like to see something more dynamic and relevant for us plebeians, i.e. being able to sort all users by karma, comments, submissions, creation date.


Among other information from Whaly: there are roughly 150k active HN members per year. The 10k list gets you by far the overwhelming lion's share of activity (it's already highly concentrated in the top-100 list).

I don't know that there's a comprehensive member list available overall, and the lack of some sequential userIDNumber identifier means that the space isn't readily searchable. I suspect Whaly's approach of snarfing all activity via the API is probably the most comprehensive. A new search tool might be able to do that (and tie in other metrics such as posts and karma) as well.

There's the additional issue that overall karma is not a particularly useful measure. It's possible to achieve a high karma simply through excessive activity (my own account is a case in point). There are domain experts such as Alan Kay (@alankay) who have decidedly pedestrian karma (4,566), but that's over 1 story (1,400 votes) and 110 additional comments, or about 29 votes/comment. That's well above my own average, which is about 1-2 votes per comment (2,087 stories, 28,107 comments).

Karma/posts / karma/comments might be slightly more informative, but can also be skewed by submitting a highly-popular story. The larger problem is that truth, expertise, and credibility aren't popularity contests.
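
The skew is visible in the Alan Kay numbers themselves. A quick check, assuming the roughly 29 votes/comment figure comes from subtracting the single story's votes before dividing (my reading of the arithmetic, not something stated outright):

```python
karma = 4566         # total karma quoted above
story_votes = 1400   # votes on the single story
comments = 110       # number of comments

# Per-comment average with the popular story excluded.
per_comment = (karma - story_votes) / comments

# Naive average over all 111 items, inflated by that one story.
naive_per_item = karma / (comments + 1)

print(round(per_comment, 1), round(naive_per_item, 1))
```

One popular submission moves the naive average from about 29 to about 41.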

There's the author's expertise suggestion by tillulen which I really like, though suspect is still not especially viable: <https://news.ycombinator.com/item?id=41231865>

A further problem with expertise is that it is highly domain-specific. The fallacy / error of ascribing general expertise to one who's shown mastery in one specific area is a grave one. Particularly when that one specific area is earning money / winning a lottery.


Oh: and based on my front-page scraping, I coded up some command-line tools to return activity / results for domains and members so I could do quick checks on both.

Note that what makes the front-page is an interesting function of both what the member posts and what the membership as a whole responds to. Someone might submit a large number of unappealing topics, but get traction on a much smaller subset. FWIW that seems to be my own experience. My tools will show the posts that got traction, but not the much larger low-vote long tail. Or flagged/killed posts, FWIW.


User-specific search is possible in Algolia using "by:<username> <query>". So "by:dredmorbius privacy" will find my posts or comments on privacy.
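
The API equivalent, as far as I can tell, is the `author_<username>` tag. A minimal sketch assuming the HN Search API's documented `tags` parameter; the helper name is made up, and the optional `comment` tag restricts results to comments only.

```python
from urllib.parse import urlencode


def author_search_url(username: str, query: str, comments_only: bool = False) -> str:
    """Build a search URL restricted to one author's posts and comments."""
    tags = f"author_{username}"
    if comments_only:
        # A comma between tags means AND: comments written by this author.
        tags = f"comment,{tags}"
    params = {"query": query, "tags": tags}
    return "https://hn.algolia.com/api/v1/search?" + urlencode(params)


print(author_search_url("dredmorbius", "privacy", comments_only=True))
```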


One thing I'd like is ability to search your (or others') favorites.


That's a really cool idea and also likely doable. Probably won't ship it before we release, but will add it to our backlog.


The more features, the lamer the product. Keep HN clean!


I disagree. The more features the more useful the search engine. It doesn't make HN dirty, search engine is a separate project.


I was looking at Algolia's website recently and it seems they really went all in with the "AI" marketing/SEO.


"we put the AI in AIgolia"


There's a search feature?


When is the launch?


Not sure yet! Fingers crossed it'll be this week or next.


Cool!



