On the topic of Hacker News search… a useful trick that not enough people know is that you can take the ID of a story and use search to return all comments ordered by most recent first - great for keeping up with what’s new in a specific conversation.
Since it's mention-undocumented-endpoints day, https://news.ycombinator.com/latest?id=$ID does that, where $ID can either be a story (in which case you get the entire thread sorted reverse-chronologically) or a comment (in which case you get the comments in that subthread).
HN could really use some client software. If nothing else, for choosing how to read thread replies (i.e., most recent, most upvoted/popular, most downvoted, by replies from the OP, etc.) plus a more advanced built-in search.
I have been to the settings, but it's a good example of why defaults matter and on-screen really helps people.
And extra crazy: with "typo-tolerance" ON, "optimized" and "optimised" return very different results. Which is technically correct: that is not a typo, it's a regional spelling difference.
Filter by author is important as people use that to search their own stuff.
i want to publicly thank Algolia for providing an excellent HN search for so long. i was on a call with Linus Lee and we both were referencing something on HN and i started pulling it up and he said "i know exactly what website you're on" without seeing my screen and it was of course the Algolia HN search. unbelievable mind sync.
Idk if it can be replaced (i guess i could do with semantic search + content crawling to start?), but even if it is replaced, Algolia will always have a special place in my heart for doing such a great job for free. thank you whoever worked on it (Algolians - is there a behind the scenes writeup somewhere?)
It is free, but Algolia is a YCombinator backed company (YC W14) so for them it's probably very useful as a sort of low-stakes phase-1-prod environment. Basically a win-win-win.
There are a few capabilities lacking from Algolia which I'd really like to see in a replacement:
- Negative search / exclusion: the ability to exclude terms from a search, as in "procfs -linux", which would look for any references to "procfs" which did not also reference "linux".
Edit: This exists, see dang's reply below.
- Replies to a specific user, e.g., "by:dredmorbius inreplyto:skeptrune <search terms>". I'm often looking for a specific context of my own previous comments.
- An improved date-bounding interface. If there's one thing that frustrates me about Algolia's interface, it's the GUI (and syntax) for defining dates. It's cumbersome, and at least on my browser, the dates are generally hard to read or invisible. Going back years is especially cumbersome.
I'll add: Algolia has been massively useful, and the fact that I can search HN, especially for my own content, has been a huge part of the value of the site, and is worlds ahead of other online platforms. (Mastodon / the Fediverse is catching up here, Diaspora*'s lack of search was among my main frustrations with the site and explains my absence there after more than a decade of participation.)
Search term exclusion: TIL! I swear I've tried that w/o luck before. Possibly was confounding that with OR pairing, usually given as "(termA|termB)". I'm pretty sure that doesn't work as expected (and just tested it). It's also annoyingly absent from DDG search, which is out of scope for here.
I'd been made aware of the undocumented endpoint but didn't want to spill the beans ;-) It's apparently expensive. OK to run manually, but don't script it.
Just guessed another two undocumented endpoints "deadcomments" and "deadstories", these take you to a special admin login page if you're logged out, or say "Unknown." if logged in:
/flagged, /upvoted, and /favorites, with an "?id=<your_userID>" are how your own flagged, upvoted, and favourited posts/comments are displayed. "&kind=comments" specifically returns flagged comments, otherwise posts. For upvotes and faves, the argument is "&comments=t", for consistency one presumes...
Favourites can be displayed for other users, upvoted and flagged cannot (for mere mortals).
I suspect downvotes can also be displayed, though I don't know that syntax.
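As a rough sketch of the URL patterns described above (not an official API, and assuming nothing beyond the parameters named in that comment), a small hypothetical helper `user_list_url` could build them. It only constructs strings and fetches nothing:

```python
from urllib.parse import urlencode

BASE = "https://news.ycombinator.com"


def user_list_url(page, user_id, comments=False):
    """Build the URL for a user's flagged, upvoted, or favorites list.

    `user_list_url` is a hypothetical helper name; the paths and query
    parameters are the ones described in the comment above.
    """
    if page not in ("flagged", "upvoted", "favorites"):
        raise ValueError(f"unknown page: {page}")
    params = {"id": user_id}
    if comments:
        # Flagged uses kind=comments, while upvoted and favorites
        # use comments=t (the inconsistency noted above).
        if page == "flagged":
            params["kind"] = "comments"
        else:
            params["comments"] = "t"
    return f"{BASE}/{page}?{urlencode(params)}"


print(user_list_url("favorites", "dredmorbius", comments=True))
```

Per the same comment, only the favorites variant is viewable for other users; flagged and upvoted work for your own account only.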
The replies/by is particularly handy in sorting out why a dead account has been banned, or if there've been previous moderator interactions. That's my own principal use.
I would like to sort comments by the level of the author’s expertise in whatever they are discussing. HN is a goldmine, but finding valuable knowledge within heated or elaborate discussions requires too much commitment to read through everything.
A weighted number of a comment’s upvotes is one signal. However, I can often tell when an author has deep knowledge or comprehensive experience with a subject just by reading their comment.
Do you think it might be possible to automate that kind of judgment?
I would really love that to be possible. It is ultimately, I suspect, one of the Hard Problems of epistemology / epistemic systems.
Diverging slightly: truth is not a popularity contest. The "wisdom of crowds" concept argues that crowds are, on average, more intelligent than individuals, even expert individuals. In practice ... crowds are subject to their own biases and failures. While uninformed (or lightly-informed) opinion may be better than no opinion, expert opinion tends to be superior to both ... though of course it is also subject to biases (co-option of motives, ideological and academic conservatism, etc.). Still, there are times when the popular winner is quite evidently not the most informative or relevant winner. Reddit is especially subject to this (and more so in the past couple of years than previously, based on my very rare sojourns there).
Ultimately the question of a rating / moderation / ranking system is what do you want to optimise for? I'd written on this about a decade back now:
LLM AI seems like it might offer either a way of weighting individual votes in their appropriate areas of expertise, or offering its own assessment of relevance based on specific criteria (say: truth valence, significance, novelty). I still suspect it's not the sort of thing that's easily obtained. And it's probably beyond the scope of an HN search tool.
And so long as we're all divulging secrets here ...
I've hacked the HN CSS to my own liking, links in my profile. Most of that's styling and such.
What's not included there is something I find useful: some visual tweaks to note specific contexts (users/sites) of interest.
As examples, it might be handy to recognise admin comments and posts immediately. Or YC hiring notices. Or people or sites you find particularly clueful. Or perhaps not.
I've found it useful, and a little classification goes a long way (long tails, Zipf functions, etc., etc.).
As long as people are throwing suggestions your way, it might be cool if we could sort by either Votes or Comments rather than just "popularity", which is seemingly just sorting by votes. A lot of posts will only have 30-50 votes but over a hundred comments, which makes them hard to find even by searching.
The one other: a URL-based selector for comments vs. posts. That's one thing Algolia seems to be missing. It defaults to post search, I'm usually looking for comments. That's one additional step, which isn't fatal but is vaguely annoying.
I think those are my main gripes / wants.
And I'm a heavy user of Algolia and other search tools, so I think that's a reasonably complete set.
I'm a little confused by the context and comments here. Is Trieve associated with HN at all or is this an independent/third-party offering? Is the Algolia search going anywhere? Will the search field on HN still take me to Algolia search or is that changing?
It's independent/third-party, not officially associated with HN, though Trieve is YC-funded (https://www.ycombinator.com/companies/trieve) and in that sense there's an affiliation.
There's no current plan to change HN Search though I wouldn't rule it out. PG often used to integrate recent YC startups into HN in various ways (search as detailed by the OP, but also an SSO startup at one point, a carbon-reduction startup, I forget what else) as a way of giving them a boost and I could imagine us doing that again. (side note: I guess the mental model in my head is that earlier-stage startups are more closely bonded to YC and that as they succeed and expand, there's certainly still a friendly connection but the attachment becomes a bit weaker. For startups that have been around 10+ years, for example, I'm not sure it still makes sense to have frontpage job ads on HN.)
The change I'd really like to make to HN Search is to bring the front-end part of it into the HN codebase so the search results can be 'real' HN pages*. It's never been a priority to implement that though, and the Algolia system has been fabulous for a long time.
* Funnily enough, I made almost the same point using my pre-dang account back when Algolia was just getting started: https://news.ycombinator.com/item?id=7126635. Credit to pvg, who misses nothing, for spotting this. The reply was from Algolia cofounder ndessaigne who is now a group partner at YC! He was good on his word, btw.
It's YC-aside, anyway. Just from a general founder perspective: there comes a point where repeatability far outpaces any growth hacks you put in place previously. In that vein, I think it shows exceedingly good sportsmanship towards founders to switch from someone who has a long-established repeatable business to someone who has just found it. If you give someone a chance right as they hit escape velocity PMF, I suspect you provide a considerable tailwind, certainly a very kind thing to do for any founder if you're able.
Putting this here with zero expectations but kinda hopeful there’s a workaround.
I like to keep my hands on my keyboard and I can `command-L` `hn` `return` my way to Algolia quickly from an open browser.
But why oh why doesn’t the search input have focus by default? And since it doesn’t, why can’t I type `/` to get focus on the search input? I guess by now the three tab presses should be muscle memory, but I’m so annoyed by that fact that I refuse to internalize it.
We had to exclude [dead] and eventually even just [flagged] posts from the public API because many third-party clients and sites were displaying them as if they were regular posts. For the ever-fragile HN ecosystem, that is catastrophic. We would get angry emails saying "how can your expletive site possibly condone such expletiveexpletive comments as <link>" ... and then it would turn out that <link> was a post by some account that had been banned for years.
It's fine if users turn 'showdead' on in their profile to read everything—just please remember that by doing that, you're subscribing to various bottoms of various barrels. But it's definitely not ok when people browse HN with some app we have nothing to do with, run into horrible things, understandably are outraged and then forever have their view of HN imprinted.
IMO this issue is existential for HN. We've spent years and so much energy trying to find a balance between internet openness and human decency, a task which oscillates between barely-possible and simply-doomed, so the idea that anybody anywhere sees anything labeled "Hacker News" that pours all the toxic waste back into the commons is physically painful to me. Much as I dislike the idea of restricting anyone's curiosity about the entire corpus of what gets posted, I don't see what choice we have.
I figured as such. Can I ask that you change the website itself to make them visible to logged out accounts? I understand exactly why you did this but I feel like if you showed them collapsed by default on the single-comment page and you have to actively click on a "banned" to expand them you really are out of line when you complain about how Hacker News hosts horrible content or whatever.
You mean for [dead] comments? Sorry, but strong no. Logged-in-with-showdead-turned-on has proven to be the correct height for that gate. Anyone who wants to can easily clear it, but the small amount of effort and information required means that most people become core community members before turning it on. If we lowered it, naive-casual readers would (through no fault of their own) misunderstand what they were looking at and the dynamic I just described would kick in.
The longer I've worked on HN the more I've come to appreciate PG's design of this critical aspect of the site. No content is hidden from users who want to see it*, but the worst is (mostly) cordoned off so it doesn't destroy the community. Banned users can continue to post, but their comments are autokilled, so they're cordoned off by default.
We're often asked: why allow banned users to continue to post? The answer is that if we didn't, they'd just create new accounts, and then they'd be posting with unbanned accounts until we caught them and banned them again: a strictly worse situation. This is one aspect of PG's design that took me years to appreciate and got me thinking it might even be optimal.
The one major change we made to the original design was adding 'vouching' (https://news.ycombinator.com/item?id=10298512), which lets the community transfer cordoned-off posts back to the commons if there's nothing wrong with them. That bit has worked out really well.
* (Except for [deleted] posts. If you see [deleted] it always means that either the author deleted it or asked us to do that for them.)
Thank you dang, I very much appreciate you taking the time to explain.
I was going to suggest maybe the public API could have a "showdead" flag too but I guess that too easily enables the problem you're trying to prevent? As in an enterprising app developer could turn the "showdead" tap to "yes" with every request and then the waste gushes out once more.
I can appreciate that concern and see it even with flagging / dead / killed posts and submissions.
I've had my own concerns about HN's moderation, both excesses and insufficiencies. When I've done occasional polls about what people's issues are with HN, I'm very often pointed to comments which now show as flagged. I'd found a few which hadn't been flagged and forwarded those to dang, who (admittedly long after the fact) flagged them. As dang's said many times, moderators don't see everything, most moderation is by members, and mods step in relatively rarely.
Based on Whaly's 2021 analysis and looking at dang's own comment post history (via Algolia), HN nets roughly 4 million comments/year and 400k submissions, with about 150k active members. Over his ten years as moderator dang's averaged about 20 comments per day, though there's a great deal more moderation occurring (some automated, some member-based, some manual but not noted with comments, which tend to be reserved for established accounts).
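To put those figures in proportion, here is a back-of-the-envelope calculation using only the round numbers quoted above. Treating every one of dang's comments as a moderation comment is a simplifying assumption, so the share below is, if anything, an overestimate of commented moderation:

```python
# Round figures quoted in the comment above.
comments_per_year = 4_000_000
active_members = 150_000
mod_comments_per_day = 20

# Average comments per active member per year.
per_member = comments_per_year / active_members

# Moderator comments per year, and their share of all comments.
# Assumption: every one of those 20 daily comments counts.
mod_per_year = mod_comments_per_day * 365
mod_share = mod_per_year / comments_per_year

print(f"{per_member:.1f} comments per active member per year")
print(f"{mod_per_year} moderator comments per year")
print(f"{mod_share:.2%} of all comments")
```

On those numbers, moderator comments come to well under one percent of yearly comment volume, which fits the point that most moderation happens elsewhere.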
My read is that HN mostly tends toward its stated goals and, frankly, good-netizen behaviour. It does have a pronounced status-quo bias, though it seems to be self-aware on this point. I've a few further concerns I'm still thinking through.
The problem with an overly-open archive is that this makes possible misconstrued assertions about what HN does or doesn't tolerate. An open-access archive and third-party apps which don't reflect moderation actions, say, a third-party app which explicitly only showed flagged, killed, and/or dead posts, comments, and users, would paint a distinctly different picture of HN, and one which would greatly harm the reputation of the site.
There are some ... possible ways around this. HN uses sequentially-numbered IDs for posts and comments (both are treated the same so far as I can tell). UserIDs seem to have an internal representation which is similar (I've seen, for example, names which change over time), but the internal representation doesn't seem to be publicly exposed. If you want to find my own content you'd do it with "UserID=dredmorbius" and not by some numeric identifier.
But the numeric content ID means that a determined scraper could walk (sequentially or randomly) through the entire database, pull out every post and comment, and then glue those back together. That's somewhat north of 40 million items presently.... (There are benefits to using sparse, random / arbitrary UUIDs for systems.)
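A minimal sketch of why sequential IDs are enumerable while random ones are not. It only builds URLs and fetches nothing; `sequential_item_urls` is a hypothetical helper, and the starting ID is simply one that appears elsewhere in this thread:

```python
import uuid


def sequential_item_urls(start_id, count):
    """Yield item URLs for a run of consecutive numeric IDs."""
    for item_id in range(start_id, start_id + count):
        yield f"https://news.ycombinator.com/item?id={item_id}"


# With sequential IDs, knowing one valid ID gives you its neighbours.
urls = list(sequential_item_urls(7126635, 3))
for url in urls:
    print(url)

# With random 128-bit identifiers there is no "next" ID to guess:
# consecutive calls return unrelated values.
random_ids = [uuid.uuid4() for _ in range(3)]
for rid in random_ids:
    print(rid)
```

The trade-off is that sparse random identifiers make a full walk impractical, at the cost of the short, ordered IDs that HN's URLs rely on.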
What I am missing from Algolia's Hacker News Search is the "OR" (alternative) operator, so that I can search for e.g. "WebKit OR Gecko OR Blink" in a single query. I hope Trieve will have such functionality!
I know, I’ve seen them both :-)
As you correctly mentioned, the top100 rarely changes and thus has limited practical usage. I’d like to see something more dynamic and relevant for us plebeians, i.e. being able to sort all users by karma, comments, submissions, creation date.
Among other information from Whaly: there are roughly 150k active HN members per year. The 10k list gets you by far the overwhelming lion's share of activity (it's already highly concentrated in the top-100 list).
I don't know that there's a comprehensive member list available overall, and the lack of some sequential userIDNumber identifier means that the space isn't readily searchable. I suspect Whaly's approach of snarfing all activity via the API is probably the most comprehensive. A new search tool might be able to do that (and tie in other metrics such as posts and karma) as well.
There's the additional issue that overall karma is not a particularly useful measure. It's possible to achieve a high karma simply through excessive activity (my own account is a case in point). There are domain experts such as Alan Kay (@alankay) who have decidedly pedestrian karma (4,566), but that's over 1 story (1,400 votes) and 110 additional comments, or about 29 votes/comment. That's well above my own average, which is about 1-2 votes per comment (2,087 stories, 28,107 comments).
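The votes-per-comment figure for Alan Kay can be reproduced from the numbers quoted. This assumes, as that estimate implicitly does, that karma is roughly the sum of story votes and comment votes, which is not how HN necessarily computes it:

```python
# Figures quoted in the comment above.
karma = 4566
story_votes = 1400
comments = 110

# Assumption: karma not attributable to the one story came from comments.
votes_per_comment = (karma - story_votes) / comments

print(round(votes_per_comment, 1))  # 28.8
```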
Karma/posts / karma/comments might be slightly more informative, but can also be skewed by submitting a highly-popular story. The larger problem is that truth, expertise, and credibility aren't popularity contests.
A further problem with expertise is that it is highly domain-specific. The fallacy / error of ascribing general expertise to one who's shown mastery in one specific area is a grave one. Particularly when that one specific area is earning money / winning a lottery.
Oh: and based on my front-page scraping, I coded up some command-line tools to return activity / results for domains and members so I could do quick checks on both.
Note that what makes the front-page is an interesting function of both what the member posts and what the membership as a whole responds to. Someone might submit a large number of unappealing topics, but get traction on a much smaller subset. FWIW that seems to be my own experience. My tools will show the posts that got traction, but not the much larger low-vote long tail. Or flagged/killed posts, FWIW.
Eg for this thread the most recent comments can be found here: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
I built an Observable notebook to save me from having to manually construct those searches here: https://observablehq.com/@simonw/hacker-news-homepage
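For anyone who'd rather script this than use the web UI, here is a minimal sketch against the Algolia HN Search JSON API rather than the web UI link above. It uses the `search_by_date` endpoint, which returns newest items first, with a `comment,story_<id>` tag filter. The helper name `latest_comments_url` and the example story ID (taken from a link elsewhere in this thread) are illustrative only, and the code builds the URL without making a request:

```python
from urllib.parse import urlencode

API = "https://hn.algolia.com/api/v1/search_by_date"


def latest_comments_url(story_id, page=0):
    """Build a query URL for a story's comments, newest first.

    The tag filter "comment,story_<id>" restricts results to comments
    belonging to that story.
    """
    params = {"tags": f"comment,story_{story_id}", "page": page}
    return f"{API}?{urlencode(params)}"


print(latest_comments_url(7126635))
```

The comma in the tag filter is percent-encoded by `urlencode`, which the server decodes as usual.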