oh boy I wrote one of these many years ago for HN.
Within like an hour or two pg emailed me asking me to stop. I didn't know it at the time, but HN was being run on a rusty potato and scraping the homepage every 5 or 10 seconds was causing significant load.
Afaict, HN is still running on a rusty potato. The software's written well, so it doesn't need to run on more than that. (What's going to happen to it? Someone links to it from HN?)
> one of the more important and trafficked properties on the Internet.
I like HN, but it's really only important within a very niche subset of the Internet, and it also doesn't have much traffic. There's like a single post submitted every two minutes. That's not much.
(just making sure it's obvious - I didn't create this site! all credits to @jerbear4328 who is here on HN - I'll email them to let them know it's trending) :-)
Any tips on respectfully crawling HN so you don’t get throttled? I had an application idea that could not be served by the API (need karma values) so I started to write code to scrape but got rate limited pretty quickly.
{
"by" : "jkarneges",
"id" : 45533018,
"kids" : [ 45533616 ],
"parent" : 45532549,
"text" : "The HN/Firebase API doesn't make this easy. For <a href=\"https://hnstream.com\" rel=\"nofollow\">https://hnstream.com</a> I ended up crawling items to find the article.",
"time" : 1760043552,
"type" : "comment"
}
"parent" can either be the actual parent comment or the parent article, depending where in the comment chain you are.
I tried this, but it required making a request for every comment and would probably call for a backend, wheras this can run just off of the Firebase websocket stream on a static HTML file.
If you want a live version of most of the site (including tracking new comments in items you've already looked at), I made this to improve my React skills when the HN API first came out:
I just wish there was a dark mode.. Let me install night reader to see how well it looks :p
Edit: would've loved if there was a way to sign up or make comments from there, oh well :< I wish there was, I am not sure if that's possible tho but I hope it could be.
I checked the github repo and I found out that its a single static html page and ahh I forgot to see that its hosted on github.io as well because of their static hosting, somehow I didn't see it!!
Another great use of the HN API! It would be nice to filter it somehow to threads where I have commented, sometimes I don't notice until days later that someone replied to me.
Didn't know about this! Thanks, I just wish something like this was on signal or discord but maybe that's because I don't operate much mail and I really like signal...
But anyways, just as a headsup for a moment I was confused that I wasn't getting the verification mail and I was literally going to comment it but then I got the mail just in time, so maybe to anyone out there, maybe be a bit patient as its worth it (or it could totally just be an issue from my side as I spammed it twice or maybe it got congested I am not sure)
EDIT: It wasn't that they sent me 2 mails but rather that they sent me 1 mail and I clicked on it and then got another and I thought it was because of me spamming verification twice but it turns out that they sent me a mail because someone had responded to me :)
Yeah, you need to give it a moment. In active discussions I usually see the reply before getting the notification. I guess the service needs a bit to detect responses, and then each email need a bit of time as well.
I moderated a few chat rooms and dreamed of all this complex tooling I could build to automate and help me fix issues right away. Of course, the tiny chat rooms didn't need any of that. The hard part is actually moderating fairly and trying to figure out where the boundaries are. I couldn't do it. Makes sense why pg said "don't start a forum".
I know they get an alert whenever a comment is flagged. I assume they get an alert whenever certain "problematic" accounts post. Maybe they get an alert whenever the flamewar detector goes off? This forum is written in a Lisp so they can probably hotpatch whatever code they want and add whatever filters they want.
But dang has said numerous times that he doesn't read everything, that would probably not even be feasible.
The question is, YC being what it is, are they using LLMs to automate moderation or do sentiment analysis?
We're working on the latter but it's not yet in production. We're only going to deploy it once it's clear that it significantly improves on what mods + the existing software are already doing. The hard part, of course, is how to evaluate that.
Pretty cool! I must manually refresh to see new posts. Implementing real-time updates (e.x. via WebSocket or Server-Sent Events) would significantly improve usability.
It sounds cool, but for usability its not great. Think about how reddit recalculates the results order as you're paging through, and you see items from the previous page show up on the next. Now imagine that happening in realtime. Maybe there's a link you want to read, but you get pulled away for 10 minutes. By the time you get back the link is higher or lower, or may be completely missing.
That is a good point, however there are design solutions around it. For example:
- Poll every X seconds instead of real-time
- Enable user to toggle real-time mode
- Load new posts in with a "+" button at the top to fetch the latest posts (like Twitter)
It (hopefully) took exactly 30 seconds, the page delays every item until 30 seconds after its posted date. It doesn't poll HN's server, it opens a websocket to the official HN Firebase, and without the delay, items appear in large chunks. I'm pretty sure the HN server syncs with Firebase every 30 seconds, so this is as fast as it can go while still being accurate.
It works, but I'd advise about using the same branding. Consider making it significantly different than the website's comments page to avoid any issues, since it's not part of the webapp nor sanctioned.
Okay, maybe you could explain why copying the look of a product to look exactly like the said product, without it being warranted is a worth ignoring and funny?
a very very quick copy/paste of 73 comments there - fed to claude with a basic "characterize+count comments" give me 70% neutral/technical, 14% positive, 16% negative.
claude also said "The negative comments tend to be more substantive critiques of systems or research approaches rather than personal attacks." - so... that's good? :-)
I've done it before on large threads. Neutral sentiment tends to dominate at roughly 50% but negative sentiment is much higher than positive sentiment. Negative sentiment is also higher if you just take the highest ranked comments.
It's actually quite stark looking at it this way and makes me question how healthy it is to use this site as often as I do. Fully aware that we're also piling on the negative sentiment.
Within like an hour or two pg emailed me asking me to stop. I didn't know it at the time, but HN was being run on a rusty potato and scraping the homepage every 5 or 10 seconds was causing significant load.
reply