Ask HN: How does HN manage to be always online?
127 points by hacsky on June 21, 2022 | 188 comments


According to @dang (https://news.ycombinator.com/item?id=28479595) via @sctb (https://news.ycombinator.com/item?id=16076041)

  We’re currently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
  FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
  Mirrored SSDs for data, mirrored magnetic for logs (UFS)


Weird, I thought you needed 1024 Kubernetes nodes, a 70 MB React bundle and 200 engineers to host 50M monthly sessions.


How do you host 50k monthly sessions per node?! That’s like 0.02 sessions per second.


I see what you did there!


Hire 200 engineers and they'll find a way to justify 1024 k8s nodes.


HN is a very simple application. Handling a high volume of traffic for a simple application is a very different problem from scaling a highly complex application.


HN is simple, yes. But it could be made more complicated. Personalized feed and data analytics are two complicated things that come to mind. Staying simple is often a choice, and it’s a choice not many companies make.


YCombinator doesn't need to run a Google Analytics script to have all the analytics they want.


What would make HN a simple application and Reddit a highly complex application?


HN is a straightforward forum. Reddit is one level above that: generalized forums as a service.

Anything HN has had to implement, Reddit has to implement at a generalized, user-facing level, like mod tools.

Frankly, we underestimate how hard forums are, even simple ones. I learned this the hard way rebuilding a popular vBulletin forum into a bespoke forum system.

Every feature people expect from a forum turns into a fractal of smaller moving parts, considerations, and infinite polish. Letting users create and manage forums is an explosion of even more things that used to be simple private /admin tools.


Mod tools are not accessed or used by all users, so the load of mod tools on the servers is probably negligible.

I agree, most software is deceptively simple from the outside. Once you start building it, you become more humble about the effort required to build anything moderately complex.


Mod tools aren't used by the majority of users, correct. But the existence of mod tools does make the logic and assumptions of the application different. Now you've got a whole set of permissions and permissions checks, additional interfaces, more schema, etc.

It's not that the mod tools are constantly being used, it's that there's now potentially far more code complexity for those tools to even exist.


User interaction, moderation, embedded media, way more subreddits with all the different opinions they hold, and so on.


Is Reddit really a complex application (regardless of how they build, scale, or deploy it)? Although that makes me wonder: what makes an application complex?


I'll start with a crude metric: Number of bubbles in the use-case diagram


Hiring 200 engineers


Because HN hasn't changed in forever, and behind the scenes the Reddit codebase is constantly evolving and growing.


Hacker News changes more often than people think, just not the layout, because people here are weirdly fetishistic about it.

Since I've been here they've added vouching for banned users (and actually warning people beforehand), thread folding, Show HN, making the second chance pool public, thread hiding, the past page, various navigation links and the API. They've also been trying to get a mobile stylesheet to work. They've also mentioned making various changes for spam detection and performance. And the URL now automatically loads a canonical version if it finds one, and the title is now automatically edited for brevity. And I've probably missed a few things.

And HN isn't a simple application by any means. Go look at the Arc Forum code - it isn't optimized for readability, scalability or reliability, but for joy - for the vibe of experimental academic hacking on a Lisp. It's made of brain farts. Hacker News is probably significantly more complex than that for being attached to a SV startup company and running 'business code' and whatnot.


I mean, that’s not really that much, is it? And that’s the point: HN really doesn’t change much. Whereas Reddit, for better or for worse, has a much higher output of new user-facing features.


> What would make HN a simple application and Reddit a highly complex application?

Engineers.


Running ads, for one


Is the configuration stupid, or is it somehow imperative that the work be distributed over 200 local engineers plus over 70 MB of externalities?


When a VC gives you a giant boatload of money, they insist you "scale up" the company overnight. So you go on a massive hiring spree and get a triple-digit team of engineers before having any market traction.

And they're tasked with building a product that can handle Google-levels of demand, though they currently only have two customers, neither of them paying.

It indeed is imperative, but not for technical reasons.


And then when the stock market drops by 0.1%, you lay off 30% of the workforce because that's what's needed for its survival.


I would take the money and then do none of that. Now I'd have a five-year runway, enough time to build a product people like and use, and by then the investors won't be angry anymore.


You would never get the money with this plan.


If the money comes with a stipulation that I have to spend it all in such a way that I screw myself and my company over, then I don’t want the money.


And is it because those strawman VCs are all stupid, or is it somehow ...


They do that so you're screwed later on without them: when that first bit of money starts to run out, boom, they own your company.


HN is not user friendly. A better comparison is Stack Exchange, which is far richer and runs on (relatively) small infra.


HN is perhaps the most user friendly site I go to with regularity.

The idea that a website needs to be “rich” to be usable is one of the dumbest things the industry has convinced itself of in the last 20 years (following only ‘yaml is a smart way to encode infrastructure’).


To be fair, it's not so much user-friendly as it is simple, and simple tends to be easier to understand.

For example, if it was more user-friendly, it could have links to jump between root comments, because right now very popular top comments tend to accumulate most interactions, and scrolling down several pages to find the next root thread requires effort.


Is that not what the "prev"/"next" links do?


User on this account since 2010, 12K karma, and I just learned what next does.

TY!


Duck me sideways, they were always there, but I was blind.


Compulsive over-engineering is by no means an IT-centric problem.

It takes substantial wisdom to arrive at an 80% solution and cease fannying about.


And the people who bring this wisdom also bring few or no metrics that are appreciated by management.


The people who push in the other direction also bring few or no metrics. I.e. there is often no reason to add <bag of features>, except that a customer (who hasn't bought the product yet) mentioned them as nice to have during initial sales talks.


I prefer YAML to JSON for our infra. I know some people do not like the whitespace.

What do you prefer?


JSON:

* doesn't encode Norway to false;

* most formatters for JSON are deterministic;

* doesn't deserialize into arbitrary objects.

YAML, in contrast...

* YAML is insecure by default and will deserialize into arbitrary objects;

* YAML knows that there's no such thing as wall clock time, there's only the number of seconds since midnight;

* YAML has 22 ways of writing true or false, and the parser will silently replace your "strings" with false;

* there are 63 ways of writing multi-line strings;

* a truncated YAML file is still a "valid" YAML file.

https://noyaml.com/
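A quick way to see a couple of these in practice is a small PyYAML sketch (PyYAML's default loader follows YAML 1.1, which is where the Norway and sexagesimal behaviour comes from; parsers that follow YAML 1.2 behave differently):

  import json
  import yaml  # pip install pyyaml

  print(yaml.safe_load("country: no"))       # {'country': False}  - the Norway problem
  print(yaml.safe_load("port_map: 22:22"))   # {'port_map': 1342}  - sexagesimal integer
  print(yaml.safe_load("version: 3.10"))     # {'version': 3.1}    - trailing zero lost
  print(yaml.safe_load('version: "3.10"'))   # {'version': '3.10'} - quoting preserves it
  print(json.loads('{"country": "no"}'))     # {'country': 'no'}   - JSON keeps the string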


IMO the solution to YAML-as-config is a strict subset of YAML.

JSON is one strict subset, but one whose trade-offs favour strictness and machines: error detection and types encoded in the syntax.

We decided on a different subset of YAML for our users who were modifying config by hand (even more strict than StrictYAML). Some of the biggest features of YAML are that there is no syntax typing, and that collection syntax is simple (the latter is also true for JSON, but not for TOML).

For example, a string and a number look the same. This seems bad to us developers at first, but the user doesn't have to waste 20 min chasing down an unmatched quote when modifying config in a <textarea>. Beyond that, it's the same amount of work as making sure the JSON is `"age": 20` instead of `"age": "20"`, one just has noisier syntax.
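As a hand-rolled sketch of that idea (strictyaml itself does this properly; the schema shape and function name here are just illustrative): parse every scalar as a string, then let a schema rather than the quoting decide the types, so `age: 20` and `age: "20"` can't silently diverge.

  import yaml  # pip install pyyaml

  SCHEMA = {"name": str, "age": int}  # hypothetical schema: field -> type

  def load_config(text: str) -> dict:
      # BaseLoader resolves every scalar to a plain string (no syntax typing);
      # the schema, not the quoting, decides the final type.
      raw = yaml.load(text, Loader=yaml.BaseLoader)
      return {key: SCHEMA[key](value) for key, value in raw.items()}

  print(load_config("name: Alice\nage: 20"))    # {'name': 'Alice', 'age': 20}
  print(load_config('name: Alice\nage: "20"'))  # same result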

I think the StrictYAML docs have a great breakdown of the advantages: https://hitchdev.com/strictyaml/why-not/

We decided against TOML because nesting is too confusing. https://github.com/toml-lang/toml/issues/846


Declarative code. CSS is better than YAML for describing a desired state.


Not on mobile.


Thanks for the downvotes everyone, otherwise I wouldn't even have known there were so many replies to my comment.


>Stack Exchange, which is far richer and runs on (relatively) small infra.

Yes, I've heard that SO runs on relatively simple and modest infra, and I agree that it's a good example.

>HN is not user friendly

How so? I find the HN UX a refreshingly simple and effective experience. It might not have all the bells and whistles of newer discussion fora, but it doesn't obviously need them. I'd say it's a good example of form/function well suited to need. Not perfect perhaps, but very effective.

YMMV of course.


Try loading it on a 2G connection (2 bars ≈ 128 kbit per second; those are bits, not bytes). It loads almost instantly with no fuss. Now try loading virtually any other site on the same connection: if it ever loads at all without timing out, you’ll be waiting over 10 minutes.


There was a YouTube presentation from several years back where the Stack Exchange founder explained how it ran on just ~10 servers, and could run on half that many if needed. He stressed the simplicity of their architecture, and that their problem space was massively cacheable, so the servers just had a few hundred GB of RAM and mostly only had to do work to re-render pages, which could be stored in cache most of the time. It was a C#/.NET app.

So, I think there is a lot more in common than you think between HN and SO.


What about HN is not user friendly? I think it's a breath of fresh (stale?) air.


My pet peeves: no dark mode, which I sorely miss when reading in the dark; no indication at all that you've got replies (at least a tiny number next to threads, perhaps?); and the up/downvote buttons are too small to reliably tap on mobile. Oh, and enumeration support would be fantastic; the workarounds tend to be hard to read.

Other than that, I think it's delightfully ugly and lightweight.


I use the Dark Reader extension for Firefox; HN looks fine under that.

Having to separately configure individual sites or web apps for dark mode is a nonstarter anyway; if you could do that, would you really want to?

Ideally, you should be able to set your device to dark mode, and everything would follow: every app, every site in the browser.

Some combination of setting your OS to dark mode and using a dark mode extension in the browser sort of approximates that, imperfectly.


No need to set it per individual page. There are (arguably easy to use) ways for a web page to know the user's OS-level color scheme preference [0].

We still need the workaround via extensions or Userstyles for the ones that don't implement that, sadly.

[0] https://developer.mozilla.org/en-US/docs/Web/CSS/@media/pref...


You may not be down for an app/mobile experience, but Harmonic is beautiful and has a dark mode


I can't seem to find Harmonic in the iOS App Store, is it Android-only?

Also, HN apps tend to make it harder to send interesting things to Roam or the laptop or Safari's reading list, the website makes that really convenient.


Thanks for the recommendation, just switched to it!


I believe the internet term is dank air.


Yeah! I really miss all those ads (not!)


I agree HN could be improved with small CSS changes, but no backend change would be required.


Well the good thing with CSS is that you can override it with your own stuff locally if you wish to


Tough to do when the entire layout is built with nested tables, like it's still 1999.


Tougher to do on mobile though


How does anyone use anything aside from Materialistic?!


I wouldn’t say it’s not user friendly but I understand where you are coming from. I also missed some more modern features/looks and decided to build my own open source client [0]. Feel free to give it a go to see if it’s more your taste!

0. https://modernorange.io



I love Hacker News. It is very friendly to my phone.


By rich do you mean popping up a captcha every time I search for something?


No microservices on top of kubernetes? no SPA with SSR? You are doing it wrong.

I'm gonna write an alternative which will be WebScale.

j/k of course.


Well it is SSR tbh :)


With or without caching?


I wonder if they use something like CARP[^1] for redundancy. Also, it strikes me as odd that they didn't go with ZFS for storage; it makes FS management _way_ easier for engineers who don't spend all their time on these kinds of operations.

[^1]: https://www.freebsd.org/cgi/man.cgi?query=carp&sektion=4


You might ask what sort of filesystem maintenance they ever need to do. Replacing a disk is covered by the mirror. Backup is straightforward. The second system covers a lot more. If they need to increase hardware capacity, they can build new systems, copy in the background, and swap over with a few minutes of downtime.


Curious how much memory usage it sits at on average.


Love to see FreeBSD getting some love.


(Beginner question) How do they store the data? Is an SQL DB overkill for such a use case? What would be the alternative? An ad-hoc filesystem-based solution? Then how do the two servers share the DB? And is there redundancy at the DB level? Is it replicated somehow?


"ad-hoc filesystem based solution" is the closest of your definitions, I think. Last time I saw/heard, HN was built in Arc, a Lisp dialect, and use(s/d) a variant of this (mirrored) code: https://github.com/wting/hackernews

Check out around this area of the code to see how simple it is. All just files and directories: https://github.com/wting/hackernews/blob/master/news.arc#L16... .. the beauty of this simple approach is a lack of moving parts, and it's easy to slap Redis on top if you need caching or something.

There is a modern maintained variant at https://github.com/arclanguage/anarki/tree/master/apps/news as well if you want to spin up your own HN-a-like and have the patience.

File syncing between machines is pretty much an easily solved problem. I don't know how they do it, but it could be something like https://syncthing.net/ or even some scripting with `rsync`. Heck, a cronned `tar | gzip | scp` might even be enough for an app whose data isn't exactly mission critical.
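To make the flat-file idea concrete, here's a sketch of the general shape in Python (not Arc, and not HN's actual on-disk format): one small file per item, with an in-memory table in front of it.

  import json
  from pathlib import Path

  ITEM_DIR = Path("data/items")   # hypothetical layout
  ITEM_DIR.mkdir(parents=True, exist_ok=True)
  _cache: dict[int, dict] = {}    # hot items stay in RAM

  def save_item(item_id: int, item: dict) -> None:
      _cache[item_id] = item
      (ITEM_DIR / str(item_id)).write_text(json.dumps(item))

  def load_item(item_id: int) -> dict:
      if item_id not in _cache:
          _cache[item_id] = json.loads((ITEM_DIR / str(item_id)).read_text())
      return _cache[item_id]

  save_item(1, {"by": "pg", "title": "Hello HN", "score": 1})
  print(load_item(1)["title"])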


Wow, I had no idea HN was built like that - I'm impressed. I really wish I could read the Arc code better though since I'd love to know more about the details of how data is represented on disk and when things move in and out of memory, etc.

Does anyone know of other open source applications with similar architectures like this?


>Does anyone know of other open source applications with similar architectures like this?

There's a good reason everyone else just uses a relational database, and it isn't because everyone else is addicted to unnecessary complexity.


> and it's easy to slap Redis on top if you need caching

With the filesystem as storage you don't even need Redis; the OS will cache the most recently used files anyway.


Data is stored in flat text files containing Arc Lisp tables, or in RAM. There is no 'database' per se, unless they've added one and not mentioned it.

You can get the software and language HN is based on here: http://arclanguage.org


I think the link is broken, it's not HTTPS


Force of habit, I fixed it.


I love the design similarity to HN.


That’s because HN is just about the only thing written in Arc, and everything else you see is a fork of an earlier version of HN.


A single bare metal server is more reliable than most people think it is. Complexity adds a lot of overhead and layer after layer that could possibly fail.


A single server is much faster than most people think, too!

In the microservice or serverless arrangements I've seen, data is scattered across the cloud.

It's common for the dominant factor in performance to be data locality. Most of the time, talk about data locality is about avoiding trips to RAM or, worse, disk. But in our "modern" distributed cloud things, finding a bit of data frequently involves a trip over the network. In the monolith world, what was once invoking a method on an account object has become making an HTTP POST to the accounts microservice.

What might have been a microseconds-scale operation in the single-server world might become hundreds of milliseconds in the distributed cloud world. While you can't horizontally scale a single server, your 1000x head start in performance might delay scaling issues for a very long time.

A most excellent paper related to this topic that I think should be mandatory reading before allowing anyone an AWS account is http://www.frankmcsherry.org/assets/COST.pdf :)
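As a rough, self-contained sketch of that gap (numbers vary by machine, the account function and port are made up, and a real cross-host hop in a cloud network would be slower than this localhost loop):

  import threading
  import time
  import urllib.request
  from http.server import BaseHTTPRequestHandler, HTTPServer

  def get_account_balance(account_id: int) -> int:
      return account_id * 2  # stand-in for a cheap in-memory lookup

  class Handler(BaseHTTPRequestHandler):
      def do_GET(self):
          body = str(get_account_balance(42)).encode()
          self.send_response(200)
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

      def log_message(self, *args):  # silence per-request logging
          pass

  server = HTTPServer(("127.0.0.1", 8099), Handler)
  threading.Thread(target=server.serve_forever, daemon=True).start()

  N = 1000
  t0 = time.perf_counter()
  for _ in range(N):
      get_account_balance(42)
  direct = time.perf_counter() - t0

  t0 = time.perf_counter()
  for _ in range(N):
      with urllib.request.urlopen("http://127.0.0.1:8099/") as resp:
          resp.read()
  via_http = time.perf_counter() - t0

  server.shutdown()
  print(f"direct call: {direct / N * 1e6:10.3f} microseconds per call")
  print(f"local HTTP : {via_http / N * 1e6:10.3f} microseconds per call")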


When you put in half the effort to set things up properly, a single server can handle a lot of load and traffic, and get a lot of things done.

If you know some details of the services you're going to host on that hardware, the things you can do while saving a lot of resources are considered black magic by many people who only deploy microservices to K8s.

...and you don't need VMs, containers, K8s or anything else.


What are those details?


It generally boils down to three things: 1) how much resource a service needs to run well, 2) how much the service wants to consume if left unchecked, 3) how performant you want your service to be.

After understanding these parameters, you can limit the resources of your application by running it under a cgroup. Doing this won't allow a service to surpass the limits you've put on it, and the cgroup will put pressure on your service as it nears its resource limits.
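A minimal sketch of that, assuming a cgroup-v2 mount at /sys/fs/cgroup with the memory and cpu controllers enabled and root privileges (the group name, limits and service path are made up; systemd's MemoryMax=/CPUQuota= achieves the same thing declaratively):

  import subprocess
  from pathlib import Path

  cg = Path("/sys/fs/cgroup/demo-service")       # hypothetical group
  cg.mkdir(exist_ok=True)
  (cg / "memory.max").write_text("256M\n")       # hard memory ceiling
  (cg / "cpu.max").write_text("50000 100000\n")  # 50% of one CPU (quota and period, in microseconds)

  proc = subprocess.Popen(["/usr/local/bin/my-service"])  # hypothetical service binary
  (cg / "cgroup.procs").write_text(str(proc.pid))         # move the process into the group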

Also, sharing resources is good. Instead of having 10 web server containers, you can host all of them under a single one with virtual hosts, most of the time. This allows good resource sharing and doing more with fewer processes.

On the extreme end, I'm running a home server (DNS server, torrent client, Syncthing client, a small HTTP server and an FTP server, plus some other services) on a 512 MB Orange Pi Zero. The little guy works well and never locks up. It has plenty of free RAM, and none of the services are choking.


I agree but at the same time: Inter-process communication is also faster when a process is allowed to write to or read from another process's memory. Doesn't make it a good idea, though.


The way I deploy them doesn't mean "compromise one, compromise all". For example, I generally leave SELinux intact. So they're properly isolated.


Yeah, nowadays you could just scale up and still have a lot of leeway. A high-end server (with redundancy, maybe) is more than enough for 95% of all common startup use cases.

Distributed computing only makes sense when you're starting to deal with millions of daily users.


Yes. HN works well with one dual-socket server (2 sockets × 4 cores).

A new server could have 4 × 64.

Also, to distribute the load you can use an ‘A’ DNS record per server:

https://blog.uidrafter.com/freebsd-jails-network-setup


To be fair you can run a microservices stack on a single server and it will be very fast, especially if you use gRPC instead of HTTP


Would gRPC make a huge difference if requests took a second each?


gRPC runs over HTTP/2 with binary payloads and multiplexed connections, so it avoids much of the per-request overhead of a typical REST-over-HTTP/1.1 setup and achieves lower latency.


I think the main risk of the HN architecture is that (I believe) it's a single datacentre in a single location. Hopefully they have offsite backups.

The other risk I guess would be that all the NS are AWS/Route 53 servers, so if that went down they'd only be up for about two minutes (looks like the DNS TTL is 2 minutes).

You could host your own NS servers in two different locations on two different providers, and you could have a third hot-spare server ready to go too. That would allow the service to survive an earthquake flattening San Diego (off-site server ready to go) and cope with the loss of AWS DNS. Whether the cost/benefit ratio is there is another matter. I think the serving side of Route 53 is fairly reliable (even when the interface to update records fails on a frequent basis), and the cost of being down isn't particularly terrible.


> I think the main risk of the HN architecture is that (I believe) it's a single datacentre in a single location.

Exactly, Amazon US EAST makes the same point several times a year! :)


We should ask ourselves: as users, if we have to manage the risk of us-east-1 going down, maybe we should question this entire Cloud thing. It was marketed as "100% bulletproof" in 2005 when we discarded on-prem.


To be fair to Amazon, I don't think they've had an outage that affected different data centres at the same time, but yes, I would far rather have my main and backup on two providers in two appropriately diverse locations.


But never forget that AWS uses us-east-1 to host core services that, when they go down, take almost the whole thing down, no matter which region.


Which specific services?

I only use VMs and Route 53, and from what I've seen, when us-east-1 is down I can still resolve my domains.


What is the downside for a user to have a few minutes of downtime?

What is the upside for HN to add this engineering complexity to avoid a couple of minutes of downtime?


For HN it probably doesn't matter, although what's the value?

For Amazon too it probably doesn't matter -- if I occasionally can't buy something I'm almost certain to simply try again 30 minutes later

For some services a couple of minutes of downtime isn't good enough. Imagine if the Super Bowl broadcast went dark for 5 seconds just as the winning play happened.


> A single bare metal server is more reliable than most people think it is.

Well, two, presumably. Otherwise you can't reboot without taking the website offline.


Sometimes that's fine, we can go without HN for five minutes once every few months.


Few months? I notice HN down for a few minutes about once a week.


If you want it to be online, why turn it off or reboot in the first place? You can run updates without rebooting, at least on Linux.


Can't replace the kernel though, and updating the web server application itself is going to be hard too. Unless it's some Erlang-like hot-swappable server, I guess.


> Can't replace the kernel though

Yes you can, since 2008. https://en.wikipedia.org/wiki/Ksplice


Obligatory Lamport quote: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."


We managed to run a successful bootcamp school LMS on a single cheapest 1 GB RAM Hetzner VPS with a Rails app, Redis cache, Postgres, backups, a staging env and near-zero production issues.

Not high-load of course, but still hundreds of active users interacting with the app every single day.

Recently we had to upgrade to the next tier because of growth.

Modern servers are super fast and reliable as long as you know what you’re doing and don’t waste it on unnecessary overheads like k8s etc.


Adding to this: if you use dedicated servers rather than a VPS, you can also squeeze a lot more performance out of the boxes, as you have CPUs dedicated just to you instead of shared vCPUs that others use too.


AWS and GCP vCPU are absolutely not shared, and any provider that’s still sharing cores or threads is leaving its users open to all manner of atrocious side channels.


vCPU is a "virtual CPU"; it's even in the name. When you rent those, you don't get access to a dedicated core/CPU, but to a share of a logical processor. The side-channel attacks you mention all have mitigations at this point.


Only the very smallest instances don't get a full core. And when you have multiple vCPUs, the scheduler _won't_ ever split you onto a thread. This is definitely true of AWS's Nitro; I don't know for certain about GCP. The attacks are not fully mitigated between hyperthreads.

Is there evidence cloud compute is more than a few percent slower than on-prem?


> Only the very smallest instances don’t get a full core.

Not sure this is true for AWS anymore.

Even a t3.nano gives you 2 vCPUs, which I've always interpreted as meaning they give you 1 core with two threads to ensure that you never share a core with another tenant.

Of course, t2 instances still exist which give 1 vCPU, but there's no reason to create one of those when a t3 gives 2 vCPUs at a lower price. The only reason to be running a t2 is if you created it before t3 existed and just haven't bothered migrating.


A vCPU also represents a CPU _thread_ which is usually half a core.

Sometimes you will share the CPU.


Even k8s only costs about 10% overhead.

It's kind of incredible that "a few hetzner dedis running k8s" still has better reliability than the Cloud™ does.


Remove Kubernetes, and "a few Hetzner dedicated servers running just your application" has even better reliability than "a few Hetzner dedicated servers running Kubernetes", as you remove more points of failure.


So much this. System failure is a statistics game, which is why I like the Erlang approach of having isolated state that you assume will eventually corrupt, so you can restart that part of your application with a fresh state.

K8s does this, kind of, on a higher level, where the downtime is often more noticeable because there's more to restart. But if your application is already highly fault-tolerant, this is just another point of failure.
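A tiny sketch of that restart-with-fresh-state shape in plain Python (Erlang/OTP supervisors do this far more granularly and cheaply; this only shows the idea):

  import multiprocessing as mp
  import time

  def worker() -> None:
      state = {}  # state lives only inside this process
      while True:
          # ... do work and mutate state; any crash simply discards it ...
          time.sleep(1)

  if __name__ == "__main__":
      while True:
          p = mp.Process(target=worker)
          p.start()
          p.join()  # returns when the worker exits or crashes
          print(f"worker exited with code {p.exitcode}; restarting with fresh state")
          time.sleep(1)  # crude backoff to avoid a tight crash loop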


Why hasn't most of the web moved to an Erlang (or Erlang-like) architecture, actually distributed?


Rant:

Because people adore stable income with no risks, and because most programmers out there are hacks who will only ever learn one or two ancient languages that lost their competitive advantage 10 years ago.

Take a look at how insanely far you can go with Elixir + Phoenix on a server with 2 vCPUs and 2GB RAM. It's crazy. At least 80% of all web apps ever written will never get to the point where an upgrade would be needed.

So yeah, people love inertia and to parrot the same broken premise quotes like "But StackOverflow shows languages X and Y dominate the industry!".

Nah, it doesn't show that at all. It only shows that programmers are people like all others: going out of the comfort zone is scary, so our real industry is manufacturing bullshit post-hoc rationalizations to conceal the fact that we're scared of change. A mega-productive and efficient change at that.

Social reasons > objective technical reasons. Always has been true.

/rant


I completely agree. Most developers simply copy whatever is currently popular without taking into consideration their specific context and business requirements. Almost all software solutions are massively over-complex, with many points of failure and high latencies for every single transaction, leading to a death spiral of even more complex and expensive “solutions” for “performance” reasons. Microservices are the poster child for this. In contrast, a single high-performance server can handle the full workload of 99.99% of web applications out there, with a separate warm standby server ready to take over if needed, and a linear event transaction log as the source of truth.


Because (despite the hype) Erlang embeds principles that can be implemented in other languages and they solve only a narrow set of problems in distributed systems. This does not justify a dedicated programming language.


Is Erlang particularly hyped? I don't usually see it suggested as a solution, I see it almost exclusively in reference to something already written in it, a fait accompli, like WhatsApp.


WhatsApp was scaling really well with a really small team, a good testament to the power of the language.

Of course you can take over some of these concepts, but I don't know any other language that is so well designed around the actor model.


While Kubernetes definitely has its own failure modes and can be an utter pain in the ass to set up, in my experience it also removes significant failure modes and makes your life way easier.

For me, the most important thing is that I don't have to take care in my deployment pipelines about how many servers I have and how they are named/reachable, stopping Docker containers, re-creating Docker containers, how to deal with logs, Let's Encrypt cert renewals... all I have to do is point kubectl at the cluster's master, submit the current specification of how the system should look, and I don't have to deal with anything else. If a server or a workload crashes, I don't have to set up monitoring to detect it and fail over; Kubernetes will automatically take care of that for me. When I add capacity to the cluster for whatever reason, I enroll the new server with kubeadm and it's online - no entering of new backend IPs in reverse proxies, no bullshit. Maintenance (e.g. OS upgrades, hardware maintenance) becomes a breeze as well: drain the node, do the maintenance, start the node, and everything works automatically again.


As someone who does this day to day, I think k8s is worth it for anything non-trivial. If I didn't already have this understanding, then it might be a different trade off.


Hetzner servers running your code is The Cloud™


Hetzner offers bare metal as well.

https://www.hetzner.com/sb?hdd_from=0&hdd_to=15500


The Cloud is just someone else's computer


> It's kind of incredible

It's not incredible, it's normal - it's always been like this.


I meant it more as in "it's kind of incredible how bad AWS/Azure/GCP is" in terms of reliability and cost ;)


"Unnecessary" in what context?


If you think Kubernetes is unnecessary overhead, you've never operated at significant scale.


Actually, I have experience working in high-load environments operated by k8s.

But for 99% of projects I see, it’s a waste of time and resources (mostly people resources, not just CPU). HN is a perfect example of a project that doesn’t need it, no matter the traffic.

If you need some additional flexibility and scalability over a “bare metal” setup, you can go far with just Docker Compose or Swarm until you have no choice but to use k8s.

Again, if you know what you are doing.


I worked in an academic library where the scale was on the order of hundreds of users per day, but many of those users were faculty/researchers spread out across the globe who got very grumpy when whatever online archive/exhibit was not available.

I migrated our services from a very “pet” oriented architecture to a 4-node Proxmox cluster with Docker Swarm deploying containers across four VMs and it worked great. Services we brought up on this infra still to this date have 100% uptime, through updates and server reboots and other events that formerly would have taken sites offline temporarily.

I looked at k8s briefly and it seemed like total overkill. I probably would have spent weeks or months learning k8s for no appreciable advantage over swarm.


And how many k8s believers ever reach significant scale? It's just like back when NoSQL was the trendy thing and people thought a couple gigabytes meant "big data". Mostly it’s simply cargo culting.


Brilliant rephrasing of my point.


Not at all; k8s is not designed for very large scale. Unsurprisingly, FAANGs don't use it to manage their own platforms.

Edit: Google's Borg is a very different beast.

Edit: no need to patronize me. I worked on massive scale deployments otherwise I would not be commenting.


Many large organisations use Kubernetes (Google, Spotify, IBM, etc.). Regardless, large scale and very large scale are different. Kubernetes is well suited for controlling fleets of compute resource on the order of tens of thousands of CPU cores and terabytes of memory.

The compute overhead to orchestrate these clusters is well worth the simplicity/standardisation/auto-scaling that comes with Kubernetes. Many people have never had to operate VMs in the hundreds or thousands and do not understand the challenges that come with maintaining varied workloads, alongside capacity planning at that scale.


A million nodes running a single application is scale, but a thousand nodes running a thousand applications is also scale, and they are very different beasts.

The FAANGs operate the first kind; k8s is mostly aimed at the second kind of scale, so it's designed "for scale", for some definitions of scale.


K8s grew out of Google's Borg, cluster-management software designed for high availability at FAANG scale. So essentially K8s is the “community edition.” Go read the Google SRE Book for context.

We use it to serve Ruby with 50 million requests per minute just fine. And the best part is the Horizontal Pod Autoscaler, which saves our ass during seasonal spikes.

While serverless/lambda are great I do think K8s is the most flexible way to serve rapidly changing containerized workload at scale.


Kubernetes is fantastic, though I think of it more as a tool for managing organizational complexity than ops. You can vertically scale beyond the level needed by most commercial applications nowadays using ad-hoc approaches on a machine or two with 16+ cores, 256GB+ RAM, terabytes of NVM drives, etc. but many (even small) companies have policies, teams, and organizational challenges that demand a lot of the structure and standardization tools like Kubernetes bring to the table.

So I'm not disagreeing with your assertion, but I'd perhaps scope it to saying it's useful overhead at significant organizational scale, but you can certainly operate at a significant technical scale without such things.. and that can be quite fun if you have the team or app for it :)


Kubernetes is great at significant scale, that's what it's designed for. It has significant overhead if you don't need that scale.


Exactly. And probably not at significant complexity either.


I guess because of sanity and simplicity in its architecture?

I once wrote a piece of software in Rust, a simple API, one binary on one DigitalOcean instance started by systemd, and nothing else. The thing has been working nonstop for years, making it the most stable piece of long-running software I’ve ever written, and I think it all comes from it being simple, without any extra/unnecessary complexity added.

I’m not bragging, btw. I actually contacted the user years after I wrote it because I couldn’t believe the thing was still working; I hadn’t heard from them in years!


There are micro-outages somewhat frequently; I don’t mean that as a criticism but merely as an observation.


An interesting note would be that multiple 9s of uptime isn't actually necessary to keep users happy on a site like HN. I used to use sites in the early 2000s that were down a lot more regularly than HN but it didn't put me off using them.


I sometimes wonder if shutting down a website [say] with the exception of Mondays, or between 22:00 and 08:00, would add to the experience.


A minor spoiler I guess, but that's what they decided to do at the end of Ready Player One (the film, not sure about the book).


There are government and personal websites like this!

https://freakonomics.com/2012/08/this-website-only-open-duri...

I hope another elder will drop others here.


B&H Photo and Adorama stores are closed, and their websites do not take orders, during Shabbat and on Jewish holidays on which commerce is prohibited.


Whose 22-08? Whose Monday? For anything other than very local groups, I don't think that would be a good idea.


I sometimes vote and then click "Reply" with a very small delay. This prompts a message that they can't serve requests that fast.

Other than that I have seen no downtime.


That is just prudent overload protection (enforcing a certain minimum time between actions), I think, and not that they are actually unable to handle it.
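The general shape of such a throttle is tiny; a sketch (not HN's actual code, and the interval and names are made up):

  import time

  MIN_INTERVAL = 5.0                   # seconds between actions, illustrative
  _last_action: dict[str, float] = {}  # account -> time of last accepted action

  def allow_action(account: str) -> bool:
      now = time.monotonic()
      prev = _last_action.get(account)
      if prev is not None and now - prev < MIN_INTERVAL:
          return False                 # "you're doing that too fast"
      _last_action[account] = now
      return True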


Indeed, "downtime" was a bad word.


Yeah, I infrequently get a “too many requests sorry!” backpressure error. Sometimes my votes get lost to this?

And once in a blue moon, HN gets hammered by unusual traffic and becomes somewhat unusable while logged in.


I'm reading through hckrnews.com; I'm sure others have their favorite readers too.

Those function as proxies and lower the perception of downtime as well.

I've never experienced the website being inaccessible even for short periods.


It IS down occasionally: https://twitter.com/hnstatus

Not all "downs" are reflected there. The last time, I remember it having really bad performance or being unusable (I don't remember which), but opening it in incognito you would get cached results fast.


HN is a mature product! Most common software failures come from deploying changes, and I don't think HN is being deployed to every day.

We're all used to effectively beta software that's constantly being updated every day and never final


Maybe they are intentionally using boring tech, not using Cloudflare, AWS, etc and are self hosting somewhere, hopefully.


> Maybe they are intentionally using boring tech

This might be true on the infrastructure layer, but HN definitely uses "fancy" technology, as Paul developed his own Lisp, Arc, that powers HN :) http://arclanguage.org/

> Arc is designed for exploratory programming: the kind where you decide what to write by writing it. A good medium for exploratory programming is one that makes programs brief and malleable, so that's what we've aimed for. This is a medium for sketching software.


Most outages usually happen when there are changes. HN has no changes.


Related question: How is HN funded? I presume through YC.


Probably smaller than the snack budget for YC


They read from a file, save data to a file. That's it.


Hopefully there’s an fsync as well.


Life is too short to wait for a fsync.


There are some environments where making sure your transaction is secure on non volatile storage is critically important. And then there is storing comments on a news aggregation site.

Hell, this is one environment where I bet you could get away with a full async mount.


Sure, but the default should be erring on the side of persistence and consistency. The throughput of actual writes for a site like HN is next to nil: short, text-only comments averaging a couple hundred bytes, and you only have to pay for the fsync once anyway. If you have to turn off fsync to make it perform, then you're doing something horribly wrong.
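For what that costs in code, a minimal sketch (the file layout and function name are illustrative, not HN's): append, flush the userspace buffer, then fsync so the bytes are on stable storage before the request is acknowledged.

  import os

  def append_comment(path: str, comment: str) -> None:
      with open(path, "a", encoding="utf-8") as f:
          f.write(comment + "\n")
          f.flush()              # push Python's buffer to the kernel
          os.fsync(f.fileno())   # ask the kernel to push it to stable storage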


I would love to know the tech stack and architecture of HN, how it evolved (if it did), and what resources they spend (money, man-hours, etc.) to maintain it.


It's written in Arc, a custom dialect of Lisp that Paul Graham made.


http://arclanguage.org/forum

Does the interface look familiar?


It does look a lot like HN, although slightly different in colors, font family, and nav bar.


Last time I remember it being down for anything other than a few minutes was back in 2014. https://twitter.com/Coding2Learn/status/420298797462593536


I have seen an error message once instead of content. I don't remember it exactly, but the wording was amazingly perfect. And there is one minor issue where the "reply" button for a specific comment hasn't loaded by the time the comment has rendered.


> always online?

It depends on what you mean by "online" and the service level:

- "online" meaning an HN server responds with something. In this more literal sense, HN always seems to be up.

- "online" meaning normal page-load response times. In this sense, HN sometimes times out with "sorry we can't serve your request right now". That seems to happen once a week or once a month. Another example is a super popular thread (e.g. "Trump wins election") that hammers the server, and threads take a minute or more to load. This prompts dang to write a comment in the thread asking people to "log out" if they're not posting. This reduces load on the server, as pages rendered for signed-out users don't need to query individual user stats, voting counts, hidden comments, etc. This is a form of ad-hoc community cooperation to reduce the workload on the server rather than spinning up extra instances on AWS.

The occasional disruptions to the second meaning of "online" are OK, since this is a discussion forum and nobody's revenue depends on it. Therefore, it doesn't "need" more uptime than it already has.


Powered by Illuminati technology.


>Powered by Illuminati technology.

PG operating the switchboard.


I'd guess that HN has nearly zero dependencies, other than the bare metal box we've been told about. That makes things a lot more simple!


HN is down and has loading issues pretty often. AFAIK they now run behind a CDN so if you're not logged in you won't notice this.


There was a post somewhere else in this thread that said they use Nginx caching for non-logged in users, so not even a CDN it seems.


I've literally never experienced this.


There have been times when we have been asked if we can log out if we won’t be commenting because this will then allow the servers to serve straight from cache. I have only seen that happen a few times though.


I have never experienced HN being down.


Guess they don’t use Cloudflare:

https://news.ycombinator.com/item?id=31820635


news.yc might not be, but ycombinator.com and startupschool.org are on Cloudflare.


0 features = 0 problems



