
I cannot overstate the performance improvement of deploying onto bare metal. We typically see a doubling of performance, as well as extremely predictable baseline performance.

This is down to several things:

- Latency – having your own local network, rather than sharing some larger datacenter network fabric, gives around an order of magnitude lower latency

- Caches – right-sizing a deployment for the underlying hardware, and so actually allowing a modern CPU to do its job, makes a huge difference

- Disk IO – Dedicated NVMe access is _fast_.

And with it comes a whole bunch of other benefits:

- Auto-scalers become less important, partly because you have 10x the hardware for the same price, partly because everything runs at 2x the speed anyway, and partly because you have a fixed pool of hardware. This makes the whole system more stable and easier to reason about.

- No more sweating the S3 costs. Put a 15TB NVMe drive in each server and run your own MinIO/Garage cluster (alongside your other workloads). We're doing about 20GiB/s sustained on a 10-node cluster and 50k API calls per second (at S3's published request prices, that call rate alone works out to roughly $50k-$650k _per month_ in API charges!).
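For a sanity check on those request costs, here is the back-of-the-envelope arithmetic, assuming S3 Standard's long-published request rates (~$0.0004 per 1,000 GETs, ~$0.005 per 1,000 PUTs; verify against current pricing):

```python
# Back-of-the-envelope S3 request-cost arithmetic for a sustained workload.
# Assumed rates (S3 Standard, us-east-1; check current pricing):
GET_PER_1K = 0.0004   # $ per 1,000 GET requests
PUT_PER_1K = 0.005    # $ per 1,000 PUT/POST/LIST requests

CALLS_PER_SECOND = 50_000
SECONDS_PER_MONTH = 30 * 24 * 3600

low = CALLS_PER_SECOND / 1_000 * GET_PER_1K    # all-GET workload: $ per second
high = CALLS_PER_SECOND / 1_000 * PUT_PER_1K   # all-PUT workload: $ per second

print(f"${low:.2f}-${high:.2f} per second")
print(f"${low * SECONDS_PER_MONTH:,.0f}-${high * SECONDS_PER_MONTH:,.0f} per month")
# → $0.02-$0.25 per second
# → $51,840-$648,000 per month
```

At that call volume, the API charges alone pay for a lot of NVMe drives.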

- You get the same bill every month.

- UPDATE: more benefits – cheap fast storage, running huge PostgreSQL instances at minimal cost, and less engineering time spent working around hardware limitations and cloud vagaries.

And, if you choose to invest in the above, it all costs 10x less than AWS.

Pitch: If you don't want to do this yourself, then we'll do it for you for half the price of AWS (and we'll be your DevOps team too):

https://lithus.eu

Email: adam@ above domain



Yup, I hope to god we are moving past the 'everything's fast if you have enough machines' and 'money is not real' era of software development.

I remember the point in my career when I moved from a cranky old .NET company, where we handled millions of users from a single cabinet's worth of beefy servers, to a cloud-based shop where we used every cloud buzzword tech under the sun (but mainly everything was containerized Node microservices).

I shudder thinking back to the eldritch horrors I saw on the cloud billing side, and the funny thing is, we were constantly fighting performance problems.


Tangential point but why is it that so often these leaving the cloud posts use the word "beefy" to describe the servers? It's always you don't need cloud because beefy servers handle pretty much any bla bla bla

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

If anyone from Oxide Computer or similar is reading, maybe you should rebrand to BEEFY Server Inc...


>Tangential point... rebrand to BEEFY server

idea for an ad campaign:

"mmm, beefy!" https://www.youtube.com/watch?v=j6ImwKMRq98&t=21s

I don't know how "worldwide" the distribution of Chef "Boyardee" canned spaghetti is (today it's not too good), but the founder's story is fairly interesting. He was a real Italian immigrant chef named Ettore Boiardi, and he gets a lot of credit for originally evangelizing and popularizing "spaghetti" and "spaghetti sauce" in America, when most people had never tried it.

https://en.wikipedia.org/wiki/Ettore_Boiardi

you know, "spaghetti code..."?


Noted!


Because the server types you get for the price of a single Heroku dyno are incredibly beefy. And suddenly you need a lot fewer dynos. Which is quite important if you start managing them yourself.


The servers are always beefy and the software is always blazingly fast. Blazingly beefy is my new joke trademark.


Because that is a casual word in the English language to describe an object with substantial power?

If you could suggest a word that would make a better substitute, that might move the conversation forward, and perhaps improve the aesthetic quality of posts about leaving the cloud.


Well, I wouldn't say I have a beefy car or a beefy sword: there's some historical bit of linguistic connection that seems to have caused people to describe a server with the adjective "beefy", rather than "powerful", "hefty", "stacked", or... "chonky" ;P. Coming up with some options, I think my favorite might be "cranked"?


The first definition for beefy from Merriam-Webster is: heavily and powerfully built.

As someone who's had to rack some pretty heavy servers with lots of GPUs, CPUs, NVMe drives, and RAM, that turned out to be quite powerful, I'd say "beefy" is accurate.

https://www.merriam-webster.com/dictionary/beefy

Side note edit: hefty and chonky don't tell me much about power. Stacked is usually more about a team or group of things (and for individuals, I'd rather not see it).


I wouldn’t describe a Corvette as beefy, but I might call its engine beefy, and I would definitely call a diesel truck beefy.

I think servers, especially bare metal, are in the category of grunt and raw power. Beefy feels right.


My conspiracy theory is that "cloud scaling" was entirely driven by people who grew up watching sites get slash dotted and thought it was the absolute most important thing in the world that you can quickly scale up to infinity billion requests/second.


No, cloud adoption was driven by teams having to wait 2 years for capex for their hardware purchase and then getting a quarter of what they asked for. You couldn't get things; people hoarded servers they pretended to be using, because when they did need something they couldn't get it. Management just wouldn't approve budgets, so you were stuck using too little hardware.

On the cloud it takes five seconds to get a new machine I can ssh into and I don't have to ask anyone for the budget.

You can save a lot of money with scaling, but you have to actually do it, and very few places do.


I think part of this is that for the past decade or two, on-prem was mostly preferred by very frugal companies.

One of the places I worked that was on-prem enforced a "standard hardware profile" where the servers were all nearly the same except things that could be changed in house (like RAM sticks). When they ordered hardware, they'd order like 5% or 10% more than we thought we'd need to keep an "emergency pool".

If you ended up needing more hardware than you thought and could justify why you needed it right now, they'd dip into that pool to give you more hardware on a pretty rapid schedule.

It cost slightly more, but was invaluable. Need double the hardware for a week to do a migration? No problem. Product more popular than you thought? No problem.


>I think part of this is that for the past decade or two, on-prem was mostly preferred by very frugal companies.

Sure, this is made worse by frugality, but I experienced this problem back when virtualization was in its infancy, before cloud anything even existed, much less was popular.


> On the cloud it takes five seconds to get a new machine I can ssh into and I don't have to ask anyone for the budget.

This isn't exactly realistic, not for too long anyway.

Once your cloud bill climbs into the millions, expect to see just as much scrutiny on what's costing so much and what can be cut and can you really justify the new thing you want to spin up (as there should be).

Having been through many growing startups, I'd say the freewheeling days of spinning up whatever you want only last until about $50K to $100K/month in AWS billing.


This. Instant availability of compute and storage, purchasable with a corporate credit card, was the cloud killer app.


I can't get instant resources: Azure Switzerland is overbooked and can't create many resources, at two different clients. And I've read something similar about AWS.

And in most "large" companies, you still need to go through different teams and processes to get those cloud resources.

Oh, and if your company made the mistake of outsourcing its IT, odds are you'll get a four-digit bill to change a Terraform file.

Welcome to Western Europe.


There's "cloud" as "server in the cloud I can use" which is what the majority of smaller players are using - it's just someone else's server.

There's also "cloud" as the API-driven world of managed services that drain your wallet faster than you can blink.


Seems like a few negotiation skills would be of better use than doing extreme amounts of work because somebody takes months to approve new hardware. Well, guess what: the people who slowed down hardware procurement are now slowing down deployment of cloud resources as well, because the fundamental problem, organizational misalignment and dysfunction, was never addressed.


Even if you can get instant approval, Dell or HP won't Fedex you 50 servers next day like they will 50 laptops.

And once you're a customer you get to deal with sales channels full of quotes and useless meetings and specialists. You can't just order from the website.


You can get servers elsewhere as well, from vendors more agile than Dell/HPE/Supermicro/Lenovo. Asus, ASRock, Gigabyte and some other brands can ship servers rather quickly. Perhaps not the next day, but the out-the-door prices are much lower, especially if you are a smaller customer. Most companies know well in advance how much compute/storage they will need, so if something takes a month or two it isn't a big issue.

You're going to negotiate with the structure of an entire corporation?

Excuse me CEO your budgeting process is inconvenient to my work, please change it all to be more responsive.

This is not how things work and not how changes are made. Unless you get into the C-suite, or at least become a direct report of someone who is, nobody cares and you're told to just deal with it. You can't negotiate because you're four levels of management away from being invited to the table.

An organization that can make agile capital expenditures in response to the needs of an individual contributor is either barely out of the founder's garage or a magical fairyland unicorn.


You would be surprised. I have unclogged processes inside and between companies before where people would tell me it's not possible or that success is unlikely.

And yes, if you are working with professionals, most expenditures can be planned well in advance, so you can definitely afford to take months or even more than a year in the process. And if there is a major issue affecting the business of the company, you would be surprised what is possible. I've gotten things approved in weeks in an enterprise of 2000+ employees, and I was 5 levels down from the CEO at the time. I have improved my negotiation skills greatly since then (it helps when you co-found a consulting company).


And now, on cloud it’s the same but much more expensive and worse performance. We’ve been struggling for over a month to get a single (1) non-beefy non-GPU VM allocated on Azure, since they’ve been having insane capacity issues, to the extent that even “provisioned” capacity cannot be fulfilled ;-(


Sure, but that’s because it’s Azure. I’m sorry someone made the decision to go there. On AWS & GCP, stock-outs at least used to be nearly unheard of.


Until you hit a certain scale.

I totally agree about Azure being the worst of the three, they wanted us to commit to certain use before even buying hardware themselves. Crazy…

But I also had capacity issues with Google at large scales in many zones.


What sort of scale, if you don’t mind me asking?

Hey, sure! That’s important context.

One gameserver was 40vCPU and 256G of RAM, we had about 30-50 before we’d see some issues in some regions. (this is from memory now unfortunately).

Sao Paulo and Tokyo being the worst, but Singapore, Australia and Mumbai also had issues at various times.

The other place where we hit hard limits was Los Angeles, but we had more than a hundred instances then.

The issue with the hard limits is that it’ll be one zone that’s exhausted and the API call will fail, so you have to retry with another zone in the same region; but you don’t get to practice building your autoscaler before you actually need it.
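The retry-across-zones logic described here is simple in shape. A minimal sketch, where the `provision` callable and the zone names are hypothetical stand-ins for the real cloud API:

```python
class ZoneExhausted(Exception):
    """Raised when a zone has no capacity for the requested machine shape."""

def provision_with_fallback(provision, zones):
    """Try each zone in the region until one accepts the instance."""
    errors = {}
    for zone in zones:
        try:
            return provision(zone)
        except ZoneExhausted as exc:
            errors[zone] = exc  # remember the failure, move to the next zone
    raise RuntimeError(f"all zones exhausted: {sorted(errors)}")

# Simulated API: only the third zone has capacity left.
def fake_provision(zone):
    if zone != "asia-northeast1-c":
        raise ZoneExhausted(zone)
    return {"zone": zone, "vcpus": 40, "ram_gb": 256}

instance = provision_with_fallback(
    fake_provision,
    ["asia-northeast1-a", "asia-northeast1-b", "asia-northeast1-c"],
)
print(instance["zone"])  # → asia-northeast1-c
```

The point of the comment stands: in production this loop first runs for real the day a zone actually fills up, which is why it goes untested until you need it.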


Oh interesting. Yeah, and I’m guessing you don’t get much prior visibility into available stock, since that would also expose info to their competitors.

That is a lot lower than I expected, but I also imagine that’s a sizable order that they like getting.


If you're a top X customer running in a smaller region of AWS or GCP, yes, you need to do capacity planning with your TAM. You get to a point where quota increase requests are not auto-approved.


>to a cloud based shop where we used every cloud buzzword tech under the sun

this is me currently. Not so much "tech" as repackaged cloud services, SaaS and middleware; the department is amortizing hours and effort against projects, the eldritch horrors you're referring to are starting to manifest, and I want out.


However old, .NET Framework was still using a decent JIT compiler with a statically typed language, powerful GC and a fully multi-threaded environment :)

Of course Node could not compete, and the cost had to be paid for each thinly sliced microservice carrying heavy runtime alongside it.


I don't think we need to bring languages and frameworks into this. Some of them make things worse, others -- much better.

Furthermore, the microservices craze only made things worse regardless of PL / framework.

IMO we have an entire generation (maybe two) of devs who never self-hosted. That's the main audience of the article and of many of the comments here.

The rest of us, who only scoffed at the cloud and microservices, were just waiting for the world to start coming back to its senses again.


What is old is new again.

My employer is so conservative and slow that they are forerunning this Local Cloud Edge Our Basement thing by just not doing anything.


> What is old is new again.

Over the years I tried occasionally to look into cloud, but it never made sense. A lot of complexity and significantly higher cost, for very low performance and a promise of "scalability". You virtually never need scalability so fast that you don't have time to add another server - and at baremetal costs, you're usually about a year ahead of the curve anyways.


A nimble enough company doesn't need it, but I've had 6 months of lead time to request one extra server in an in-house data center due to sheer organizational failure. The big selling point of the cloud really was that one didn't have to deal with the division lording over the data center, or have any and all access, even just logging in, gated by their priesthood, who knew less Unix than the programmers.

I've been in multiple cloud migrations, and they were always solving political problems that were completely self-inflicted. The decision was always reasonable if you looked just at the people in the org having to decide between the internal process and the cloud bill. But I have little doubt that if there were any goal alignment between the people managing the servers and those using them, most of those migrations would not have happened.


I've been in projects where they're 'on the cloud' to be 'scalable', but I had to estimate my CPU needs up front for a year to get that in the budget, and there wasn't any defined process for "hey, we're growing more than we assumed - we need a second server - or more space - or faster CPUs - etc". Everything that 'cloud' is supposed to allow for - but ... that's not budgeted for - we'll need to have days of meetings to determine where the money for this 'upgrade' is coming from. But our meetings are interrupted by notices from teams that "things are really slow/broken"...


About sums up my last job. Desperation leads to micromanaging the wrong indicators. The results are rage-inducing. I am glad I got let go by the micromanagers because if not I would have quit come New Year.

Yeah, clouds are such a huge improvement over what was basically industry-standard practice: "Oh, you want a server? Fill out this 20-page form and we'll get you your server in 6 to 12 months."

But we don't really need one-minute response times from the cloud. So something like Hetzner may be just fine: "We'll get it to you within an hour." That's still light-years ahead of where we used to be.

And if the management and cost side works out, with bare-metal or close-to-bare-metal performance on the provider side, then that is all good.

And this doesn't even address the fact that, yeah, AWS has a lot of hidden costs, but a lot of those managed data center outsourcing contracts, where you were subjected to those lead times for new servers... really weren't much cheaper than AWS back in the day.


In my experience I can rescale Hetzner servers and they'll be ready in a minute or two.


Yes, sorry, I didn't mean to impugn Hetzner by saying they were an hour delay, just that there could be providers that are cheaper that didn't need to offer AWS-level scaling.

Like a company should be able to offer 1 day service, or heck 1 week with their internal datacenters. Just have a scheduled buffer of machines to power up and adapt the next week/month supply order based on requests.


The management overhead in requesting new cloud resources is now here. Multiple rounds of discussion and TPS reports to spin up new services that could be a one click deploy.

The bureaucracy will always find a way.


Worst is when one of those dysfunctional orgs that does the IT systems administration tries to create their own internal cloud offerings instead of using a cloud provider. It's often worse than hosted clouds or bare metal.

But I definitely agree, it's usually a self-inflicted problem and the big gamble attempting to work around infrastructure teams. I've had similar issues with security teams when their out of the box testing scripts show a fail, and they just don't comprehend that their test itself is invalid for the architecture of your system.


Running away from internal IT works until they inevitably catch up to the escapees. At $dayjob the time required to spin up a single cloud VM is now measured in years. I’ve seen projects take so long that the cloud vendor started sending deprecation notices half way through for their tech stacks but they forged ahead anyway because it’s “too hard to steer that ship”.

The current “runners” are heading towards SaaS platforms like Salesforce, which is like the cloud but with ten times worse lock in.


> At $dayjob the time required to spin up a single cloud VM is now measured in years.

We have a Service Now ticket that you can complete that spins the server up at completion. Kind of an easy way to do it.


Then you end up with too-large servers all over the place with no rhyme or reason, burning through your opex budget.

Also, what network does the VM land in? With what firewall rules? What software will it be running? Exposed to the Internet? Updated regularly? Backed up? Scanned for malware or vulnerabilities? Etc…

Do you expect every Tom, Dick, and Harry to know the answers to these questions when they “just” want a server?

This is why IT teams invariably have to insert themselves into these processes, because the alternative is an expensive chaos that gets the org hacked by nation states.

The problem is that when interests aren’t forced to align — a failure of senior management — then the IT teams become an untenable overhead instead of a necessary and tolerable one.

The cloud is a technology often misapplied to solve a “people problem”, which is why it won’t ever work when misused in this way.


Not GP, but at my previous job we had something very similar. The form did offer options for a handful of variables (on-prem VMware vs EC2, vCPU, RAM, disk, OS/template, administrators, etc), but once submitted, the ticket went to the cloud/architecture team for review, who could adjust the inputted selections as well as configure things like networks, firewall rules, security groups, etc. Once approved, the automated workflow provisioned the server(s), firewall rules, security groups, etc and sent the details to the requestor.

Those are all checkboxes on the form

The first time you do it, you can do a consult with a cloud team member

And of course they get audited every quarter so usage is tracked


Complexity? I've never set up a highly available Postgres and Redis cluster on dedicated hardware, but I can not imagine it's easier than doing it in AWS which is only a few clicks and I don't have to worry about OS upgrades and patches. Or a highly available load balancer with infinite scale.


This is how the cloud companies keep you hooked. I am not against them, of course, but the notion that no one can self-host in production because "it is too complex" is something we have been fed over the last 10-15 years. Deploying a production DB on a dedicated server is not that hard. It is about the fact that people now think that unless they use the cloud, they are amateurs. It is sad.


I agree that running servers onprem does not need to be hard in general, but I disagree when it comes to doing production databases.

I've done on-prem highly available MySQL for years, and getting the whole master/slave thing to go just right during server upgrades was really challenging. On AWS, upgrading MySQL server ("Aurora") is really just a few clicks. It can even do blue/green deployment for you, where you temporarily get the whole setup replicated and in sync so you can verify that everything went OK before switching over. Disaster recovery (regular backups off-site & the ability to restore quickly) is also hard to get right if you have to do it yourself.


If you are running k8s on-prem, the "easy" way is to use a mature operator that takes care of all of that.

https://github.com/percona/percona-xtradb-cluster-operator https://github.com/mariadb-operator/mariadb-operator or CNPG for Postgres needs. They all work reasonably well and cover all the basics (HA, replication, backups, recovery, etc).
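For a sense of scale, a basic CNPG cluster is not much YAML. A hedged sketch, where the cluster name, storage size, and the MinIO endpoint are hypothetical, and field names are from memory, so verify against the CloudNativePG docs:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3                # one primary + two replicas; the operator handles failover
  storage:
    size: 100Gi
  backup:
    barmanObjectStore:        # continuous backups to any S3-compatible store
      destinationPath: s3://db-backups/pg-main
      endpointURL: https://minio.internal:9000
```

Applied with kubectl, the operator then manages replication, promotion on failure, and restores from the object store.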


It's really hard to do blue/green on prem with giant expensive database servers. Maybe if you're super big and you can amortize them over multiple teams, but most shops aren't and can't. The cloud is great.


Doing stuff on-prem or in a data centre _is_ hard though.

It's easy to look at a one-off deployment of a single server and remark on how much cheaper it is than RDS, and that's fine if that's all you need. But it completely skips past the reality of a real life resilient database server deployment: handling upgrades, disk failures, backups, hot standbys, encryption key management, keeping deployment scripts up to date, hardware support contracts and vendor management, the disaster recovery testing for the multi-site SAN fabric with fibre channel switches and redundant dedicated fibre, etc. Before the cloud, we actually had a staff member who was entirely dedicated to managing the database servers.

Plus as a bonus, not ever having to get up at 2AM and drive down to a data centre because there was a power failure due to a generator not kicking in, and it turns out the data centre hadn't adequately planned for the amount of remote hands techs they'd need in that scenario...

RDS is expensive on paper, but to get the same level of guarantees either yourself or through another provider always seems to end up costing about the same as RDS.


I have done all of this also; today I outsource the DB server and do everything else myself, including a local read replica and pg_dump backups as a hail mary.
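A "hail mary" pg_dump backup like that can be as small as one crontab line. A sketch with hypothetical database and bucket names, assuming auth via ~/.pgpass and AWS credentials in the environment:

```
# Nightly logical dump in compressed custom format, streamed straight off-site.
30 2 * * * pg_dump -Fc mydb | aws s3 cp - s3://offsite-backups/mydb-$(date +\%F).dump
```

The custom (-Fc) format keeps dumps compact and restorable with pg_restore, and the `\%` escape is needed because `%` is special in crontab entries.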

Essentially all that pain of yesteryear was storage: it was a F**ing nightmare running HA network storage before the days of SSDs. It was slower than RAID, 5x more expensive than RAID, and generally involved an extreme amount of pain and/or expense (usually both). But these days you only actually need a SAN, or as we call it today, block storage, when you have data you care about; likewise, you only have to care about backups when you have data you care about.

For absolutely all of us, the side effect of moving away from monolithic 'pets' is that we have made the app layer not require any long-term state itself. So today all you need as your app servers is N x any random thing that might lose data or fail at any moment, plus an external DB service (Neon, PlanetScale, RDS), and perhaps S3 for objects.


Database is one of those places where it's justified, I think. Application containers do not need the same level of care hence are easy to run yourself.


I guess that is the kicker right? "same level of guarantees".


I'd much rather deploy cassandra, admittedly a complex but failure resistant database, on internal hardware than on AWS. So much less hassle with forced restarts of retired instances, noisy nonperformant networking and disk I/O, heavy neighbors, black box throttling, etc.

But with Postgres, even with HA, you can't do geographic/multi-DC of data nearly as well as something like Cassandra.


> I've never set up a highly available Postgres and Redis cluster on dedicated hardware, but I can not imagine it's easier than doing it in AWS which is only a few clicks

It's "only a few clicks" after you have spent a signficant amount of time learning AWS.


As a self-hosting fan, I can't even fathom how hard it would be to even get started running a Postgres or Redis cluster on AWS.

Like, where do I go? Do I search for Postgres? If so, where? Does the IP of my cluster change? If so, how do I make it static? Also, can non-AWS servers connect to it? No? Then how do I open up the firewall to allow it? And what happens if it uses too many resources? Does it shut down by itself? What if I want to fine-tune a config parameter? Do I SSH into it? Can I edit it in the UI?

Meanwhile, in all that time spent finding out, I could SSH into a server, code and run a simple bash script to download, compile, and run. Then another script to replicate. And I can check the logs, change any config parameter, restart, etc. No black box to debug if shit hits the fan.


Having lived in both worlds, there are services wherein, yeah, host it yourself. But having done DB on-prem/on-metal, dedicated hosting, and cloud, databases are the one thing I'm happy to overpay for.

The things you describe involve a small learning curve, each different for each cloud environment, but then you never have to think about it again. You don't have to worry about downtime (if you set it up right), running a bash script ... literally nothing else has to be done.

Am I overpaying for Postgres compared to the alternatives? Hell yeah. Has it paid off? 100%, would never want to go back.


> Do i search for Postgres?

Yes. In your AWS console right after logging in. And pretty much all of your other setup and config questions are answered by just filling out the web form right there. No sshing to change the parameters they are all available right there.

> And what happens if it uses too much resources?

It can't. You've chosen how many resources (CPU/memory/disk) to give it. Runaway cloud costs come from the bill-by-usage stuff like Redshift, S3, Lambda, etc.

I'm a strong advocate for self (for some value of self) hosting over cloud, but you're making cloud out to be far more difficult than it is.


Actually... for Postgres specifically, it's less than 5 minutes to do so in AWS and you get replication, disaster recovery and basic monitoring all included.

I hated having to deal with PostgreSQL on bare metal.

To answer your questions should someone ask these as well and wish answers:

> Does the IP of my cluster change? If so how to make it static?

Use the DNS entry that AWS gives you as the "endpoint", done. I think you can pin a stable Elastic IP to RDS as well if you wish to expose your RDS DB to the Internet although I have really no idea why one would want that given potential security issues.

> Also can non-aws servers connect to it? No?

You can expose it to the Internet in the creation web UI. I think the default the assistant uses is to open it to 0.0.0.0/0 but the last time I did that is many years past so I hope that AWS asks you about what you want these days.

>Then how to open up the firewall and allow it?

If the above does not, create a Security Group, assign the RDS server to that Security Group and create an Ingress rule that either only allows specific CIDRs or a blanket 0.0.0.0/0.

> And what happens if it uses too much resources? Does it shutdown by itself?

It just gets dog slow if your I/O quota is exhausted, it goes into an error state when the disk goes full. Expand your disk quota and the RDS database becomes accessible again.

> What if i wanna fine tune a config parameter? Do I ssh into it? Can i edit it in the UI?

No SSH at all, not even for manually unfucking something, for that you need the assistance of the AWS support - but in about six years I never had a database FUBAR'ing itself.

As for config parameters, there's a UI for this called "parameter/option groups"; you can set almost all config parameters there, and you can use these as templates for other servers you need as well.


This smells like "Dropbox is just rsync". No skin in the game; I think there are pros and cons to each, but a Postgres cluster can be as easy as a couple of clicks or an entry in a provisioning script. I don't believe you would be able to architect the same setup with a simple single-server SSH session and a simple bash script, unless you've already written a bash script that magically provisions the cluster across various machines.


> As a self hosting fan, i cant even fathom how hard it would be to even get started running a Postgres or redis cluster on AWS. Like, where do I go? Do i search for Postgres? If so where?

Anything you don't know how to do - or haven't even searched for - either sounds incredibly complex, or incredibly simple.


It is not as simple as you describe to set up HA multi-region Postgres

If you don't care about HA, then sure everything becomes easy! Until you have a disaster to recover and realize that maybe you do care about HA. Or until you have an enterprise customer or compliance requirement that needs to understand your DR and continuity plans.

Yugabyte is the closest I’ve seen to achieving that simplicity with self-hosted multi-region HA Postgres, and it is still quite a bit more involved than the steps you describe, and definitely more work than paying for their AWS service. (I mention it instead of Aurora because there’s no self-host process to compare directly there, as it’s proprietary.)


Did you try ChatGPT for step by step directions for an EC2 deployed database? It would be a great litmus test to see if it does proper security and lockdown in the process, and what options it suggests aside from the AWS-managed stuff.

It would be so useful to have an EC2/S3/etc-compatible API that maps to a homelab. Again, something that Claude should allegedly be able to vibecode, given the breadth of documentation, examples, and discussions of the AWS API.


Your comment seems much more in the vein of "I already learned how to do it this way, and I would have to learn something new to do it the other way."

Which is of course true, but it is true for all things. Provisioning a cluster in AWS takes a bit of research and learning, but so did learning how to set it up locally. I think most people who know how to do both will agree it is simpler to learn how to use the AWS version than learning how to self host it.


A fun one in the cloud is "when I upgrade to a new version of Postgres, how long is the downtime and what happens to my indexes?"


For AWS RDS, no big deal. Bare metal or Docker? Oh now THAT is a world of pain.

Seriously I despise PostgreSQL in particular in how fucking annoying it is to upgrade.


Yep. I know folks running their own clusters on AWS EC2 instead of RDS. They're still on 3 or 4 versions back because upgrading Postgres is a PITA.


If you can self host postgres, you'll find "managing" RDS to be a walk in the park.


If you are talking about RDS and ElasticCache, it’s definitely NOT a few clicks if you want it secure and production-ready, according to AWS itself in their docs and training.

And before someone says Lightsail: it is not meant for high availability/infinite scale.


> I've never set up a highly available Postgres and Redis cluster on dedicated hardware, but I can not imagine it's easier than doing it in AWS which is only a few clicks and I don't have to worry about OS upgrades and patches

Last I checked, stack overflow and all of the stack exchange sites are hosted on a single server. The people who actually need to handle more traffic than that are in the 0.1% category, so I question your implicit assumption that you actually need a Postgres and Redis cluster, or that this represents any kind of typical need.


SO was hosted on a single rack last I checked, not a single box. At the time they had an MS SQL cluster.

Also, databases can easily see a ton of internal traffic. Think internal logistics/operations/analytics. Even a medium size company can have a huge amount of data, such as tracking every item purchased and sold for a retail chain.


They use multiple servers for redundancy, but they are using only 5-10% capacity per [1], so they say they could run on a single server given these numbers. Seems like they've since moved to the cloud though [2].

[1] https://www.datacenterdynamics.com/en/news/stack-overflow-st...

[2] https://stackoverflow.blog/2025/08/28/moving-the-public-stac...


If you don’t find AWS complicated you really haven’t used AWS.


If you were personally paying the bill, you'd probably choose the self host on cost alone. Deploying a DB with HA and offsite backups is not hard at all.
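For the skeptical: the core of a primary/standby pair with off-site WAL archiving really is a handful of steps. A rough sketch (hostnames, paths, and the wal-g archive command are illustrative assumptions; a production setup would add a failover manager such as Patroni or repmgr on top):

```shell
# On the primary: enable replication and archive WAL off-site
# postgresql.conf:
#   wal_level = replica
#   archive_mode = on
#   archive_command = 'wal-g wal-push %p'   # e.g. to any S3-compatible bucket
# pg_hba.conf:
#   host replication replicator standby-host/32 scram-sha-256

# On the standby: clone the primary and start it as a streaming replica
pg_basebackup -h primary-host -U replicator \
  -D /var/lib/postgresql/17/main -X stream -R -P
sudo systemctl start postgresql@17-main
```

The -R flag writes the standby configuration for you, so the clone comes up replicating immediately.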


I have done many Postgres deploys on bare metal. The IOPS and storage space saved (ZFS compression, because Postgres's own is meh) are huge. I regularly use hosted DBs, but largely for toy DBs in GBs, not TBs.

Anyway, it is not hard, and controlling upgrades saves so much time. Having a client's DB force-upgraded when there is no budget for it sucks.

Anyway, I encourage you to learn/try it when you have the opportunity.


> I've never set up a highly available Postgres and Redis cluster on dedicated hardware, but I can not imagine it's easier than doing it in AWS which is only a few clicks

I have set up AWS Postgres and Redis, and I know it's more than a few clicks. There is simply basic information you need to link between services, and it doesn't matter whether it's cloud or hardware: you still need to do the same steps, be it from the CLI or a web interface.

And frankly, these days with LLMs, there's no excuse anymore. You can literally ask an LLM to do the steps and explain them to you, and you're off to the races.

> I don't have to worry about OS upgrades and patches

Single command and reboot...

> Or a highly available load balancer with infinite scale.

Unless you're Google, overrated...

You can literally rent a load balancer from places like Hetzner for 10 bucks, and if you're old-fashioned, you can even do DNS balancing.

Or you simply rent a server with 10x the performance of what Amazon gives you (for the same price or less), and you do not need a load balancer. I mean, for 200 bucks you rent a 48-core/96-thread server at Hetzner... Who needs a load balancer again? You will do millions of requests on a single machine.
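Old-fashioned DNS balancing, for reference, is just multiple A records for the same name, and resolvers hand them out in rotating order. A sketch of a BIND-style zone fragment (the addresses are documentation IPs, obviously made up; note this gives crude load spreading, not health-checked failover):

```
; round-robin: clients receive these addresses in varying order
www  300  IN  A  203.0.113.10
www  300  IN  A  203.0.113.11
www  300  IN  A  203.0.113.12
```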


For anything "serious", you'll want a load balancer for high availability, even if there's no performance need. What happens when your large server needs an OS upgrade or the power supply melts down?


Well you can have managed resources on premises.

It costs people and automation.


People are usually the biggest cost in any organisation. If you can run all your systems without the sysadmins & netadmins required to keep it all upright (especially at expensive times like weekends or run up to Black Friday/Xmas), you can save yourself a lot more than the extra it'll cost to get a cloud provider to do it all for you.


Every large organization that is all in on cloud I have worked at has several teams doing cloud work exclusively (CICD, Devops, SRE, etc), but every individual team is spending significant amounts of their time doing cloud development on top of that work.


This. There's a lot of talk of 'oh you will spend so much time managing your own hardware' when I've found in practice it's much less time than wrangling the cloud infrastructure. (Especially since the alternatives are usually still a hosting provider that mean you don't have to physically touch the hardware at all, though frankly that's often also an overblown amount of time. The building/internet/cooling is what costs money but there's already a wide array of co-location companies set up to provide exactly that)


I think you are very right, and to be specific: IAM roles, connecting security groups, terraform plan/apply cycles, running Atlantis through GitHub – all that takes tremendous amounts of time and requires understanding a very large set of technologies on top of the basic networking/security/Postgres knowledge.


I am not sure where the calculations come out for running data centers at a large company that is past the co-location phase. But yeah, in my experience, running even a fairly large fleet of bare-metal *nix servers in colocation facilities is really not that time consuming.


I can’t believe this cloud propaganda remains so pervasive. You’re just paying DevOps and “cloud architects” instead.


Exactly. It's sad that we have been brainwashed by the cloud propaganda for this long. Everyone and their mother thinks that to set up anything in production you need cloud, otherwise it is amateurish. Sad.


Exactly, for the narrowly defined condition of running k8s on digital ocean with a managed control plane compared to Hetzner bare metal:

AWS and DigitalOcean = $559.36/month vs Hetzner = $132.96/month. The cost of an engineer to set up and maintain a bare-metal k8s cluster is going to far exceed the roughly $400 monthly savings.

If you run things yourself and can invest sweat equity, this makes some sense. But for any company with a payroll this does not math out.
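To put a number on "does not math out": a back-of-envelope break-even calculation using the figures above (the engineer's fully loaded hourly rate is my assumption, not from the comment):

```python
aws_do_monthly = 559.36   # managed k8s figure from the comparison
hetzner_monthly = 132.96  # bare-metal figure from the comparison
hourly_rate = 100.0       # assumed fully loaded engineer cost per hour

savings = aws_do_monthly - hetzner_monthly
break_even_hours = savings / hourly_rate  # engineer-hours/month the savings buys

print(f"${savings:.2f}/month saved buys about {break_even_hours:.1f} engineer-hours/month")
```

So the bare-metal option only maths out if ongoing maintenance averages a handful of hours a month more than the managed option, which is exactly the point of contention in this thread.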


That argument is compelling only at a first glance IMO. If you take a look at it another way then:

1. The self-hosting sweat and nerves are spent only once, 80% of them anyway (you still have to maintain every now and then e.g. upgrade).

2. The cloud setup will require babysitting as well and as such the argument that you only pay someone salary when self-hosting does not hold water.

Ultimately it's a tradeoff between (a) the short- or long-term thinking of leadership, (b) in-house expertise and (c) how much money are you willing to throw at the problem for the promised shorter timelines -- and that one is assuming you'll find high-quality cloud hosting engineers which, believe me, is far from a given.


Wouldn't you want someone watching over cloud infra at those times too? So maybe slightly less, but still need some people being ready.


Yeah I always just kinda laugh at these comparisons, because it's usually coming from tech people who don't appreciate how much more valuable people's time is than raw opex. It's like saying, you know it's really dumb that we spend $4000 on Macbooks for everyone, we could just make everyone use Linux desktops and save a ton of money.


> It's like saying, you know it's really dumb that we spend $4000 on Macbooks for everyone, we could just make everyone use Linux desktops and save a ton of money.

Ohh idk if this is the best comparison, due to just how much nuance bubbles up.

If you have to manage those devices, Windows with Active Directory and especially Group Policy works well. If you just have to use the devices, then it depends on what you do: for some dev work, Linux distros are the best, hands down. Often, Windows will have the largest ecosystem and the widest software support (while also being a bit of a mess). In all the time I’ve had my MacBook, I really haven’t found what it excels at, aside from great build quality and battery life. It feels like one of those Linux distros that do things differently just for the sake of it: even the keyboard layout; the mouse acceleration feels the most sluggish (Linux distros feel the best, Windows is okay) even if the trackpad is fine; and you need DiscreteScroll and Rectangle and some other stuff to make generic hardware feel okay (or even make multi-display work). Maybe creative software is great there.

It’s the kind of comparison that derails itself in the mind of your average nerd.

But I get the point, the correct tool for the job and all that.


Sorry for off-topic but IMO MacBooks started losing value hard since the release of macOS Tahoe.

They were super fast, now part of them are sluggish.

As much as people hate to hear it, Apple is finished. They peaked and have nowhere to go. The AI bubble is not going to last more than another 1-3 years, and Apple's inability to ship a stable OS upgrade that doesn't ruin machines' performance puts them in a corner.

Combine this with the fact that MS announced end of support for Windows 10 and both these corporations ironically start to make a strong case for Linux.

Is the Linux desktop quite there? Maybe not fully, but it's IMO pushing beyond 80%, and people who don't like Windows and macOS anymore are starting to weigh their options.


If "cloud" took zero time, then sure.

It actually takes a lot of time.


"It's actually really easy to set up Postgres with high availability and multi-region backups and pump logs to a central log source (which is also self-hosted)" is more or less equivalent to "it's actually really easy to set up Linux and use it as a desktop"

In fact I'd wager a lot more people have used Linux than set up a proper redundant SQL database


Honestly, I don't see a big difference between learning the arcane non-standard, non-portable incantations needed to configure and use various forks of standard utilities running on the $CLOUD_PROVIDER, and learning to configure and run the actual service that is portable and completely standard.

Okay, I lied. The latter seems much more useful and sane.


What is this?!

You are self-managing expensive dedicated hardware in form of MacBooks, instead of renting Azure Windows VM's?!

Shame!


Don't be silly, - the MacBook Pro's are just used to RDP to the Azure Windows VMs ;)


That's how they can get away with such seemingly high prices.


What is more likely to fail? The hardware managed by Hetzner or your product?

I'm not saying that you won't experience hardware failures, I am just saying that you also need to remember that if you want your product to keep working over the weekend then you must have someone ready to fix it over the weekend.


Cloud providers and even cloudflare go down regularly. Relax.


Sure - but when AWS goes down, Amazon fixes it, even on the weekends. If you self-host, you need to pay a person to be on call to fix it.


Not only that. When your self-hosted setup goes down, your customers complain that you are down. When AWS goes down, your customers complain that the internet is down.


AWS doesn't have to pay people (LOTS OF PEOPLE) to keep things running over the weekends?

And they aren't...just passing those costs on to their customers?


They are of course, but it's amortized over many users. If you're a small company, it's hard to hire one-tenth of an SRE.


Not every business needs that kind of uptime.

How often is GitHub down? We are all just fine without it for a while.


I mean, yes, but also I get "3 nines" uptime by running a website on a box connected to my isp in my house. (it would easily be 4 or 5 nines if I also had a stable power grid...)

There's a lot, a lot of websites where downtime just... doesn't matter. Yes it adds up eventually but if you go to twitter and its down again you just come back later.


"3 nines" is around 8 hours of downtime a year. If you can get that without a UPS or generator, you already have a stable power grid.


except you now have your developers chasing their own tails figuring out how to insert the square peg in the round hole without bankrupting the company. cloud didn't save time, it just replaced the wheels for the hamsters.


Right, because cloud providers take care of it all. /s Cloud engineers are more expensive than traditional sysadmins.


I'm a designer with enough front-end knowledge to lead front-end dev when needed.

To someone like me, especially on solo projects, using infra that effectively isolates me from the concerns (and risks) of lower-level devops absolutely makes sense. But I welcome the choice because of my level of competence.

The trap is scaling an org by using that same shortcut until you're bound to it by built-up complexity or a persistent lack of skill/concern in the team. Then you're never really equipped to reevaluate the decision.


The benefit of cloud has always been that it allows the company to trade capex for opex. From an engineering perspective, it trades scalability for complexity, but this is a secondary effect compared to the former tradeoff.


"trade capex for opex"

This has nothing to do with cloud. Businesses have forever turned IT expenses from capex to opex. We called this "operating leases".


I’ve heard this a lot, but… doesn’t Hetzner do the same?


Hetzner is also a cloud. You avoid buying hardware, you rent it instead. You can rent either VMs or dedicated servers, but in both cases you own nothing.


If everything is properly done, it should be next to trivial to add a server. When I was working on that, we had a written procedure; followed strictly, it took less than an hour.


If you’re just running some CRUD web service, then you could certainly find significantly cheaper hosting in a data center or similar, but also if that’s the case your hosting bill is probably a very small cost either way (relative to other business expenses).

> You virtually never need scalability so fast that you don't have time to add another server

What do you mean by “time to add another server?” Are you thinking about a minute or two to spin up some on-demand server using an API? Or are you talking about multiple business days to physically procure and install another server?

The former is fine, but I don’t know of any provider that gives me bare metal machines with beefy GPUs in a matter of minutes for low cost.


Weeks. I'm talking about multiple business weeks to spin up a new server. Sure, in a pinch I can do it in a weekend, but adding up all the stakeholders, talking it over and doing things right, it takes weeks. It's a normal timespan for a significant chunk of extra power: a modern-day server from Hetzner comes with over 1 TB of RAM and around 100 cores. This is also where all the reserve capacity comes from – you actually do have this kind of time to prepare.

Sure, there are scenarios where you need capacity faster and it's not your fault. Can't think of any offhand, but I imagine there are. It's perfectly fine for them to use cloud.


It’s kinda good if your requirements might quadruple or disappear tonight or tomorrow, but you should always have a plan to port to reserved / purchased capacity.


As an infrastructure engineer (amongst other things), hard disagree here. I realize you might be joking, but a bit of context here: a big chunk of the success of Cloud in more traditional organizations is the agility that comes with it: (almost) no need to ask permission to anyone, ownership of your resources, etc. There is no reason that baremetal shouldn't provide the same customer-oriented service, at least for the low-level IaaS, give-me-a-VM-now needs. I'd even argue this type of self-service (and accounting!) should be done by any team providing internal software services.


The permissions and ownership part has little to do with the infrastructure – in fact I've often found it more difficult to get permissions and access to resources in cloud-heavy orgs.


This could be due to the bureaucratic parts of the company being too slow initially to gain influence over cloud administration, which results in teams and projects that use the cloud being less hindered by bureaucracy. As cloud is more widely adopted, this advantage starts to disappear. However, there are still certain things like automatic scaling where it still holds the advantage (compared to requesting the deployment of additional hardware resources on premises).


I think this was also only a temporary situation, caused by the IT departments in these organisations being essentially bypassed. Once cloud became a big, important thing, they basically started to take control of it, and you get the same problems (in fact potentially more so, because the expense means there's more pressure to cut down resources).


"No need to ask permission" and "You get the same bill every month" kinda work against one another here.


I should have been more precise… Many sub-orgs have budget freedom to do their job, and not having to go through a central authority to get hardware is often a feature. Hence why Cloud works so well in non-regulatory heavy traditional orgs: budget owner can just accept the risks and let the people do the work. My comment was more of a warning to would-be infrastructure people: they absolutely need to be customer-focused, and build automation from the start.


I'm at a startup and I don't have access to the terraform repo :( and console is locked down ofc.


don't underestimate the ability of traditional organisations to build that process around cloud

you keep the usual BS to get hardware, plus now it's 10x more expensive and requires 5x the engineering!


This is my experience, though the lead time for 'new hardware' on cloud is only 6-12 weeks of political knife fighting instead of 6-18 months of that plus waiting.


That's a cultural issue. Initially at my workplace people needed to ask permissions to deploy their code. The team approving the deployment got sick of it and built a self-service deployment tool with security controls built in and now deployment is easy. All it matters is a culture of trusting other fellow employees, a culture of automating, and a culture of valuing internal users.


Agreed, that's exactly what I was aiming at. I'm not saying that it's the only advantage of Cloud, but that orgs with a dysfunctional resource-access culture were a fertile ground for cloud deployments.

Basically: some managers gets fed-up with weeks/months of delays for baremetal or VM access -> takes risks and gets cloud services -> successful projects in less time -> gets promoted -> more cloud in the org.


Well, yeah: it's more that I frame it as a joke, but I do mean it.

I don't argue there aren't special cases for using fancy cloud vendors, though. But classical datacentre rentals get you almost always there for less.

Personally I like being able to touch and hear the computers I use.


> no need to ask permission to anyone, ownership of your resources, etc

In a large enough org that experience doesn’t happen though - you have to go through and understand how the org’s infra-as-code repo works, where to make your change, and get approval for that.


You also need to get budget, a few months earlier, and sometimes even legal approval. Then you have security rules, "preferred" services, and the list goes on...



> What is old is new again.

I think there is a generational part as well. The ones of us that are now deep in our 40s or 50s grew up professionally in a self-hosted world, and some of us are now in decision-making positions, so we don't necessarily have to take the cloud pill anymore :)

Half-joking, half-serious.


I'm in my 40s and run my own company. We deliver a data platform, our customers can choose between our self-hosted solution or run it on AWS/Azure for 10x higher cost.


As a career security guy, I've lost count of the battles I've lost in the race to the cloud...now it's 'we have to up the budget $250k a year to cover costs' and you just shrug.

The cost for your first on-prem datacenter server is pretty steep...the cost for the second one? Not so much.


> What is old is new again.

It's not really. It just happens that when there is a huge bullshit hype out there, the people who fall for it regret it and come back to normal after a while.

Better things are still better. And this one was clearly better only for a few use cases that most people shouldn't have cared about from the beginning.


My employer also resisted using cloud compute and sent staff explanations why building our own data centers is a good thing.


"Do nothing, Win"


Using the S3 API is like chopping onions, the more you do it, the faster you start crying.


Less to no crying when you use a sharp knife. Japanese chefs say: no wonder you are crying, you squash them.


Haha!

My only “yes, but…” is that this:

> 50k API calls per second (on S3 that is $20-$250 _per second_ on API calls!).

kind of smells like abuse of S3. Without knowing the use case, maybe a different AWS service is a better answer?

Not advocating for AWS, just saying that maybe this is the wrong comparison.

Though I do want to learn about Hetzner.
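Whether or not it's "abuse", the request-pricing math is easy to sanity-check. A sketch using S3 Standard per-request rates as published for us-east-1 at the time of writing (these rates are my assumption – check current pricing before relying on them):

```python
GET_PER_1K = 0.0004  # USD per 1,000 GET requests (assumed published rate)
PUT_PER_1K = 0.005   # USD per 1,000 PUT/POST/LIST requests (assumed)

rps = 50_000                        # sustained API calls per second
monthly_calls = rps * 86_400 * 30   # ~130 billion requests in a 30-day month

get_cost = monthly_calls / 1_000 * GET_PER_1K
put_cost = monthly_calls / 1_000 * PUT_PER_1K
print(f"all-GET bill:  ${get_cost:,.0f}/month")
print(f"all-PUT bill:  ${put_cost:,.0f}/month")
```

At that rate even the cheapest request class lands in the tens of thousands of dollars a month, which is why a cache/CDN in front, a different service, or self-hosted object storage all start to look attractive.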


You're (probably) not wrong about the abuse thing, but it sure is nice to just not care about that when you have fixed hardware. I find trying to guess which of the 200 aws services is the cheapest kinda stressful.


They conveniently provide no detail about the usecase, so it's hard to tell

But, yeah, there's certainly a solution to provide better performances for cheaper, using other settings/services on AWS


We're hoping to write a case study down the road that will give more detail. But the short version is that not all parts of the client's organisation have aligned skills/incentives. So sometimes code is deployed that makes, shall we say, 'atypical use' of the resources available.

In those cases, it is great to a) not get a shocking bill, and b) be able to somewhat support this atypical use until it can be remedied.


Thank you for the reply

I'm honestly quite interested to learn more about the usecase that required those 50k API calls!

I've seen a few cases of using S3 for things it was never intended for, but nothing close to this scale


Why would it be abuse? Serving e.g. map tiles on a busy site can get up to tens of thousands of qps, I'd have thought serving that from S3 would have made sense if it weren't so expensive.


I don’t know much about map tiles… but could that be done more effectively through a CDN or cache, and then have S3 behind it?

Then the CDN takes the beating. So this still sounds like S3 abuse to me.

But I leave room for being wrong here.

Edit: presumably if your site is big enough to serve 50k RPS it’s big enough for a cache?


I've left a job because it was impossible to explain this to an ex-Googler on the board who just couldn't stop himself from trying to be a CTO and clownmaker at the company.

The rough part was that we had made hardware investments and spent almost a year setting up the system for HA and immediate (i.e. 'low-hanging fruit') performance tuning and should have turned to architectural and more subtle improvements. This was a huge achievement for a very small team that had neither the use nor the wish to go full clown.


Ya, but then you need to pay for a team to maintain the network and continually secure, monitor, update, and patch the server. The salaries of those professionals really only make sense for a certain-sized organization.

I still think small-midsized orgs may be better off in cloud for security / operations cost optimization.


You still need those same people even if you're running on a bunch of EC2 and RDS instances, they aren't magically 'safer'.


I mean, by definition yes they are. RDS is locked down by default. Also if you're using ECS/Fargate (so not EC2) as the person writing the article does, it's also pretty much locked down outside of your app manifest definitions. Also your infra management/cost is minimal compared to running k8s and bare metal.


This implies cloud infrastructure experts are cheaper than bare metal Linux/networking/etc experts. Probably in most smaller organizations, you have the people writing the code manage the infra, so it's an "invisible cost", but ime, it's easy to outgrow this and need someone to keep cloud costs in check within a couple of years, assuming you are growing as fast as an average start-up.


I think it's completely different ballparks to compare the skill sets...

It is cheaper and easier for me to hire cloud-infrastructure _capable_ people than a server _expert_. And a capable serverless cloud person is MUCH cheaper and easier to find.

You don't need 15 years of Linux experience to read a JSON/YAML blob for setting up a secure static website, or to figure out how to set up an S3 bucket and upload files... and another bucket for logging... And you now have to go out of your way NOT to be multi-AZ and to expose it to public read... I find most people can do this with minimal supervision and experience, as long as they understand the syntax and can read the docs.

The equivalent of setting up a safe and secure server is a MUCH higher bar. What operating system will they pick? Will it be sized correctly? How are application logs offloaded? What are the firewall rules? What is the authentication/SSH setup? Why did we not do LDAP integration? What malware defense was installed? In the event of compromise, do we have backups? Did you set up an instance to gather offloaded system logs? What is the company policy going to be if this machine goes down at 3am? Do we have a backup? Did we configure failover?

I'm not trying to bash bare metal; I came from that space. I lead a team in the middle of nowhere (by comparison to most folks here) that doesn't have a huge pool of people with bare-metal skills... but LOTS of people who can do competent serverless with just one highly technical supervisor.

This lets us hire competent coders, who are easier to find and can reasonably be expected to have or learn secure coding practices. When they need to interact with new serverless stuff, our technical person gets involved to do the templating necessary, and most minor changes are easy for coders to make (e.g. a line of JSON/YAML to toggle a feature).
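For a sense of scale, the kind of JSON/YAML blob being described might look like this (a sketch assuming CloudFormation; the resource names are illustrative, and a real deployment would add a CloudFront distribution, bucket policies, and log-delivery permissions):

```yaml
Resources:
  SiteBucket:
    Type: AWS::S3::Bucket
    Properties:
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: error.html
      LoggingConfiguration:
        DestinationBucketName: !Ref LogBucket
        LogFilePrefix: site-access/
  LogBucket:
    Type: AWS::S3::Bucket
```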


This comment pretty much sums up this argument. Well said.

As with everything, choose the right tool for the job.

If it feels expensive or risky, make a u-turn, you probably went off the rails somewhere unless you’re working on bleeding edge stuff, and lbh most of us are not.


I very much understand this, and that is why we do what we do. Lots of companies feel exactly as you say. I.e. Sure it is cheaper and 'better', but we'll pay for it in salaries and additional incurred risk (what happens if we invest all this time and fail to successfully migrate?)

This is why we decided to bundle engineering time with the infrastructure. We'll maintain the cluster as you say, and with the time left over (the majority) we'll help you with all your other DevOps needs too (CI/CD pipelines, containerising software, deploying HA Valkey, etc). And even after all that, it still costs less than AWS.

Edit: We also take on risk with the migration – our billing cycle doesn't start until we complete the migration. This keeps our incentives aligned.


That used to be the case until recently. As much as neither I nor you want to admit it, the truth is ChatGPT can handle 99% of what you would pay for "a team to maintain network and continually secure and monitor the server and update/patch." In fact, ChatGPT surpasses them, as it is all-encompassing. Any company now can simply pay for OpenAI's services and save the majority of the money they would have spent on the "salaries of those professionals." BTW, ChatGPT Pro is only $200 a month... who do you think they would rather pay?


You have a link to some proof that chat gpt is patching servers running databases with no down time or data loss?


I think the argument is that a dev with some vibe coding can successfully set up servers that are already good enough, for 10x less cost and 95% reliability.


This is an extremely bold statement to make. Vibe coding by a non-expert is the best way to introduce hard to find security issues.


Plus that 5% left out is a one in twenty chance that some business critical service may fail when least convenient.

And when it does, the person that vibed it into existence will only have ChatGPT to fall back to, having no personal or organizational experience to rely on.

But they have a 95% chance of getting it right, if they don't panic too much.


I would pay you 100x that amount monthly to perform those services, as long as you assume the risk. If you're convinced this is viable, you should start a business :)


Then you have to replace those professionals with even more specialized and expensive professionals in order be able to deploy anything.


If you haven't had to fight network configuration, monitoring, and security in a cloud provider you must have a very simple product. We deploy our product both in colos and on a cloud provider, and in our experience, bare-metal network maintenance and network maintenance in a PaaS consumes about the same number of hours.


Aren't most vulnerabilities in your own server software or configs anyway?


There is a graph database that does disk IO for database startup, backup, and restore as single-threaded, sequential, 8 KB operations.

On EBS it does at most 200 MB/s of disk IO, just because EBS operation latency, even on io2, is about 0.5 ms. The disk itself can go much faster: disk benchmarks easily do multi-GB/s on nodes with enough EBS throughput.

On an instance-local SSD on the same EC2 instance, it will happily saturate whatever the instance can do (~2 GB/s in my case).
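The underlying math: when each small read must complete before the next is issued, per-operation latency rather than device bandwidth caps throughput. A rough upper-bound sketch (the latency figures are illustrative assumptions; real systems get some readahead and batching, which is why observed numbers can beat the strict bound, but the latency gap still dominates):

```python
def serialized_throughput_mb_s(block_kb: float, latency_ms: float) -> float:
    """Upper bound on MB/s when IOs are issued strictly one at a time."""
    ops_per_sec = 1_000 / latency_ms
    return ops_per_sec * block_kb / 1_024

# 8 KB blocks: ~0.5 ms EBS io2 latency vs ~0.05 ms local NVMe (assumed figures)
print(serialized_throughput_mb_s(8, 0.5))   # EBS-ish bound
print(serialized_throughput_mb_s(8, 0.05))  # local-NVMe-ish bound
```

An order of magnitude less latency is an order of magnitude more single-threaded throughput, no matter how fast the underlying device is.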


What graph db is that?


neo4j


What is the cost of running Neo4j on aws vs using aws Neptune? Related to disk I/o?


It is not easily possible to directly compare neo4j and AWS Neptune as the former does not exist as a fully managed service in AWS. neo4j is available through the AWS marketplace, though, but it most assuredly runs on an EC2 instance by neo4j (the company).

We run a modest graph workload (relatively small dataset-wise but intense graph-edge-wise) on Neptune that costs us slightly under USD 600 per month – that is before the enterprise discount, so in reality we pay USD 450-500 a month. But we use Neptune Serverless, which bursts out from time to time, meaning monthly charges are averaged out across the spikes/bursts. The monthly charges are for a serverless configuration of 3-16 NPUs.

Disk I/O stats are not available for Neptune, even more so for serverless clusters, and they would not be insightful anyway. The transactions-per-second rate is what I look at.


Tbh, I don't know. For us the switching cost alone would be pretty high. That said ongoing maintenance is pretty high as well.


Just want to chime in. Zhenni, cofounder of PuppyGraph. We created the first graph query engine that can sit on top of your relational databases (think Postgres, Iceberg, Delta lake, etc.), and query your relational data as a graph using Cypher and Gremlin, without any ETL or a separate graphdb needed. It's much more lightweight and easy to spin up. Because we sit on top of column based storage and our compute engine is distributed, we can achieve subsecond query speed across 1 billion nodes. Please check it out!


I do not disagree, but just for the record, that's not what the article is about. They migrated to Hetzner cloud offering.

If they had migrated to a bare metal solution they would certainly have enjoyed an even larger increase in perf and decrease in costs, but it makes sense that they opted for the cloud offering instead given where they started from.


> We typically see a doubling of performance

The AWS documents clarify this. When you get 1 vCPU in a Lambda you're only going to get up to 50% of the cycles. It improves as you move up the RAM:CPU tree but it's never the case that you get 100% of the vCPU cycles.


Sort of. 1 vCPU on x86 = 1 hyperthread, not 1 core, so yes, you can't do uninterrupted work without some CPU being "stolen".

However, that's not due to AWS overhead or oversubscription but to the x86 architecture. For production workloads, 2 vCPUs should be the minimum recommendation.

On ARM, where 1 vCPU = 1 core, it is more straightforward.
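Stolen cycles are also directly observable on a Linux guest: the 8th value on the `cpu` line of /proc/stat is steal ticks (this is what top and vmstat report as %st). A rough sketch of the arithmetic, assuming two samples taken some interval apart (the helper name is mine):

```python
def steal_pct(sample_a: str, sample_b: str) -> float:
    """Estimate CPU steal % between two '/proc/stat' cpu-line samples.

    Fields after the 'cpu' label: user nice system idle iowait irq softirq steal ...
    """
    a = [int(x) for x in sample_a.split()[1:]]
    b = [int(x) for x in sample_b.split()[1:]]
    delta = [y - x for x, y in zip(a, b)]
    return 100.0 * delta[7] / sum(delta)  # index 7 == steal ticks

# Synthetic samples: 150 of the 1000 elapsed ticks were stolen by the hypervisor
print(steal_pct("cpu 100 0 50 800 0 0 0 50",
                "cpu 200 0 100 1500 0 0 0 200"))  # -> 15.0
```

In practice you'd read /proc/stat twice, a second or so apart; sustained non-trivial %st on a "1 vCPU" instance is exactly the effect described above.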


> on S3 that is $20-$250 _per second_ on API calls!

It is worth pointing out that if you look beyond the nickel-and-diming US cloud providers, you will very quickly find many S3 providers who don't charge you for API calls, just for the actual data-shifting.

Ironically, I think one of them is Hetzner's very own S3 service. :)

Other names IIRC include UpCloud and Exoscale, but it's not hard to find more with the help of Mr Google – most results for "EU S3 provider" will likely have a similar pricing model.

P.S. Please play nicely and remove the spam from the end of your post.


How do you deprogram your devs and ops people from the learned helplessness of cloud native ideology?

I've found that it's almost impossible to even hire people who aren't terrified of the idea of self-hosting. This is deeply bizarre for someone who installed Linux from floppy disks in 1994, but most modern devs have fully swallowed the idea that cloud handles things for them that mere mortals cannot handle.

This, in turn, is a big reason why companies use cloud in spite of the insane markup: it's hard to staff for anything else. Cloud has utterly dominated the developer and IT mindset.


>I've found that it's almost impossible to even hire people who aren't terrified of the idea of self-hosting

Are y'all hiring? [1]

I did 15 months at AWS and consider it the worst career move of my life. I much prefer working with self-hosting, where I can actually optimize the entire hardware stack I'm working with. Infrastructure is fun to tinker with. Cloud hosting feels like a miserable black box that you dump your software into and "hope".

[1] https://cursedsilicon.net/resume.pdf


>I've found that it's almost impossible to even hire people who aren't terrified of the idea of self-hosting

Funny, I couldn't find a new job for a while because I had no cloud experience, finally and ironically I got hired at AWS. Every now and then these days I get headhunters unsure about my actual AWS experience because of my lack of certifications.


So you'd rather self host a database as well? How do you prevent data loss? Do you run a whole database cluster in multiple physical locations with automatic failover? Who will spend time monitoring replication lag? Where do you store backups? Who is responsible for tuning performance settings?


Hosting a database is no different from self-hosting any other service. This viewpoint is what the cloud hath wrought: an atrophying of the most basic operational skills, as if running these magic services were achievable only by the hyperscalers, who told us they were the only ones capable.

The answer to all of your questions is a hard: it depends. What are your engineering objectives? What are your business requirements? Uptime? Performance? Cost constraints and considerations? The cloud doesn't take away the need to answer these questions; it's just that self-hosting actually requires you to know what you are doing, versus clicking a button and hoping for the best.
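On the replication-lag question specifically: a Postgres primary exposes per-replica LSNs in pg_stat_replication, and lag in bytes is just the difference between two LSNs (in SQL you'd use pg_wal_lsn_diff()). A minimal sketch of that arithmetic, with function names of my own choosing:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN like '16/B374D848' (two hex halves) to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(primary_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replay_lsn)

print(lag_bytes("0/2000000", "0/1000000"))  # -> 16777216 (replica is 16 MiB behind)
```

Feed that number into whatever alerting you already run and "who monitors replication lag" becomes a one-off setup task, not an ongoing job.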


I would argue that correctly tuning a database is significantly more difficult than most services one would self host.

But that said, you can afford a lot more hardware if you’re not using RDS, so the tuning doesn’t need to be perfect.
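And "good enough" tuning on big hardware is mostly a handful of postgresql.conf knobs. A rough starting point for a hypothetical dedicated box with 128 GB of RAM and NVMe storage – illustrative values only, not a recommendation for any specific workload:

```ini
# postgresql.conf – illustrative values for a dedicated 128 GB RAM / NVMe host
shared_buffers = 32GB            # commonly ~25% of RAM
effective_cache_size = 96GB      # planner hint: what the OS page cache can hold
work_mem = 64MB                  # per sort/hash node, so mind per-connection usage
maintenance_work_mem = 2GB
random_page_cost = 1.1           # NVMe: random reads nearly as cheap as sequential
effective_io_concurrency = 200
```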


Not... really? It's no more difficult than finding the correct buffer sizes for nginx, or the correct sizes for the eBPF connection-tracking map if you're using Cilium on k8s, or kernel TCP buffers, or any of the myriad other services one could run.

Being a bit obtuse to tune doesn't really justify going all-in on cloud. It's all there in the documentation.
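To make the kernel TCP buffer example concrete, the whole tuning surface is a few sysctls. A sketch with illustrative values for a hypothetical 10 GbE server (not universal defaults):

```ini
# /etc/sysctl.d/99-net-tuning.conf – illustrative values for a 10 GbE server
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864   # min / default / max receive buffer
net.ipv4.tcp_wmem = 4096 65536 67108864   # min / default / max send buffer
net.core.netdev_max_backlog = 30000
```

Apply with `sysctl --system`; it really is all in the documentation.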


I really don't understand this comment. The cloud doesn't protect you from data loss or provide any of the things you named.


Yes, it does? For a fraction of a dollar per hour, AWS will give me a URI that I can connect to. On the other end is a Postgres instance with authentication and backups already handled for me. It's also backed by a storage layer that is far more robust than anything I could put together in my rented cage with my corporate budget.


The cloud is also not fulfilling its end of the promise anymore - capacity on-demand. We’ve been struggling for over a month to get a single (1) non-beefy non-GPU instance on Azure, since they’ve been having just insane capacity issues, where even paying for “provisioned” capacity doesn’t make it available.


We will soon have 256 Zen 6c cores per socket, so at least 512 cores per server, multiple PCIe 5.0 SSDs at 14 GB/s with up to half a petabyte of storage, and TBs of memory.

And now Nvidia is in the game for server CPUs: much faster time-to-market for PCIe in the future, and better x86 CPU implementations as well as ARM variants.


I love that you're not just preaching - you're offering the service at a lower cost. (I'm not affiliated and don't claim anything about their ability/reliability).


We moved DemandSphere from AWS to Hetzner for many of the same reasons back in 2011 and never looked back. We can do things that competitors can’t because of it.


Can you please explain what are some of those things? Curious to know and learn.


> If you don't want to do this yourself, then we'll do it for you for half the price of AWS (and we'll be your DevOps team too)

You might not realize it, but you are actually increasing the business case for AWS :-) Also, those hardware savings will be eaten away by two days of your hourly bill. I like to look at my project costs across all verticals...


> Also those hardware savings will be eaten away by two days of your hourly bill

Doubt it. I've personally seen AWS bills in the tens of thousands, he's probably not that costly for a day.


I don't think I have joined a startup that pays less than 20k/month to AWS or any cloud in almost a decade.

Biggest recent ones were ~200k and ~100k that we managed to lower to ~80k with a couple months of work (but it went back up again after I left).

I fondly remember lowering our Heroku bill from 5k to 2k back in 2016 after a day of work. Management was ecstatic.


The only real startup whose financials I've been privy to had 5-figure AWS bills and literally 5 customers. Single-digit customers.

And the product was sending SMS’s.

Startups are clown shows for burning OPM.


Same, but in the hundreds of thousands monthly and growing at a steady clip, with AWS extending credits worth -millions- just to keep them there, because their margins are so fat and juicy they can afford that insane markup.

That's where the real value lies. Not paying these usurious amounts.


I understand the concern for sure. But we don't bill hourly in that way, as one thing our clients really appreciate is predictable costs. The fixed monthly price already includes engineering time to support your team.


Does anyone have experience with say Linode and Digital Ocean performance versus AWS and GCE?

They still use VMs, but as far as I know they have simple reserved instances, not “cloud”-like weather?

Is the performance better and more predictable on large VPSes?

(edit: I guess a big difference is that a VPS can have local NVMe that is persistent, whereas EC2 local disk is ephemeral?)


I can't speak to Linode but in my experience the Digital Ocean VM performance is quite bad compared to bare metal offerings like Hetzner, OVH, etc. It's basically comparable to AWS, only a bit cheaper.


It's essentially the same product, but you do get lower disk latency. Best performance is always going to be a dedicated server which in the US seem to start around $80-100/month (just checking on serversearcher.com), DO and so on do provide a "dedicated cpu" product if that's too much.


No. DO can be equally noisy, but I've only ever tried their regular instances and not their premium AMD/Intel ones.


> In reality, there is significant latency between Hetzner locations that make running multi-location workloads challenging, and potentially harmful to performance as we discovered through our post-deployment monitoring.

the devil is in the details, as they say.


Haha, this reminds me of when I used to manage a Solaris system consisting of 2 servers: SPARC T7, 1 box in one state and 1 box in another. No load balancer.

Thousands and thousands of users depending on that hardware.

Extremely robust hardware.


Half a year later, all the data gets wiped out – then what can your customers do?

And you are still charging half of AWS, in which case I would just do this work myself if I really think AWS is too expensive.


What do you recommend for configuration management? I've had a fairly good experience with Ansible, but that was a long time ago... anything new in that space?


"new", I'm not sure, but I deployed 2,500 physical Windows machines with SaltStack and it worked pretty good.

It also handled some databases and webservers on FreeBSD and Windows; I considered it better than Ansible.
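For anyone who hasn't seen SaltStack: a state is just YAML. A minimal hypothetical example (the state file name and target package are mine) that keeps nginx installed and running:

```yaml
# /srv/salt/nginx.sls – minimal hypothetical Salt state
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx
```

Applied to minions with `salt '<target>' state.apply nginx`; Windows machines are managed the same way through the Salt minion.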


If you’re big, invest in this. If you’re small, slap Dokploy/Coolify on it.



