For all the talk of needing all the cloud infra to run even a simple website, Marginalia hits the frontpage of HN and we can't even bring a single PC sitting in some guy's living room to its knees.
More than the cores and RAM, the bigger issue with cloud providers is I/O, both throughput and latency, to disk and the network. Physical hardware, even when comparing cores and RAM 1:1, is outrageously faster than cloud services.
Problem is that those volumes are ephemeral and may not provide the reliability guarantees that the EBS volumes do, so they're only really good as cache and not for any persistent data you care about.
I would argue that a rack-mounted chassis with lots of disks is also ephemeral, just less so: most failures in a server can be fixed by swapping some parts, and your data is still there.
But AWS and its competitors don’t have an offering even close to comparable to what you can get in a commodity server. A 1U server with one or two CPU sockets and 8-12 hot-swap NVMe bays is easy to buy and not terribly expensive, and you can easily fill it with 100+ TiB of storage with several hundred Gbps of storage bandwidth and more IOPS than you are likely able to use. EC2 has no comparable offering at any price.
(A Micron 9400 drive supposedly has 7 GB/s (56 Gbit/s) of usable bandwidth. Ten of them gives 560 Gbit/s, and a modern machine with lots of PCIe 5.0 lanes may actually be able to use a lot of that. As far as I can tell, you literally cannot pay AWS for anywhere near this much bandwidth, but you can buy it for $20k or so.)
> I would argue that a rack-mounted chassis with lots of disks is also ephemeral, just less so: most failures in a server can be fixed by swapping some parts, and your data is still there.
True, but this also depends on the design decisions AWS made regarding those volumes.
Indeed it could be that the volume is internally (at the hypervisor level) redundant (maybe with something like ZFS or other proprietary RAID), but there's no way to know.
Furthermore, AWS doesn't really let you keep tabs on or reserve the physical machine your VM runs on: every time a VM is powered up, it gets assigned to a random host machine. If there is a hardware failure, they advise you to reboot the instance so it gets rescheduled onto another machine, so even though your data may technically still be on that physical host, you have no way to get it back.
AWS' intent seems to be for these to act as a transient cache/scratchpad, so they don't offer much in the way of durability or recovery for those volumes. Their hypervisor treats them as disposable, which is a fair design decision given the intended use case, but it means you can't/shouldn't use them for any persistent data.
Being in control of your own hardware (or at the very least, renting physical hardware from a provider as opposed to a VM like in AWS) will indeed allow you to get reliable direct-attach storage.
I can buy a rather nicer 1U machine with substantially better local storage for something like half the 1-year reserved annual cost of this thing.
If you buy your own servers, you can mix and match CPUs and storage, and you can get a lot of NVMe storage capacity and bandwidth; cloud providers don't seem to have comparable products.
Something like €200/mo if you factor in the need for disk space as well. This is also Hetzner we're talking about. They're sort of infamous for horror stories about arbitrarily removed servers and shitty support. They're the cheapest for a reason.
But with dedicated servers, are we really talking cloud?
I only rent a small server from them, but I've been happy with their support. Talked to a real human who could help me with tech questions, even though I pay them next to nothing.
Most of the problems I read about are during the initial signup stage. They ask for a copy of your passport etc., and even then some people can't sign up, presumably because their info is triggering something in Hetzner's anti-fraud checks. This sucks for those people, of course.
The other common cause of issues is things like crypto which they don't want in their network at all.
This will sound like I am downplaying what people have experienced and/or being apologetic on their behalf, but that is not my intention. I am just a small-time customer of theirs. I've had 1 or 2 dedicated servers with them for many, many years now, upgrading and migrating as necessary. (It used to be that if you waited a year or two and upgraded you'd get a better server for cheaper. Those days are gone.)
I've only dealt with support over email, where they have been both capable and helpful, but what I needed was just plugging in a hardware KVM switch (free for a few hours; I never had to pay) or replacing a failing hard drive (they do this with zero friction). Perhaps I am lenient on the tech support staff. After all, they are my people. I've been to a few datacenters and have huge respect for what they do.
On the presales side they seem to reply in a matter-of-fact tone with no flexibility. They are a German company, after all.
I'm a bit wary I'd get lumped in with the crypto gang. A lot of what I'm doing with the search engine is fairly out there in terms of pushing the hardware in unusual ways.
It would also suck if there ever was a problem. The full state of the search engine is about 1 TB of data. It's not easy to just start up somewhere else if it vanished.
In Azure that's roughly 5k per year if you pay for the whole year upfront.
I have the pleasure of playing with 64 cores, 256 GB RAM and 2x V100 GPUs for data science projects every now and then. That turns out to be roughly 32k per year.
I share your perspective on pricing. I had a discussion with my team lead about why we haven't taken on the task of running our own machines. The rationale behind it is that while server management may seem easy, ensuring its security can be complex. Even in a sizable company, it's worth considering whether you want to shoulder the responsibility or simply pay a higher cost to have Microsoft handle it. Personally, I love hosting my own infrastructure. It's fun, potentially saves me some cash, allows me to learn a lot, and gives me full control. However, I understand now that on a company scale, others may see it differently.
--edit--
I forgot to add the following: that's 32k if you run the system 24/7. Usually it's up for a few hours per month, so you end up paying maybe 2k for the whole year.
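Back-of-the-envelope, assuming those figures: 32k/year over 8,760 hours is roughly $3.65/hour, so something on the order of 40-50 hours of use per month works out to around 2k/year.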
I'm curious about your network bandwidth/load. You only serve text, right? [Edit: No, I see thumbnail images too!] Is the box in a datacenter? If not, what kind of Internet connection does it have?
Average load today has at worst been about 300 Kb/s TX, 200 Kb/s RX. I've got a 1000/100 Mbit/s down/up connection. Seems to be holding without much trouble.
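(Rough math: if that's 300 kilobits/s against a 100 Mbit/s uplink, that's about 0.3% utilization, or a few percent if it's kilobytes, so there's a lot of headroom either way before the connection becomes the bottleneck.)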
Most pages with images do lazy loading, so I'm not hit with 30 images all at once. They're also WebP and cached via Cloudflare, which softens the blow quite a lot.
IMO it's actually incredibly well-documented and thoughtfully organized for a one-person project! You should be proud of what you've put together here!
It's a Debian server running nginx in front of a bunch of custom Java services that use the Spark microframework [1]. I use a MariaDB server for link data, and I've built a bespoke index in Java.
[1] https://sparkjava.com/ I don't use Spring Boot or anything like that; besides Spark I'm not using frameworks.
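For a sense of what that looks like, a Spark service is basically just route handlers on an embedded Jetty. This is a minimal sketch, not code from the actual search engine; the port and the renderResults helper are made up:

    import static spark.Spark.*;

    public class SearchService {
        public static void main(String[] args) {
            port(8080); // hypothetical port; nginx would proxy requests to this

            // One route per endpoint; the lambda receives the request and
            // response objects and returns the body to send back.
            get("/search", (req, res) -> {
                String query = req.queryParams("query");
                res.type("text/html");
                return renderResults(query);
            });
        }

        // Stand-in for whatever actually queries the index and renders HTML.
        static String renderResults(String query) {
            return "<p>results for: " + query + "</p>";
        }
    }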
Really? Even with mmap'ed memory, won't the CPU still register user code waiting on pages being read from disk as iowait? I'm surprised enough by that that if it doesn't, it sounds like a bug.
Yeah, that's at least what I've been seeing. Although it could alternatively be that a lot of the I/O activity is predictive reads, and the threads don't actually stall on page faults all that often.
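For context, this is roughly the access pattern we're talking about, as a minimal Java sketch (not the actual index code; the file name and the byte-per-page scan are just illustrative):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MmapScan {
        public static void main(String[] args) throws IOException {
            // "index.dat" is a stand-in name, not the real index file.
            try (FileChannel ch = FileChannel.open(Path.of("index.dat"),
                                                   StandardOpenOption.READ)) {
                // A single MappedByteBuffer tops out around 2 GB, so a real
                // index would be split across several mappings.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0,
                                              Math.min(ch.size(), Integer.MAX_VALUE));

                long sum = 0;
                // Touch one byte per 4 KiB page. Non-resident pages are pulled
                // in by the kernel on the page fault, and readahead will often
                // have prefetched them already, so the thread rarely stalls on
                // disk, which would explain seeing little iowait even though a
                // lot of I/O is happening underneath.
                for (int pos = 0; pos < buf.limit(); pos += 4096) {
                    sum += buf.get(pos);
                }
                System.out.println("checksum: " + sum);
            }
        }
    }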
I remember there is ksplice or something like that to upgrade even the kernel without complete downtime. Everything else can be upgraded piecemeal, provided that worker processes can be restarted without downtime.
If the hardware itself is the reason for the long startup time, kexec allows you to boot a new kernel from within an existing one and avoids the firmware/HW init.
People strive to have these problems! Hockey-stick growth, servers melting under signup requests, payment systems struggling under the stream of subscription payments! Scale up, up, up! And for that you might want to run your setup under k8s from day one, just in case, even though a single inexpensive server would run the whole thing with a 5x capacity reserve. But that would feel like a side project, not a startup!
I'd argue that a lot of modern web engineering pretends to be built for problems most people won't have. So much resume-driven development is being done on making select, easy parts super scalable while ignoring elephants in the room such as the datastore.
A good example is the obsession with "fast" web frameworks on your application servers, completely ignoring the fact that your database will be the first thing to give up, even with most "heavy" web frameworks in their default configuration and without any optimization effort.
I think HN's stack is the right choice for them and that it fulfills its purpose excellently, but I do seem to recall both of their hard drives failing more or less simultaneously & HN going down for about 8 hours not that long ago.
If that happened at the SaaS company I worked at previously, it would be a bloodbath. The churn would be huge. And our customers' customers would be churning from them. If it happened at a particularly inopportune time, like while we were raising money or something, it could potentially endanger the company.
(I'd like to stress again this is not a criticism of HN/dang, but just to illustrate a set of requirements where huge AWS spends do make sense.)
In my experience, simple systems perform better on average because there are fewer interconnected gears.
Much more complex systems do not perform as consistently as simple ones, and they are exponentially harder to debug, introspect and optimize at the end of the day.
Every time I deploy a service it goes down for anything between 30 seconds and 5 minutes. When I switch indices, the entire search engine is down for a day or more. Since the entire project is essentially non-commercial, I think this is fine. I don't need five nines.
If reliability were extremely important, the scales would tilt differently; maybe cloud would be a good option. A lot of it is for CYA's sake as well. If I mess up with my server, that's both my problem and my responsibility. If a cloud provider messes up, then that's an SLA violation and maybe damages are due.