Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Taming Servers for Fun and Profit (railway.com)
32 points by dban 4 months ago | hide | past | favorite | 20 comments


> There are probably more effective methods of achieving the same, but it costs us less than a dollar to provision 50 servers using Claude to screen-scrape every minute during the install.

I think this is an important thing to remember/consider. I can't tell you how many personal projects I've stalled on worrying about costs "XYZ service/platform/API is expensive" without considering what "expensive" actually means.

Yes, they could have used OCR/image recognition-type software but what's easier than piping an image to an API and asking it?

LLMs frustrate me with their inconsistency/"fuzziness" (repeating instructions, putting them in all caps, saying "please" just rubs me the wrong way) but I know personally I have a bad habit of "That would be too expensive" or "How does it scale to X" when neither the cost nor the scale would ever be a real issue in the thing I'm writing.


Using Claude to parse screen-scrapes of a server's boot status is certainly novel. I did not expect a mention of AI usage in an article like this.

With that said, I wonder why they used AI at all here. Could they not have keyed off certain keywords or other information present in a screen scrape, rather than rely on Claude to parse it?


The article says "...we can obtain a near real-time image of the server screen". Implying that what they have is an image file, not text. Getting keywords out of an image file would normally take an OCR step. A single API call to Claude does the trick without the extra tooling.


I asked ChatGPT to make text from a photo of a whiteboard and it just executed some Pytesseract code…

I’m reminded of the guy who setup an iPhone farm to use the iOS on-device OCR because he couldn’t find anything better.


I wonder how much of this is just to get a talking point about AI in the article (and I guess it works - we're talking about it).

If you literally need to just detect whether it's at the firmware splash screen or not, simply checking if enough pixels on the image are white would detect that splash screen just fine.


author here; Claude use here was pure laziness - and personally, I found it quite funny that it worked. We could sample pixels and try and build that detection, but $<1c per run to write a prompt and get some json was too hilarious not to ship to prod.

Maybe it needs to have more complex logic/detection and we need something more complex down the road. But it's like easy and cheap OCR for now.

what was kinda funnier was that I tried to get Claude to generate its own Go client code to upload the image and run the prompt; it totally totally hallucinated on that part :).


Great writeup even though the dots aren’t connected between a lot of the aspects presented. Didn’t know about udev exposing consistent device names for network interfaces for example, or the efforts to open source switch software, or how easy it’s become to run intra-datacenter BGP. Thanks for all the links!

And of course the brilliant use of AI and discussion of how cost-effective it is. “Hook it up to an AI to save money” is the world we can look forward to. In this case the problem is recognizing which state a thing is in from a list of known states. Once the LLM gives the state in text form, all kinds of automation are unlocked. I think that class of problem - converting a state based on an image into text form - is wildly common, and will be on the lookout for it in my own automation work!


> the dots aren’t connected between a lot of the aspects presented

that's on me (author); I tried to cut the content down to a manageable post size that covered some interesting stuff - but probably dropped the connective tissue in the process. We'll keep this in mind for next time.


Out of interest, why not just PXE boot prebuilt images (buildroot/etc) that run from memory as your OS? That would save you the hassle of maintaining a stateful server, installing an OS, ensuring configuration is up to date, etc.


As a general rule, a statement that starts with “why don’t you just…” typically leans far too heavily on the “just” to handwave away the reasons why the next part of the statement isn’t going to be helpful.

In this case, you’re assuming a huge number of things like infrastructure and other requirements are in place, and all of those things take a lot of time and work, if they’re even appropriate at all.


Probably because it isn't memory efficient ? I am not sure , please take what I am saying is with a grain of salt. I may be wrong , I usually am.

But even if that's the case , couldn't they use something like vram if they are running out of memory of are we back to square one?


I mean yes, you are sacrificing some RAM. But a typical Linux OS to run VMs/containers could fit in a couple gigs. For the amount of RAM these servers have, this is peanuts and would save a lot of headaches dealing with statefulness.


Author here; RAM is so much more expensive than disk though. Two 500G M.2 NVMes for a RootFS in RAID1 are basically max ~$150 which is I think much cheaper than a single 64G DDR5 ECC RAM module [I don't have exact numbers on me, but ECC RAM is pricey]. It's also a lot harder to debug when things go wrong because you lose the machines state if everything was ephemeral.

We run a thin base OS on the boxes and then VMs on top which we consider more ephemeral. The frequency of needing to update that base OS is v. low.

I think there's a case for building a custom PXE booted RAMdisk image to replace the install though; something like what Equinix Metal (formerly Packet) do with Tinkerbell (https://github.com/tinkerbell); they call it an OS Install Environment, but the idea is a small lightweight linux install agent that can DD a golden image onto the disk (vs. an install each time).


Are you guys still hiring? Would be keen to have a chat about this and see if anything comes out. Sent one of you guys a LinkedIn invite but no response yet.


Yep! We have two roles open https://railway.com/careers#open-positions - Datacenters and Infra that might be of interest to you. Idk if many of us use LinkedIn so not the best way to get in touch. Can I share the email on your HN profile link with our in-house recruiter?


Yes please. The reason I wanted to informally reach out by LinkedIn is that I'm more interested in a chat to see if my skills align and if there's potential, even for short-term work - what you guys are doing sounds cool and I've played around with some of it in the past (thus the Buildroot suggestion - that was my approach to a similar requirement).


Hmm seems really interesting and even simple! How does this really work , can you please tell me the workflow , seems really simple and effective and i am just interested


Could this be an issue that ram can cause issues where a server can shut down and it would lose it causing data backup issues


no


then I don't understand the issue ! this seems really great imo.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: