
here's a weird calculation:

this cluster does something vaguely like 0.8 gigabits per second per watt (1 terabyte/s * 8 bits per byte * 1024 Gb per Tb / 34 nodes / 300 watts per node)

a new Mac mini (a super efficient ARM system) runs around 10 watts in interactive use and can push 10 gigabits per second over its NIC, so maybe 1 gigabit per second per watt of data

so OP's cluster, back of the envelope, is basically the same bits per second per watt that a very efficient arm system can do
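
spelling that arithmetic out in Python (same numbers as above; the 300 W per node and the Mac mini figures are rough assumptions):

    # back-of-the-envelope Gb/s-per-watt comparison, using the numbers above
    cluster_gbps = 1 * 8 * 1024           # 1 TB/s -> ~8192 Gb/s aggregate
    nodes = 34
    watts_per_node = 300                  # rough assumption
    print(cluster_gbps / (nodes * watts_per_node))   # ~0.80 Gb/s per watt

    mac_mini_gbps_per_watt = 10 / 10      # 10 Gb/s NIC at ~10 W (rough)
    print(mac_mini_gbps_per_watt)         # 1.0 Gb/s per watt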

I don't think running tiny nodes would actually get you any more efficiency, and it would probably cost more! Performance per watt is quite good on powerful servers now

anyway, this is all open source software running on off-the-shelf hardware; you can do it yourself for a few hundred bucks




You're comparing one machine with many machines.

You're comparing raw disks with shards and erasure encoding.

Lastly, you're comparing only network bandwidth and not storage capacity.


I think the Mac Mini has massively more compute than needed for this kind of work. It also has a power supply, and computer power supplies are generally not amazing at low output.

I’m imagining something quite specialized. Use a low frequency CPU with either vector units or even DMA engines optimized for the specific workloads needed, or go all out and arrange for data to be DMAed directly between the disk and the NIC.


> or go all out and arrange for data to be DMAed directly between the disk and the NIC.

Ceph OSDs do a lot more work than you're imagining.


Sounds like a DPU (Mellanox Bluefield, for example): they're entire ARM systems with a high-speed NIC, all on a PCIe card. I think the Bluefield ones can even interface directly over the bus to NVMe drives without the host system involved.


That Bluefield hardware looks neat, although it also sounds like a real project to program it :).

I can imagine two credible configurations for high efficiency:

1. A motherboard with a truly minimal CPU for bootstrapping but a fairly beefy PCIe root complex: 32 lanes to the DPU and a bunch of lanes for NVMe. The CPU doesn't touch the data at all. I wonder if anyone makes a motherboard optimized like this; a 64-lane mobo with a Xeon in it would be quite wasteful, but fine for prototyping, I suppose.

2. Wire up the NVMe ports directly to the Bluefield DPU, letting the DPU be the root complex. At least 28 of the lanes are presumably usable for this or maybe even all 32. It’s not entirely clear to me that the Bluefield DPU can operate without a host computer, though.
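
For a rough sense of what option 2 could host, assuming the usual x4 link per NVMe drive (the 28-vs-32 usable-lane figures are just my guesses above):

    # hypothetical lane budget, assuming x4 per NVMe drive
    lanes_per_drive = 4
    for usable_lanes in (28, 32):
        print(usable_lanes, "lanes ->", usable_lanes // lanes_per_drive, "NVMe drives")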


I checked the selling prices of those racks + top-end SSDs: this 1 TB/s achievement runs on roughly $4 million worth of hardware. Or more; I didn't check the networking interface costs.

But yeah, it could run on commodity hardware. Not sure those highly efficient ARM chips, packaged at a premium by Apple, would beat the Dell racks on throughput per dollar of hardware investment, though.


Dell’s list prices have essentially nothing to do with the prices that any competent buyer would actually pay, especially when storage is involved. Look at the prices of Dell disks, which are nothing special compared to name brand disks of equal or better spec and much lower list price.

I don’t know what discount large buyers get, but I wouldn’t be surprised if it’s around 75%.
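
As a rough sanity check (the ~$4M list figure is from the comment upthread, and the 75% is just my guess):

    # speculative: 75% off the ~$4M list-price estimate above
    list_price = 4_000_000
    print(list_price * (1 - 0.75))   # ~$1,000,000

which would put the real cost closer to $1M.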


Agreed, and the specs in the story in fact show they didn't provision add-ons such as specific SSDs from Dell.

Still well over $1M for the cluster: skeletons of racks with just CPUs and RAM.


Trusting your maths, damn, Apple did a great job on their M-series design.


Didn't ARM (the company that originally designed ARM processors) do most of that job, with Apple pushing performance per watt even further?



