
>> Do you have price-performance numbers you can share on that? Like compared against local or cloud machines with RTX and A100 GPU’s?

Good question, though the accounting is muddy --

1. Electricity is a parent-company responsibility, so while it is a factor in OpEx price, it isn't a factor for us. I don't think it even gets submetered. Obviously one wouldn't want to abuse this, but maxing out MacBooks doesn't seem close to abuse territory.

2. The M1/M2/M3 machines are already purchased, so while that is major CapEx, it is a sunk cost and also an underutilized resource most of the day. We assume no wear and tear from maxing out the cores; not sure that is a perfect assumption, but it's good enough.

3. Local servers are out of the question at a big company outside of infra groups; it would take years to provision them, and I don't think there is even a means to anymore.

The real question is cloud. Cloud with RTX/A100 would be far more expensive, though no doubt more performant. (TPM calculation left to the reader :-) I'd leave those for fine-tuning, not for inference workloads. Non-production inference is particularly bad because you can't easily justify reserved capacity without some constant throughput. If we could mix environments, it might make sense to go all cloud on NVIDIA, but having separate environments with separate compliance requirements makes that hard.

Jokes aside, I think a TPM calculation would be worthwhile and perhaps I can do a quick writeup on this and submit to HN.
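To give a rough sense of the shape of that calculation, here is a minimal sketch. Every number in it is an assumption for illustration only (the cloud hourly rate, the A100 and M-series token throughputs), not a measurement; the point is just that the marginal cost of an already-purchased, idle MacBook is effectively zero, so the comparison ends up being cloud dollars per token versus local latency and throughput.

    # Back-of-envelope cost-per-token sketch.
    # All constants below are assumptions for illustration, not measurements.

    A100_HOURLY_USD = 4.00       # assumed on-demand cloud rate per GPU-hour
    A100_TOKENS_PER_SEC = 1500   # assumed batched throughput for a 7B-class model
    MAC_TOKENS_PER_SEC = 40      # assumed single M-series laptop throughput
    MAC_MARGINAL_COST_USD = 0.0  # sunk hardware; electricity paid by parent company

    def usd_per_million_tokens(hourly_usd: float, tokens_per_sec: float) -> float:
        # Convert an hourly machine cost into dollars per 1M generated tokens.
        tokens_per_hour = tokens_per_sec * 3600
        return hourly_usd / tokens_per_hour * 1_000_000

    cloud = usd_per_million_tokens(A100_HOURLY_USD, A100_TOKENS_PER_SEC)
    local = usd_per_million_tokens(MAC_MARGINAL_COST_USD, MAC_TOKENS_PER_SEC)

    print(f"cloud A100:   ${cloud:.2f} per 1M tokens")
    print(f"idle MacBook: ${local:.2f} per 1M tokens (marginal cost)")

Under those assumptions the cloud works out to well under a dollar per million tokens, but the MacBook's marginal cost is zero; the real trade-off is how much slower the local run is and whether anyone is waiting on it.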


