> Graviton4 processors deliver up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.
This seems ambiguous. Presumably this is 50% more cores per chip. What about "30% better compute performance" and "75% more memory bandwidth": is that per core, or per chip? If the latter, then per-core compute performance would actually be lower (1.30 / 1.50 ≈ 0.87, i.e. roughly 13% less per core).
Also, "up to" could be hiding almost anything. Has anyone seen a source with clearer information as to how per-core application performance compares to earlier Graviton generations?
I would assume that "up to" means that for all of the workloads that they benchmarked the best result was 30% better compute performance. Not a very useful number as your workload is very unlikely to hit the right set of requirements to see that uplift.
Because AWS doesn't rent you chips, it rents you cores. If a chip has 50% more cores, that doesn't help you as a renter of cores; it just means AWS gets to rent out more cores per chip.
Of course it might give Amazon room to offer these instances at a lower hourly rate per core, which would ultimately cash out as improved cost/performance for AWS customers.
At least in the "old days" there was (and still is) a secondary market for used server parts.
I don't know how companies like Amazon, Microsoft, and Google would frame a question like this so their "green" narratives aren't hurt, but I'm sure they'll do an excellent job.
They don't sell these. They reuse them and perform maintenance on them until their last breath and part them out once they die.
Hyperscalers design their own datacenter "SKUs" for storage/compute, all the way from power delivery to networking to chassis. These servers are going to be heavily customized and it's unlikely that even if they fit normal form factors that they will work in the same way as COTS devices or things you would buy from Supermicro.
You could possibly make it work. If they sold them. But they don't, and if you're in the market for that stuff, Supermicro will just design it for you anyway, because presumably you have actual money.
And the reality is they're probably either breaking even or greener doing it this way, as opposed to washing their hands of it and selling servers on eBay so they can eventually get thrown into landfills wholesale by nerds once their startups fail or they get bored of them. Just because you stick your head in the sand doesn't mean it doesn't end up in a landfill.
I haven’t heard about CPUs failing that often, though. Usually it’s some other part of the server that dies, like the motherboard. In that light, the grandparent’s question is still valid — normally these servers that “died” would be torn apart and the non-broken parts refurbished and resold on the aftermarket.
No, but I spat out a vague answer rather quickly and was too flippant ("maybe you could do something"), so it's a fair question. Realistically, even the motherboard design, including landing pad on the PCB and boot sequence of the chip, from the root of trust to initial firmware bringup, is going to be custom on systems like Graviton4. For example, these use the Nitro system, which exists as hardware, and it is a key point of the whole design. And AWS designs their services to even resist some level of operator compromise, e.g. an operator trying to exfiltrate secrets from the Nitro system, so the amount of people who can exert influence there is extremely limited. Individual parts like the CPU are as good as useless without the chassis (and power supply, and attached switch equipment) they belong to. Even if you had the whole thing, you might very well not be able to do anything with it, making it as good as a brick.
Even if Nitro was out of the picture or whatever, and you just had the raw package -- it's not like you can really make a motherboard magically from thin air for these devices based on just the CPU pinout, and the tolerances just for power delivery and memory buses are pretty tight, not to mention a gazillion other things.
More broadly, designing compute that is used purely in-house versus large-scale high-volume COTS designs, through e.g. OEM partners, is literally a difference of years and tens or hundreds of millions of dollars. Support, documentation, supply chain relationships, etc. These take a lot of money to do right, and when you buy servers, part of the purchase goes to those departments, to fund them. Most places are better off just talking to Supermicro if they actually need servers, for that reason. But hyperscalers literally save ridiculous amounts of money by doing it themselves and not doing the other things Supermicro does, like OEM work, support, and NRE on generalist designs that are useful outside to third parties.
If you haven't used AWS a lot you might not know this, but the old instance types stick around and you can still use them, especially as "spot" instances, which let you bid for server time.
I had a science project that was CPU-bound, and it turns out that because people bid based on performance, the old chips end up costing about the same in terms of CPU work done per dollar (older chips cost less per hour but also do less).
AWS, though, was by far the most expensive, so switching to something like Oracle with their Ampere ARM instances was a lot cheaper for me.
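Roughly the kind of comparison I mean, as a sketch (describe_spot_price_history is a real boto3 call; the per-vCPU performance factors are made-up placeholders, not measurements):

```python
# Rough sketch: current spot price per vCPU, weighted by a guess at relative
# per-vCPU performance. The "perf" factors below are made-up placeholders
# you'd replace with your own benchmark numbers.
import boto3

CANDIDATES = {
    "c5.xlarge":  {"vcpus": 4, "perf": 1.0},   # older Xeon generation (baseline)
    "c6g.xlarge": {"vcpus": 4, "perf": 1.1},   # Graviton2 (assumed uplift)
    "c7g.xlarge": {"vcpus": 4, "perf": 1.3},   # Graviton3 (assumed uplift)
}

ec2 = boto3.client("ec2", region_name="us-east-1")

for itype, info in CANDIDATES.items():
    resp = ec2.describe_spot_price_history(
        InstanceTypes=[itype],
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=1,  # sample one recent price entry
    )
    price = float(resp["SpotPriceHistory"][0]["SpotPrice"])
    # "Work per dollar" proxy: (vCPUs * relative perf) / hourly spot price.
    work_per_dollar = info["vcpus"] * info["perf"] / price
    print(f"{itype}: ${price:.4f}/hr, ~{work_per_dollar:.1f} perf-units per $")
```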
This - I've been seeing recent Fargate workloads mysteriously scaling due to high CPU even though there's no traffic.
I started logging the CPU as part of task start-up and I've seen five year old Xeons running my workloads.
The price is the same, though, regardless of what I'm getting, and I wouldn't care, except that in my non-prod environments everything runs fine on one class of processor, while in my prod environment things didn't run fine and my cluster was maxed out because it was running on some old processor.
I know that notionally it can and will run on different hardware in different environments, but if I can run a certain workload (idling at 10% in one environment), I expect to be able to do the same in another environment.
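A minimal sketch of that start-up logging (assuming a Linux container where /proc/cpuinfo is readable):

```python
# Minimal sketch of logging which CPU a task landed on at start-up.
# Assumes a Linux container where /proc/cpuinfo is readable.
import platform

def cpu_model() -> str:
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                # x86 kernels expose "model name"; many ARM kernels don't,
                # so fall back to the machine architecture string.
                if line.lower().startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.machine()

if __name__ == "__main__":
    print(f"Running on: {cpu_model()}")
```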
Depending on the numbers involved, previous-generation hardware can waterfall down to infrastructure apps that are throughput-based.
Things accessed through network APIs and billed per op or in aggregate. Distributed file systems, databases, even build and regression suite systems.
Another key point is that older generations of servers for full custom cloud environments tend to co-evolve with their environments. The amount of power and cooling for a rack may not support a modern deployment.
Especially if a generation lasts 6 years. You might be able to cascade gen N+1 hardware into a gen N facility, but N+6 may require a full retrofit. A 6-year-old data center that is only partially filled as individual servers fail may justify waiting for N+7 or even N+8 to cover the cost of the downtime and retrofit.
There is a reason Google announced that they are depreciating servers over 6 years and Meta is at 5 years, vs the old accounting standard of 3 years.
Then of course there is a secondary market for memory and standard PCI cards, but the market for 6 year old tech is mainly spares, so it is unlikely to absorb the full size of the N-6 year data center build.
If you are considering a refurb-style resale market for 6-year-old tech, the performance per dollar is often a non-starter because of the amount of power the older tech consumes.
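A rough back-of-the-envelope, with every number a placeholder rather than a measured figure, just to show how power erodes the purchase-price advantage:

```python
# Back-of-the-envelope: does the refurb purchase-price advantage survive
# the power bill? Every number here is a hypothetical placeholder.
HOURS_PER_YEAR = 8766
ELECTRICITY = 0.12   # $/kWh, assumed
YEARS = 3

def cost_per_perf(purchase_usd, watts, relative_perf):
    power_usd = watts / 1000 * HOURS_PER_YEAR * YEARS * ELECTRICITY
    return (purchase_usd + power_usd) / relative_perf

refurb = cost_per_perf(purchase_usd=1_500, watts=600, relative_perf=1.0)
current = cost_per_perf(purchase_usd=8_000, watts=400, relative_perf=3.0)

print(f"refurb box:  ${refurb:,.0f} per unit of performance over {YEARS} years")
print(f"current gen: ${current:,.0f} per unit of performance over {YEARS} years")
```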
They just... don't retire them? The most expensive thing in a DC is the chips, so it's worth it to just build more datacenter space and keep the old ones around.
In 2019, before I left the EC2 Networking / VPC team, we were using M3 instances for our internal services... those machines were probably installed in 2013 or 2014, making them over 5 years old.
With the slowdown in Moore's law and chip speeds, I'd wager that team is still using those M3s now.
Eventually the machines actually start failing, so they need to be retired, but a large portion of machines likely make it to 10 years.
They for sure can find a use for them internally. Hat-tip to the less-shiny teams like Glacier that have to endlessly put out fires on dilapidated old S3 compute/array hand-me-downs.
Not much to discuss until there is pricing. I have a bunch of Graviton2 instances that it didn't make sense to upgrade to any Graviton3 instance, due to the pricing bump for 16 GB / 4 cores (t4g.xlarge).
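A quick sketch of the break-even math I'm describing; the prices and uplift below are placeholders, not real rates:

```python
# Quick break-even check for a Graviton2 -> Graviton3 move. The prices and
# the performance uplift are hypothetical placeholders; plug in the real
# rates for your region and your own benchmark numbers.
g2_price = 0.1344   # $/hr, placeholder for the 4 vCPU / 16 GB Graviton2 size
g3_price = 0.17     # $/hr, placeholder for a newer-generation equivalent
uplift   = 1.25     # assumed Graviton3 speedup for this particular workload

print(f"Graviton2: ${g2_price / 1.0:.4f} per unit of work")
print(f"Graviton3: ${g3_price / uplift:.4f} per unit of work")
# The upgrade only pays off if the price ratio stays below the uplift:
print("worth upgrading:", g3_price / g2_price < uplift)
```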
Neoverse V2, so this will probably be the first widely available ARMv9 server with SVE2, a server-class SKU you can actually get your hands on (i.e. not a mobile phone, Grace, or Fugaku). It's about damn time!
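Once you can actually get a shell on one, a quick way to confirm SVE2 is exposed (Linux/aarch64 lists it in /proc/cpuinfo):

```python
# Quick check for SVE/SVE2 once you can get a shell on an ARM instance:
# on Linux/aarch64 the kernel lists CPU features in /proc/cpuinfo.
def arm_features() -> set:
    feats = set()
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("Features"):
                    feats.update(line.split(":", 1)[1].split())
    except OSError:
        pass
    return feats

features = arm_features()
print("SVE: ", "sve" in features)
print("SVE2:", "sve2" in features)
```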
In my opinion the key takeaway is that compute is becoming commoditized much more rapidly than anyone expected, and that IP is becoming less and less relevant compared to fabrication, energy, and land. For consumers of cloud infrastructure, there is little concern about how many teraflops per cubic meter or per watt-hour they get, outside of very locality-specific edge use cases.
Laptops and phones are already an SoC with IO in a particular form factor, and server farms will go in the same direction, with minor differences in energy or rack density that come out in the wash.
It feels a bit weird that MS, AWS, and probably other cloud owners develop their own CPUs and AI-oriented chips and tell the world about the specs, yet nobody will ever get one to play with in real life. I can't hope to buy one a year from now and stuff it in my home office.
What will all this mean for consumer-oriented CPUs?
Would it be accurate to say that Intel funds part of the development of consumer CPUs with the server CPUs (or is it the other way around)? It seems like Xeon chip advances drip downwards after a while. If AWS and Azure stop buying chips from Intel and AMD, presumably that would be interesting.
It is interesting how late the Cortex X3 (Neoverse V2, as in Graviton4) arrives on servers, when the X4 is already in use and close to shipping in the millions within months or weeks.
By the end of next year we will get 128-core Neoverse V3 / Cortex X4 parts on 3nm, and 3nm Zen 5 EPYC.
Any guesses what the various chips on the package are?
I'd guess maybe the two directly abutting the core die are memory controllers, but maybe they are the stacked memory? Maybe the top and bottom chips are I/O controllers? It felt like destiny that Nitro would eventually be on-package; maybe those are basically big honking Nitro-like chips?
The scale they are quoting, 100,000-chip clusters and 65 exaflops, seems impossible. At 800W per chip, that's 80MW of power! Unless they literally built an entire DC of these things, nobody is training anything on the entire cluster at once. It's probably 10-20 separate datacenters being combined for marketing reasons here.
That's about what I thought the H100 was; it's 700W, actually. But even at, say, 400W, that's 40MW of power. From some quick googling, some datacenters are built in the 40-100MW range, but I really doubt they can actually network 100,000 chips together in any sort of performant way; that's supercomputer-level interconnect. I don't think most datacenters support the highly interlinked network fabric this would need, either.
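Spelling out the arithmetic we're both doing (the per-chip wattages are guesses, not published figures):

```python
# Total cluster power for 100,000 chips at a few assumed per-chip power
# draws (guesses, not published figures).
CHIPS = 100_000

for watts_per_chip in (200, 400, 800):
    total_mw = CHIPS * watts_per_chip / 1_000_000
    print(f"{watts_per_chip} W/chip -> {total_mw:.0f} MW for {CHIPS:,} chips")
```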
They have instances with 16 chips, so I presume there are at least 16 chips per server. I'd also expect the power consumption to be more like 100-200W, given they seem more like Google's TPUs than an H100.
As for the interconnect, I doubt this is their typical fabric, but it doesn't seem completely unreasonable. Even when not running massive clusters, they'll still need it to tie together the random collections of machines people are using.
Well, think of it this way -- individual 1U servers can easily consume 1000W, or 1kW. Put about forty of those in a single rack, and that's 40kW. Divide 80MW for the datacenter by 40kW per rack and that's not very many racks to comprise the entire datacenter, right?
The footprint for 2,000 racks would be over 1,000 m²; when you add the necessary spacing as well as supplementary utilities (power/networking), that probably means double that footprint.
I guess at the scale these companies are operating at, it's not that big, but that's still quite a large building!
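The same back-of-the-envelope in code, with an assumed ~0.6 m² per-rack footprint before aisles:

```python
# Racks needed for an 80 MW deployment and the raw floor area. The ~0.6 m^2
# per-rack footprint is an assumption (roughly a 600 mm x 1000 mm rack),
# before aisles and utility space.
DC_POWER_W   = 80_000_000    # 80 MW for the whole cluster
RACK_POWER_W = 40 * 1_000    # ~forty 1 kW 1U servers per rack
RACK_AREA_M2 = 0.6           # assumed footprint of a single rack

racks = DC_POWER_W / RACK_POWER_W
area = racks * RACK_AREA_M2

print(f"racks needed: {racks:.0f}")                      # ~2000
print(f"raw rack footprint: {area:.0f} m^2")             # ~1200 m^2
print(f"with aisles and utilities: ~{2 * area:.0f} m^2")
```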