Also, I'm worried that they're at the "engineering sample, glad it didn't crash" stage. Optimistically this will hit shelves in Q3, and by that point we may see a Sunny Cove-based 10nm (whole chip, not just a "chiplet") CPU from the blue team.
I think AMD's success will come down to pricing...
You are comparing power for the entire system. I think they subtracted 55W of idle power consumption from both to isolate the CPUs, then compared: 75/125 = 0.6.
>"If we take a look at our average system idle power in our own reviews which is around 55W, this would make the Intel CPU around 125W, whereas the AMD CPU would be around 75W"
I have no idea what fraction of the idle power consumption the CPU should be responsible for; from their treatment I would assume they consider it negligible.
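Spelling out the subtraction they seem to be doing (the ~180W/~130W system totals are just what's implied by the quoted figures, not numbers from the article):

    # rough sketch of the reviewer's subtraction; system totals are implied, not quoted
    idle_system_w  = 55          # their stated average system idle power
    intel_system_w = 125 + 55    # implied total wall power for the Intel system (~180W)
    amd_system_w   = 75 + 55     # implied total wall power for the AMD system (~130W)

    intel_cpu_w = intel_system_w - idle_system_w   # -> 125
    amd_cpu_w   = amd_system_w   - idle_system_w   # -> 75

    print(amd_cpu_w / intel_cpu_w)                 # 0.6, i.e. AMD at ~60% of Intel's CPU power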
I wonder how AMD is planning long term. Zen did put them back on track, but I hope it's not a one-shot era and that they will find ways to stay strong for a decade or so.
Chiplets and power are the next dimensions of competition. First, yield: chiplets vastly improve yield. In the limit, as chiplet size drops, yield basically goes to 1. Erasure coding for silicon.
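To make the yield point concrete, here's a toy Poisson defect model (the defect density is made up for illustration; it's not TSMC's real number):

    import math

    def die_yield(area_mm2, defects_per_mm2=0.001):
        # Poisson model: probability a die of this area has zero defects
        return math.exp(-defects_per_mm2 * area_mm2)

    print(die_yield(700))   # one big ~700 mm^2 monolithic die: ~0.50
    print(die_yield(75))    # one small ~75 mm^2 chiplet:       ~0.93
    # As area -> 0, exp(-D*A) -> 1: tiny dies almost always come out clean,
    # and you throw away a cheap chiplet instead of a whole huge die.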
And then power, part of which is process and part design. I am not skilled enough to call that, but as leakage current drops, so does static power draw, allowing frequencies to scale over larger ranges. Hell, maybe they went with some novel clockless design.
> “Some people may have noticed on the package some extra room,” she said with a chuckle. “There is some extra room on that package and I think you might expect we will have more than eight cores.”
Since Rome (their server part) uses 8 chiplets for 64 cores it makes sense.
When Rome was presented last year, users also speculated about GPU and FPGA chiplets. PlayStation 5 rumors also speculate about chiplets to keep the price-to-performance ratio as low as possible (though I wonder about latency, which games are sensitive to). So going all in on chiplets and reusing them for everything would make sense for AMD.
I don't disagree with your takeaway, but one nit: games are less sensitive to latency than they are to unexpected latency. You can deal with latency through a number of techniques so long as it's consistent. (IIRC, this is one of the reasons for some of the interesting hardware on the Xbox 360 shrinks - reconfiguring the CPU/GPU package meant that the hardware team had to gimmick it such that games didn't have to suddenly deal with less latency than they'd been intended for.)
No problem. Just consider it as a scheduling problem. If you know it takes N cycles to load a register (and to be clear, this is no longer something most game developers have to actively care about...though engine developers might...though on very modern consoles even this often ends up abstracted from most engine developers), having that register now be populated at (N-2) or (N+2) cycles might be Very Bad. On the other hand, if you now know that it will populate, always, in (N+4) cycles, you can just work with that.
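A toy way to see it (cycle counts invented for illustration): if you statically schedule other work to exactly cover a known latency, any deviation either stalls you or breaks assumptions the code was tuned around.

    # toy model: we issue a fetch and schedule exactly LATENCY cycles of other
    # work to hide it, assuming the data lands precisely when that work is done
    LATENCY = 8  # cycles the hardware "promises" a fetch takes (made up)

    def run(actual_latency):
        issue_cycle   = 0
        consume_cycle = issue_cycle + LATENCY         # statically scheduled consumer
        ready_cycle   = issue_cycle + actual_latency  # when the data really arrives
        if ready_cycle > consume_cycle:
            return f"stall for {ready_cycle - consume_cycle} extra cycles"
        return f"fine, data ready {consume_cycle - ready_cycle} cycles early"

    print(run(8))   # predictable: the schedule just works
    print(run(10))  # slower than promised: a hiccup on every single iteration
    print(run(6))   # faster than promised: harmless here, but code tuned around
                    # the old timing (manual double-buffering, DMA) can misbehave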
Take this how you will, but his leakers have been very accurate in the past.
Seems possible to put 2 dies there with lower core counts to be able to use the chips (eg 4 + 4). As you say, it's also possible for 8 + 8, and probably 6 + 6 combinations. Also possible to put a GPU there for their G series.
Think of 3+1GB of scratch memory or L4; this could feed a GPU without competing with main memory access. Or for notebook or game use, it would need fewer than 100 active pins, vastly simplifying PCB design and costs.
There were some rumors before this announcement that AMD would announce SKUs with 1 CPU die (6-8 cores activated) + 1 GPU die or with 2 CPU dies (8-16 cores activated). But AMD didn't announce any SKUs - seems they're a bit behind on schedule.
They are launching processors in the middle of the year. If they reveal too much 4-6 months out, Intel has way too much time to adjust their prices, change their marketing, put pressure on suppliers, etc.
This leaves a lot of guesswork. Intel has to be prepared for multi-chip, CPU + GPU, faster chips, slower chips (maybe this was a golden sample), etc., which makes it much harder to adjust in preparation.
On /r/amd a user posted a different photo [0] taken from the stream which - to a layman like me - looks like there is circuitry (or whatever?) for a second chiplet.
What I'm salivating for is an AMD 6c/12t mobile chip. With this power draw on desktop, chop the thermals and boost clock a bit and you have something truly frightening on the battery life & burstable performance scale.
Considering last generation's APUs used the same "chiplets" as the desktop processors, I'd wager the next generation of APUs will have a single 8c16t zen2 die.
AMD seems to be making a push for the server market at the moment too. It looks like a good time for it, as Intel struggles to get production on 10nm while AMD is already at 7nm for some of their range. But Intel has an incumbency advantage. I'm no expert on the server market though. Could someone with a bit more knowledge chime in on how big a threat this really is?
Mandatory reminder that TSMC "7nm" is not comparable to any Intel node and also doesn't actually represent a uniform 7nm gate width.
The technology and potential in Intel's "10nm" and TSMC's "7nm" are similar. What matters is that Intel hasn't managed to successfully deploy a new process node in 4 years, and thus competitors like TSMC, Samsung, and GF have had time to catch up and surpass Intel's fab tech in various ways.
> I'm no expert on the server market though. Could someone with a bit more knowledge chime in on how big a threat this really is?
Sys Admin here. People are starting to notice and raise eyebrows. It is certainly an option, especially for new stacks. The power/price/performance ratios on AMD are truly amazing.
It certainly helps that Amazon, Microsoft, Cray, and Baidu have all already started to use Epyc servers.
Student admin myself, here at a university. With Dell being our primary provider, I did some general comparisons with the currently available EPYCs. Our dual-socket Skylakes (36 total cores without hyperthreading) with 128GB of ECC RAM cost ~18K each. With an EPYC chip I could build 48+ core systems with 256GB RAM for ~3K less.
If these next gens have similar pricing, they look very competitive for the future of on-premise (and cloud) HPC stacks.
I was in your same place exactly 20 years ago when I convinced the dept head that we (mostly me) could build our own machines for the test lab instead of purchasing units from corporate IT. I should have productionized the build process, but otherwise it went pretty well.
1. You are right, you can build your own machines for way less than Dell.
2. You are now the support team. So when you get a call months or years later about hardware quirks, missing pieces, or odd incompatibilities, expect it.
3. Buy extras and keep the machines largely the same. The fleet will be its own spare parts as it ages out. You will need extras for differential testing.
4. Keep a log and a copy of all the BIOSes supplied by the motherboard manufacturer. Baseline and benchmark each server before it is put into production; retest as it is cycled out for other duties.
5. If you don't max out the memory when built, do track RAM prices: there is a sweet spot of a dip just as the RAM tech ages out. If you miss it, RAM prices go back up again. An older machine with lots of RAM will be useful for a very long time.
If one plans it right, a 100-node HPC cluster decays into a 75-node render farm, then into a 25-node distributed-systems test cluster; I think they could get 10 years of service out of the purchase.
I don’t disagree with any of that. I usually prefer building my own when I’m the maintainer/builder. But that’s not my current situation; a few things to note:
* I have (hopefully) one semester left. After that, I’m gone.
* A good portion of our experienced team will be gone within the next two years, including the few grad students who helped build our cluster. Introducing “non-supported” hardware would likely introduce problems for the next group.
* The prices above were from Dell, not self assembly shopping. University policies mandate any university funded tech/infrastructure equipment (especially desktops/servers) be done through specific partners. For us, that’s Dell, GovConnection, and CDW. I host my own 6 node Supermicro Microblade setup (render farm) from CDW that took almost a full year to purchase because nobody in the Purchasing Dept. for clubs was aware of that rule. Until my hopping from Boxx to Thinkmate to “haha, funny story...” uncovered it.
> prices above were from Dell, not self assembly shopping
That is amazing. With those kind of prices folks might upgrade before their old hardware has properly aged out. At what point does AMD institute a buyback program for Intel hardware? (joking)
I don't see how this "incumbency advantage" would help Intel, exactly. The server market is highly competitive. If AMD puts out superior product at an attractive price, nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
That's exactly what they would say. Reliability track record, existing software/project constraints, and enterprise relationships matter more than the price tag. That's why Intel can still charge ridiculous amounts for enterprise chips for some time.
I remember a time when Opteron had a very large part of the server market. It's about lack of competition, not about how businesses love Intel charging a fortune for their product.
Who's making chips to compete with Intel? AMD, IBM, and nobody else (ARM isn't high-performance yet). IBM's lock-in with POWER is scary to businesses. AMD wasn't competitive for most server needs. That left Intel to charge whatever they wanted (and they did).
With EPYC, there was finally a competitive offering at a great price (not to mention a better track record with meltdown and the constant stream of spectre variants). All the same software runs on the AMD chips too.
>nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
Maybe not those exact words, but people will say "We are adding a server to our VMWare ESXi cluster, so if we want it to be compatible with our existing hardware we had better stick with Intel."
VMWare's high availability feature (a highly desired feature for any data center) won't work across different CPU architectures, so unless you are replacing your entire stack and not just adding server(s) then you have to stick with the architecture you already have in place.
Indeed, AMD's best way in would be at the larger-scale end, where someone might add whole sites at a time and choose AMD for a new datacenter, since inter-site live vMotion is pretty unusual.
I have been a Sys Admin for 7 years. It is a problem, vmotion won't work going from AMD to Intel or vice versa (at least not in any official way). In fact it is highly suggested when building ESXi stacks that you get the exact same model CPU for each server.
No, Intel FlexMigration and AMD-V Extended Migration architectures are different, and vMotion isn't supported across them.
EVC does not allow for migration with vMotion between Intel and AMD processors.[1]
This isn't really that surprising. x86-64 is standard, but extensions aren't. E.g., here are some of the new extensions in Zen vs Steamroller: Compared to the AMD Opteron™ "Steamroller" EVC mode, this EVC mode exposes additional CPU features including RDRAND, SMEP, AVX2, BMI2, MOVBE, ADX, RDSEED, SMAP, CLFLUSHOPT, XSAVES, XSAVEC, SHA, and CLZERO [1]
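If you're curious which of those a given Linux box actually advertises, here's a quick check against /proc/cpuinfo (note the kernel spells a few of these differently, e.g. SHA shows up as sha_ni):

    # check which of the listed extensions this (Linux) host advertises
    wanted = {"rdrand", "smep", "avx2", "bmi2", "movbe", "adx", "rdseed",
              "smap", "clflushopt", "xsaves", "xsavec", "sha_ni", "clzero"}

    with open("/proc/cpuinfo") as f:
        flags = next(set(line.split(":", 1)[1].split())
                     for line in f if line.startswith("flags"))

    print("present:", sorted(wanted & flags))
    print("missing:", sorted(wanted - flags))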
A project that I currently work on procured new hardware and one of the constraints for the selection was that pretty much exactly the same hardware must be available in three years. So this one will be buying intel for the foreseeable future.
> nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
Actually, that is the case, it will take about 1-2 hardware generations of customers _asking_ for AMD before it's featured prominently by any vendor (except the extreme ones like SuperMicro)
If nobody is actually calling their HP/Dell reps asking for it then it will never happen.
Market perception matters though, and I know many people -- IT admins included -- who think that Intel is just higher quality. Those people will be hard to capture quickly, and AMD will need to hold a lead for a while to change that perception
Things aren't always that simple. For example, AMD may be cheaper per core, but if you're running AVX-heavy code then Intel has a throughput advantage. At least it did against Zen; Zen 2 has some improvements in that area, so we'll have to measure again.
Moving to the next intel generation is usually simple, incremental change. Moving between vendors means a lot of re-evaluating if you're crunching numbers.
If your workload needs a lot of PCIe lanes for IO or you need cheap systems with ECC ram then the story changes quite a bit.
I wouldn't say it's especially suited for it, but AMD generally does well because of the large amount of cores/threads.
On the other hand, with something like 7-Zip, AMD has had a huge advantage over Intel, even going back to the Bulldozer days. What is it about that code that runs so well on AMD?
Compute (rather than I/O) heavy and highly multithreaded stuff skews towards AMD. Most applications benefit more from Intel's caching and single-core turbo boost features than they do from AMD's more discrete style of multithreading and weaker single-core mode tech.
> Compute (rather than I/O) heavy and highly multithreaded stuff skews towards AMD.
This. It's worth noting that newer, legacy-free programming languages make it a lot more feasible to parallelize compute-heavy parts of the code, and to deal with the "Memory wall" (i.e. the rise of memory-bandwidth as a bottleneck especially in high-CPU-frequency, high-core-count systems) by working with efficient, low-level representations of data. This will help not just AMD, but also newer entrants in this market segment such as ARM vendors.
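As a small illustration of the "low-level representation" point (sizes arbitrary): the same reduction over one contiguous array streams through memory and vectorizes, while the boxed-object version spends its time chasing pointers.

    import time
    import numpy as np

    n = 5_000_000
    boxed = [float(i) for i in range(n)]      # list of boxed Python floats
    flat  = np.arange(n, dtype=np.float64)    # one contiguous, SIMD-friendly array

    t0 = time.perf_counter(); a = sum(boxed);  t1 = time.perf_counter()
    t2 = time.perf_counter(); b = flat.sum();  t3 = time.perf_counter()

    print(f"boxed: {t1 - t0:.3f}s   flat: {t3 - t2:.3f}s   same result: {a == b}")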
Do the rooflines of these chips show that pattern? Intel has a big vector unit with some seriously useful instructions in it, pushing the arithmetic speed up.
Yeah, but that's comparing a mid 2019 chip to a chip you can buy today. Intel will probably make some announcement about their chip that's faster than this one right about the time this hits the shelves. It's the neverending rat race for chipmakers.
Up to this point Intel has been competing with AMD by dramatically cutting into the previously massive margin between what was possible on their tech and what they were milking consumers for with their enthusiast platforms.
Intel won't be able to compete with this with another one of their Arizona-fabbed 14nm chips. The tech's 8 years old now, but they still haven't demonstrated readiness on anything newer yet. This might be the caught-with-their-pants-down moment Intel can't bounce back from a few months later.
The Intel playbook has, for the last 8 years, been either "same architecture, same clock speed, less power draw" (remember, Sandy Bridge [2011] and Broadwell [2014] perform within 10% of each other given the same clock speed- a consequence of chasing thin-and-light laptops and tablets) or "same architecture, higher clock speed, more power draw" after Zen hit store shelves. Instructions per clock haven't changed significantly in many years; Intel has just slowly gotten better at hitting higher clocks because smaller transistors pull less power.
Overclockers have known for years that 5GHz is about the maximum that Intel's architectures can do under standard conditions (i.e. not liquid nitrogen), and the bottleneck there has primarily been due to heat generation and not processor instability.
It hasn't escaped Intel that these kinds of clocks were easily possible- they had special CPUs sold at those maximum speeds for high-frequency trading applications (not sold to the general public), and that fact was being kept in reserve as a strategic advantage against potential competition.
But that advantage has already been used up (not that AMD has one at the moment either; Zen and Zen+ have practically zero overclocking headroom on their fastest examples). Intel will need a new architecture to compete, and that's still likely at least a year down the road- and unlike the last time this happened (the Pentium 4 was supposed to reach 5GHz, but capped out at 3.8 for heat and power draw reasons) they don't have another architecture waiting in the wings to save them.
"Realistically speaking, we should be able to see NetBurst based processors reach somewhere between 8 – 10GHz in the next five years before the architecture is replaced yet again. Reaching 2GHz isn’t much of a milestone, however reaching 8 – 10GHz begins to make things much more exciting than they are today. Obviously this 8 – 10GHz clock range would be based on Intel’s 0.07-micron process that is forecasted to debut in 2005. These processors will run at less than 1 volt, 0.85v being the current estimate."
And overclockers did actually push that chip all the way to 8GHz with extreme measures.
Core 2 then hit the reset button and clock speeds dropped substantially (from 3.8GHz to 3.0GHz) in exchange for huge boosts to IPC. Intel didn't exceed the P4's 3.8GHz clocks until nearly 10 years later with the 4.0GHz i7-4790K (AMD was actually the first to 4GHz with the AMD FX-4170)
It's possible that Intel will announce a 10C Comet Lake, but then AMD can quickly counter with a 16C Ryzen. Then Intel has no response unless they can cram an 18C -X die into a 115x socket.
Also, natively wider SIMD (256 vs 128 bit), though still no AVX512 IIRC. I'm waiting on third gen Threadrippers in particular. Now that will be one heck of a chip for a quad GPU deep learning workstation.
Intel does AVX-512 at 30+% lower clock speeds. AMD should be able to do AVX-512 at full clock speeds in double the cycles.
For equivalent architectures, you'd expect Intel to be faster at workloads that are only avx512. They'd be slower at mixed workloads (the avx unit slows down, but so do all the other ALUs executing in parallel along with the SMT thread).
Most importantly, 512-bit means another 256 bits' worth of hardware, which is a big addition to core size and power usage for what is a very fringe workload. Saving that while getting better performance in common workloads seems like a great tradeoff.
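Back-of-the-envelope version of that tradeoff (clocks and unit counts below are assumptions for illustration, not measured numbers for any particular SKU):

    # per-core peak FLOP/s ~= FMA units * SIMD lanes * 2 (mul+add) * clock
    def peak_gflops(units, vector_bits, clock_ghz, elem_bits=64):
        return units * (vector_bits // elem_bits) * 2 * clock_ghz

    intel_avx512 = peak_gflops(units=2, vector_bits=512, clock_ghz=3.0)  # assumed AVX-512 downclock
    amd_avx256   = peak_gflops(units=2, vector_bits=256, clock_ghz=4.3)  # assumed full clock, half width

    print(intel_avx512, amd_avx256)  # 96.0 vs 68.8 GFLOP/s, double precision
    # and in mixed workloads the AVX-512 downclock also drags down every
    # scalar instruction sharing that core, which is the point made above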
I'm aware. But it's not just about clock speed per se. AVX512 has some _very_ specialized instructions that are specifically designed to speed up matrix-matrix and matrix-vector multiply. Even at a significantly slower clock those instructions improve things by a lot if you do linear algebra. Most of linalg is done on GPUs nowadays, but not all of it.
As a consequence, TR/Ryzen is very slow in "classical" ML in Python (I can tell), because MKL/AVX2 seems to be way slower. This could be fixed with Zen 2; AVX512 workloads will then likely run at 2/3 of Intel's performance without slowing down the rest of the system during computations.
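If you want to sanity-check what a given box is actually delivering, one quick test is to see which BLAS NumPy linked against and time a big matmul (sizes arbitrary):

    import time
    import numpy as np

    np.show_config()   # shows whether NumPy is linked against MKL, OpenBLAS, etc.

    n = 4096
    a, b = np.random.rand(n, n), np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0
    print(f"{2 * n**3 / dt / 1e9:.1f} GFLOP/s")   # rough dense-matmul throughput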
I agree with this assessment. Frankly, GPUs are so much faster for dense linalg than even the highest end Intel (or AMD) chips that anything that can be done on GPUs should be done there. But that requires one to know how to work with them, if the primitives don't already exist, and most people are oblivious to the finer points.
> This suggests that AMD’s new processors with the same amount of cores are offering performance parity in select benchmarks to Intel’s highest performing mainstream processor, while consuming a lot less power. Almost half as much power.
>
> How has AMD done this? IPC or Frequency?
Am I missing something? Wouldn't this be expected, given that this new AMD processor is running on 7nm while Intel hasn't really launched any 10nm processors yet? AMD is a process generation ahead...
ROCm is AMD's "CUDA". TensorFlow is now up-to-date - but you have to work with AMD's fork of tensorflow, they haven't upstreamed their changes yet.
The new GPU of interest for DeepLearning is the MI60. It has 32GB of RAM, 1TB/s memory bandwidth.
Performance on fp32 looks slightly lower than the V100 (ResNet-50 on TensorFlow, see below). But there are no tensor cores on the MI60, so ResNet-50 would go 3x faster on the V100 using its TensorCores. Personally, I think TensorCores are not that useful in the general case, as not enough programmers that I have come across are writing their code to use them. They are good, but niche.
The important thing we don't know is the price. Expect it to be released by the end of Q1 or early Q2.
There's a lot of CUDA-specific code out there. Even if you think you know exactly what you need, even if it claims to be compatible with AMD cards, you should seriously consider the possibility that you are wrong on one of those counts. I was, twice. Twice I had to sell my AMD cards, buy NVIDIA cards, and eat the spread. By comparison, just paying the "green tax" up front would have been cheap.
I want AMD to win, I really do, but I can't personally justify taking chances on them anymore.
> Question from a curious sysadmin: How are these AMD GPUs for machine learning and math related stuff?
They're just fine, though I think a tuned implementation on either tends to show NVIDIA ahead.
> I have only heard of Nvidia CUDA in this context. Does anybody do ML work on AMD GPUs?
NVIDIA is still king here, but if AMD continues to develop their Radeon Instinct stuff, it could be competitive for some applications; it then becomes a question of whether or not there will be enough people who know something other than CUDA (which is not currently supported on AMD GPUs, though work is under way to make a compatible runtime).
HIP (part of the ROCm stack) is fairly similar to CUDA. They also have goodies like tensorflow-rocm, available through pip or a Docker container.
That stack is also open source. And writing in it is supposed to keep you compatible with NVIDIA.
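For a rough idea of what that looks like in practice (treat this as a sketch; the exact setup steps depend on your ROCm version):

    # pip install tensorflow-rocm   # AMD's ROCm build, instead of plain tensorflow
    import tensorflow as tf

    # same TensorFlow API as the CUDA build; the ROCm/HIP runtime sits underneath
    print(tf.test.is_gpu_available())   # True if the Radeon card is visible to TF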
Wanting to support them -- and hoping they become more competitive in ML -- are why I bought Vega GPUs.
Although, I haven't found much time to actually try and get any of my software running on a GPU, let alone optimized. As a huge fan of avx512, I would like to try -- maybe I can get even better performance on a graphics card.
But with AMD, the lack of support for the uninitiated is apparent. There's not much in the way of existing software and resources like online tutorials. I'd like guides on how to optimize kernels, organize wavefronts, and manage memory movement.
Maybe I haven't looked hard enough.
Especially painful in Julia, where almost half a year after 1.0, the only supported way to use GPUs in any library is with CUDANative. The "cross-platform" GPUArrays.jl's OpenCL backend still hasn't been updated.
Means all my coding will be done in HIP. Which is fine.
nVidia provides way more support for the ML community than AMD, and the ML community reciprocates by buying nVidia hardware. You can use AMD stuff, but it means being on the bleeding edge and doing the debugging yourself instead of being able to ask the community for help (via Google searches).
So yes you can use AMD hardware for it, but you end up making a lot more work for yourself to do so, and for little to no benefit to yourself.
As of today, if you install PyTorch with the GPU option, the assumption is that you have an NVIDIA GPU. The same seems to be the case for TensorFlow (I haven't installed TensorFlow lately).
The first mover effect of Nvidia/CUDA should not be under-estimated.
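Concretely, the stock install paths bake that assumption in (a ROCm-based PyTorch build would need a different, non-default wheel):

    # pip install torch             # default wheels ship with CUDA support baked in
    import torch

    print(torch.cuda.is_available())   # False on an AMD GPU with the stock wheels
    print(torch.cuda.device_count())   # 0, the Radeon card simply isn't seen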
I'd rather they release a better chipset with at least 2 PCIe 3s than come up with these fast I/O paths not even commercially usable today. Such agony.
Really? The MI60 uses the exact same chip as the Radeon VII, and it supports PCIe 4. They will gimp the Radeon VII in some places so they don't completely kill MI60 sales, so dropping PCIe 4 makes sense in isolation but is completely whack in the bigger picture. Doesn't the right hand talk to the left hand at AMD? Hopefully there's time to reverse this crazy decision.
The Zeppelin (Ryzen 1xxx) die is 212 mm2 and its CCXs are 88 mm2 leaving 124 mm2 for the uncore. Now we see that the Matisse (Ryzen 3xxx) IO die is 122 mm2 — virtually the same size on the same process for the same functionality. (I don't see how L4 cache would fit in there BTW.)
As to why it's so large, I guess connecting cores, memory, and PCIe at extremely high speed just requires a lot of transistors. Intel's uncore seems to be far smaller; I'm not sure how or why.
The Ryzen chips offer more PCIe lanes and ECC memory support. These chips also use the Infinity Fabric high-speed interconnect, which is not needed by Intel's single-chip designs. The two CCX units in Zeppelin also need to talk to each other, so probably all this complexity just adds up.
I thought the main reason is that AMD still needs to use GF's manufacturing capacity (according to their Wafer Supply Agreement) and it also makes the chiplet design cheaper (the high-performance node is only used where it really matters).
That could also be true. I remember from an adoredtv video it was mentioned that IO transistors are much harder to scale down. The guy certainly seems to do his research and know his stuff, but I suppose at the end of the day it is a youtube video. I have not checked into this myself, just taking adoredtv's word at face value.
I think that was poetic license on his part. Mobile SoCs and modems are scaling quite well on 7nm. The IO chip is already low enough power for desktop/server applications and is small enough for good yield. 7nm would just be more cost for no gain. And using 14nm allows AMD to meet the WSA. Otherwise they probably would have done a TSMC 16nm IO die and kept their entire supply chain in Asia.
Well, it's on a more mature node, which at least should mean better yields. It also probably doesn't benefit from a node shrink as much as the actual CPU, so throw it on old node so it's cheaper kind of deal.
EDIT: I got some stuff mixed up. This post is all wrong.
It's supposedly one IO fits all. It has to interconnect up to 8 CPU chips and 4-8 memory channels (and allegedly some L4 cache). It has to support 128 PCIe lanes (maybe more for the upcoming generation) too. Then also, it's 14nm, so things aren't as small.
I imagine that they laser out most of the die to save power on consumer chips.
Currently devices have Thunderbolt 3 (up to 40 Gbps), I'm wondering if PCIe 4.0 will increase that. I assume it would be called Thunderbolt 4, and I assume it will be faster.
Increase it how? That's like asking if faster ethernet is going to increase thunderbolt speed. No, it's a totally different way of sending data over wires.
A theoretical Thunderbolt 4 that's twice as fast could easily be fed with a PCIe 3 connection. It could even be fed with a PCIe 2 connection!
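Rough per-lane numbers for context (raw line rate minus encoding overhead; real payload throughput is a bit lower still):

    # usable bandwidth per PCIe lane, in Gbit/s, after encoding overhead
    per_lane_gbps = {
        "PCIe 2.0": 5.0 * 8 / 10,      # 5 GT/s, 8b/10b encoding   -> 4.0
        "PCIe 3.0": 8.0 * 128 / 130,   # 8 GT/s, 128b/130b         -> ~7.9
        "PCIe 4.0": 16.0 * 128 / 130,  # 16 GT/s, 128b/130b        -> ~15.8
    }

    for gen, bw in per_lane_gbps.items():
        print(gen, {lanes: round(bw * lanes, 1) for lanes in (4, 8, 16)})
    # feeding a 40 Gbit/s (or even 80 Gbit/s) Thunderbolt link is mostly a question
    # of how many host lanes you dedicate, not of PCIe generation; newer
    # generations just let you do it with fewer lanes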
External GPUs and ASICs. Also, this is a connection that can be shared, and more bandwidth is probably better, since we all probably have notebooks and they're not going to have most of the ports we need.
Even the previous generations of Epyc are surprisingly cheap compared to Intel - Azure has a L8s_v2 8 core with 64 GB RAM and decent storage for under £100/month, which is less than half the price of anything else. Obviously, core speed may not suit every application, but it's pretty eye-opening!
An unreleased AMD CPU performs a bit worse than an already-released Intel CPU. They are still behind. But it's very intriguing. If they manage to push 5+ GHz, it would be an awesome CPU.
Price/perf goes to AMD every time, and, increasingly often, some parallel workloads simply perform better on AMD CPUs.
I work on The Division 2 (my managing director was on stage during the keynote) and we get better performance in quite a lot of areas using AMD CPUs.
Interesting fact: the AMD EPYCs in our DC run with less power consumption per unit of compute than our Xeon Skylake servers. What I later learned is that Intel lists the average power consumption, whereas AMD lists the max. Not sure why that is.
Well, if they can turbo at 5 GHz while consuming nearly half as much power as the equivalent Intel part... it sure does sound like something that I might buy.
The article body shows the AMD chip slightly behind the i9. The chart just after it shows two different figures - "pre-brief" it's behind, "on stage" it's ahead. Not a huge margin in any case.
The rumors are that this is AMD's mid-range, $200 part. Matching the performance of Intel's just-released, high-end $500 part at ~half the power usage. That's not being behind. That'd be a full generational improvement in just 6-9 months.
What an architecture Zen is.