Also, I'm worried that they're at the "engineering sample, glad it didn't crash" stage. Optimistically this will hit shelves in Q3, and by that point we may see a Sunny Cove-based 10nm (whole chip, not just a "chiplet") CPU from the blue team.
I think AMD's success will come down to pricing...
You are comparing power for the entire system. I think they subtracted 55W of idle power consumption from both to isolate the CPUs, then compared: 75/125 = 0.6.
>"If we take a look at our average system idle power in our own reviews which is around 55W, this would make the Intel CPU around 125W, whereas the AMD CPU would be around 75W"
I have no idea what fraction of the idle power consumption the CPU should be responsible for; from their treatment I would assume they consider it negligible.
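Spelling out the subtraction they seem to be doing (the ~180W/~130W system totals are just what's implied by the quoted figures, not numbers from the article):

    # rough sketch of the reviewer's subtraction; system totals are implied, not quoted
    idle_system_w  = 55          # their stated average system idle power
    intel_system_w = 125 + 55    # implied total wall power for the Intel system (~180W)
    amd_system_w   = 75 + 55     # implied total wall power for the AMD system (~130W)

    intel_cpu_w = intel_system_w - idle_system_w   # -> 125
    amd_cpu_w   = amd_system_w   - idle_system_w   # -> 75

    print(amd_cpu_w / intel_cpu_w)                 # 0.6, i.e. AMD at ~60% of Intel's CPU power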
I wonder how AMD is planning long term. Zen did put them back on track, but I hope it's not a one-shot era and that they will find ways to stay strong for a decade or so.
Chiplets and power are the next dimensions of competition. First, yield: chiplets vastly improve yield. In the limit, as chiplet size drops, yield basically goes to 1. Erasure coding for silicon.
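To make the yield point concrete, here's a toy Poisson defect model (the defect density is made up for illustration; it's not TSMC's real number):

    import math

    def die_yield(area_mm2, defects_per_mm2=0.001):
        # Poisson model: probability a die of this area has zero defects
        return math.exp(-defects_per_mm2 * area_mm2)

    print(die_yield(700))   # one big ~700 mm^2 monolithic die: ~0.50
    print(die_yield(75))    # one small ~75 mm^2 chiplet:       ~0.93
    # As area -> 0, exp(-D*A) -> 1: tiny dies almost always come out clean,
    # and you throw away a cheap chiplet instead of a whole huge die.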
And then power, part of which is process and part design. I am not skilled enough to call that, but as leakage current drops, so does static power draw, allowing frequencies to scale over larger ranges. Hell, maybe they went with some novel clockless design.
> “Some people may have noticed on the package some extra room,” she said with a chuckle. “There is some extra room on that package and I think you might expect we will have more than eight cores.”
Since Rome (their server part) uses 8 chiplets for 64 cores it makes sense.
When Rome was presented last year, users also speculated about GPU and FPGA chiplets. PlayStation 5 rumors also speculate about chiplets to keep the price-to-performance ratio as low as possible (though I wonder about latency, which games are sensitive to). So going all in on chiplets and reusing them for everything would make sense for AMD.
I don't disagree with your takeaway, but one nit: games are less sensitive to latency than they are to unexpected latency. You can deal with latency through a number of techniques so long as it's consistent. (IIRC, this is one of the reasons for some of the interesting hardware on the Xbox 360 shrinks - reconfiguring the CPU/GPU package meant that the hardware team had to gimmick it such that games didn't have to suddenly deal with less latency than they'd been intended for.)
No problem. Just consider it as a scheduling problem. If you know it takes N cycles to load a register (and to be clear, this is no longer something most game developers have to actively care about...though engine developers might...though on very modern consoles even this often ends up abstracted from most engine developers), having that register now be populated at (N-2) or (N+2) cycles might be Very Bad. On the other hand, if you now know that it will populate, always, in (N+4) cycles, you can just work with that.
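A toy way to see it (cycle counts invented for illustration): if you statically schedule other work to exactly cover a known latency, any deviation either stalls you or breaks assumptions the code was tuned around.

    # toy model: we issue a fetch and schedule exactly LATENCY cycles of other
    # work to hide it, assuming the data lands precisely when that work is done
    LATENCY = 8  # cycles the hardware "promises" a fetch takes (made up)

    def run(actual_latency):
        issue_cycle   = 0
        consume_cycle = issue_cycle + LATENCY         # statically scheduled consumer
        ready_cycle   = issue_cycle + actual_latency  # when the data really arrives
        if ready_cycle > consume_cycle:
            return f"stall for {ready_cycle - consume_cycle} extra cycles"
        return f"fine, data ready {consume_cycle - ready_cycle} cycles early"

    print(run(8))   # predictable: the schedule just works
    print(run(10))  # slower than promised: a hiccup on every single iteration
    print(run(6))   # faster than promised: harmless here, but code tuned around
                    # the old timing (manual double-buffering, DMA) can misbehave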
Take this how you will, but his leakers have been very accurate in the past.
Seems possible to put 2 dies there with lower core counts to be able to use the chips (eg 4 + 4). As you say, it's also possible for 8 + 8, and probably 6 + 6 combinations. Also possible to put a GPU there for their G series.
Think of 3+1GB of scratch memory or L4; this could feed a GPU without competing with main memory access. Or for notebook or game use, it would need fewer than 100 active pins, vastly simplifying PCB design and costs.
There were some rumors before this announcement that AMD would announce SKUs with 1 CPU die (6-8 cores activated) + 1 GPU die or with 2 CPU dies (8-16 cores activated). But AMD didn't announce any SKUs - seems they're a bit behind on schedule.
They are launching processors in the middle of the year. If they reveal too much 4-6 months out, Intel has way too much time to adjust their prices, change their marketing, put pressure on suppliers, etc.
This leaves a lot of guesswork. Intel has to be prepared for multi-chip, CPU + GPU, faster chips, slower chips (maybe this was a golden sample), etc., which makes it much harder to adjust in preparation.
On /r/amd a user posted a different photo [0] taken from the stream which - to a layman like me - looks like there is circuitry (or whatever?) for a second chiplet.
What I'm salivating for is an AMD 6c/12t mobile chip. With this power draw on desktop, chop the thermals and boost clock a bit and you have something truly frightening on the battery life & burstable performance scale.
Considering last generation's APUs used the same "chiplets" as the desktop processors, I'd wager the next generation of APUs will have a single 8c16t zen2 die.
AMD seems to be making a push for the server market at the moment too. It looks like a good time for it, as Intel struggles to get production on 10nm while AMD is already at 7nm for some of their range. But Intel has an incumbency advantage. I'm no expert on the server market though. Could someone with a bit more knowledge chime in on how big a threat this really is?
Mandatory reminder that TSMC "7nm" is not comparable to any Intel node and also doesn't actually represent a uniform 7nm gate width.
The technology and potential in Intel's "10nm" and TSMC's "7nm" are similar. What matters is that Intel hasn't managed to successfully deploy a new process node in 4 years, and thus competitors like TSMC, Samsung, and GF have had time to catch up and surpass Intel's fab tech in various ways.
> I'm no expert on the server market though. Could someone with a bit more knowledge chime in on how big a threat this really is?
Sys Admin here. People are starting to notice and raise eyebrows. It is certainly an option, especially for new stacks. The power/price/performance ratios on AMD are truly amazing.
It certainly helps that Amazon, Microsoft, Cray, and Baidu have all already started to use Epyc servers.
Student admin myself, here at a university. With Dell being our primary provider, I did some general comparisons with the currently available EPYCs. Our dual-socket Skylakes (36 total cores without hyperthreading) with 128GB of ECC RAM cost ~18K each. With an EPYC chip I could build 48+ core systems with 256GB RAM for ~3K less.
If these next gens have similar pricing, they look very competitive for the future of on-premise (and cloud) HPC stacks.
I was in your same place exactly 20 years ago when I convinced the dept head that we (mostly me) could build our own machines for the test lab instead of purchasing units from corporate IT. I should have productionized the build process, but otherwise it went pretty well.
1. You are right, you can build your own machines for way less than Dell.
2. You are now the support team. So when you get a call months or years later about hardware quirks, missing pieces, or odd incompatibilities, expect it.
3. Buy extras and keep the machines largely the same. The fleet will be its own spare parts as it ages out. You will need extras for differential testing.
4. Keep a log and a copy of all the BIOSes supplied by the motherboard manufacturer. Baseline and benchmark each server before it is put into production; retest as it is cycled out for other duties.
5. If you don't max out the memory when built, do track RAM prices: there is a sweet spot of a dip just as the RAM tech ages out. If you miss it, RAM prices go back up again. An older machine with lots of RAM will be useful for a very long time.
If one plans it right, a 100-node HPC cluster decays into a 75-node render farm, then into a 25-node distributed-systems test cluster; I think they could get 10 years of service out of the purchase.
I don’t disagree with any of that. I usually prefer building my own when I’m the maintainer/builder. But that’s not my current situation; a few things to note:
* I have (hopefully) one semester left. After that, I’m gone.
* A good portion of our experienced team will be gone within the next two years, including the few grad students who helped build our cluster. Introducing “non-supported” hardware would likely introduce problems for the next group.
* The prices above were from Dell, not self assembly shopping. University policies mandate any university funded tech/infrastructure equipment (especially desktops/servers) be done through specific partners. For us, that’s Dell, GovConnection, and CDW. I host my own 6 node Supermicro Microblade setup (render farm) from CDW that took almost a full year to purchase because nobody in the Purchasing Dept. for clubs was aware of that rule. Until my hopping from Boxx to Thinkmate to “haha, funny story...” uncovered it.
> prices above were from Dell, not self assembly shopping
That is amazing. With those kind of prices folks might upgrade before their old hardware has properly aged out. At what point does AMD institute a buyback program for Intel hardware? (joking)
I don't see how this "incumbency advantage" would help Intel, exactly. The server market is highly competitive. If AMD puts out superior product at an attractive price, nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
That's exactly what they would say. Reliability track record, existing software/project constraints, and enterprise relationships matter more than the price tag. That's why Intel can still charge ridiculous amounts for enterprise chips for some time.
I remember a time when Opteron had a very large part of the server market. It's about lack of competition, not about how businesses love Intel charging a fortune for their product.
Who's making chips to compete with Intel? AMD, IBM, and nobody else (ARM isn't high-performance yet). IBM's lock-in with POWER is scary to businesses. AMD wasn't competitive for most server needs. That left Intel to charge whatever they wanted (and they did).
With EPYC, there was finally a competitive offering at a great price (not to mention a better track record with meltdown and the constant stream of spectre variants). All the same software runs on the AMD chips too.
>nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
Maybe not those exact words, but people will say "We are adding a server to our VMWare ESXi cluster, so if we want it to be compatible with our existing hardware we had better stick with Intel."
VMWare's high availability feature (a highly desired feature for any data center) won't work across different CPU architectures, so unless you are replacing your entire stack and not just adding server(s) then you have to stick with the architecture you already have in place.
Indeed, AMD's best way in would be at the larger-scale end, where someone might add whole sites at a time and choose AMD for a new datacenter, since inter-site live vMotion is pretty unusual.
I have been a Sys Admin for 7 years. It is a problem, vmotion won't work going from AMD to Intel or vice versa (at least not in any official way). In fact it is highly suggested when building ESXi stacks that you get the exact same model CPU for each server.
No, Intel FlexMigration and AMD-V Extended Migration architectures are different, and vMotion isn't supported across them.
EVC does not allow for migration with vMotion between Intel and AMD processors.[1]
This isn't really that surprising. x86-64 is standard, but extensions aren't. E.g., here are some of the new extensions in Zen vs Steamroller: Compared to the AMD Opteron™ "Steamroller" EVC mode, this EVC mode exposes additional CPU features including RDRAND, SMEP, AVX2, BMI2, MOVBE, ADX, RDSEED, SMAP, CLFLUSHOPT, XSAVES, XSAVEC, SHA, and CLZERO [1]
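If you're curious which of those a given Linux box actually advertises, here's a quick check against /proc/cpuinfo (note the kernel spells a few of these differently, e.g. SHA shows up as sha_ni):

    # check which of the listed extensions this (Linux) host advertises
    wanted = {"rdrand", "smep", "avx2", "bmi2", "movbe", "adx", "rdseed",
              "smap", "clflushopt", "xsaves", "xsavec", "sha_ni", "clzero"}

    with open("/proc/cpuinfo") as f:
        flags = next(set(line.split(":", 1)[1].split())
                     for line in f if line.startswith("flags"))

    print("present:", sorted(wanted & flags))
    print("missing:", sorted(wanted - flags))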
A project that I currently work on procured new hardware and one of the constraints for the selection was that pretty much exactly the same hardware must be available in three years. So this one will be buying intel for the foreseeable future.
> nobody is going to say "but we've used Intel for a decade, how can we suddenly switch to AMD?!".
Actually, that is the case, it will take about 1-2 hardware generations of customers _asking_ for AMD before it's featured prominently by any vendor (except the extreme ones like SuperMicro)
If nobody is actually calling their HP/Dell reps asking for it then it will never happen.
Market perception matters though, and I know many people -- IT admins included -- who think that Intel is just higher quality. Those people will be hard to capture quickly, and AMD will need to hold a lead for a while to change that perception
Things aren't always that simple. For example, AMD may be cheaper per core, but if you're running AVX-heavy code then Intel has a throughput advantage. At least it did against Zen; Zen 2 has some improvements in that area, so we'll have to measure again.
Moving to the next intel generation is usually simple, incremental change. Moving between vendors means a lot of re-evaluating if you're crunching numbers.
If your workload needs a lot of PCIe lanes for IO or you need cheap systems with ECC ram then the story changes quite a bit.
I wouldn't say it's especially suited for it, but AMD generally does well because of the large amount of cores/threads.
On the other hand, with something like 7-Zip, AMD has had a huge advantage over Intel, even going back to the Bulldozer days. What is it about that code that runs so well on AMD?
Compute (rather than I/O) heavy and highly multithreaded stuff skews towards AMD. Most applications benefit more from Intel's caching and single-core turbo boost features than they do from AMD's more discrete style of multithreading and weaker single-core mode tech.
> Compute (rather than I/O) heavy and highly multithreaded stuff skews towards AMD.
This. It's worth noting that newer, legacy-free programming languages make it a lot more feasible to parallelize compute-heavy parts of the code, and to deal with the "Memory wall" (i.e. the rise of memory-bandwidth as a bottleneck especially in high-CPU-frequency, high-core-count systems) by working with efficient, low-level representations of data. This will help not just AMD, but also newer entrants in this market segment such as ARM vendors.
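As a small illustration of the "low-level representation" point (sizes arbitrary): the same reduction over one contiguous array streams through memory and vectorizes, while the boxed-object version spends its time chasing pointers.

    import time
    import numpy as np

    n = 5_000_000
    boxed = [float(i) for i in range(n)]      # list of boxed Python floats
    flat  = np.arange(n, dtype=np.float64)    # one contiguous, SIMD-friendly array

    t0 = time.perf_counter(); a = sum(boxed);  t1 = time.perf_counter()
    t2 = time.perf_counter(); b = flat.sum();  t3 = time.perf_counter()

    print(f"boxed: {t1 - t0:.3f}s   flat: {t3 - t2:.3f}s   same result: {a == b}")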
Do the rooflines of these chips show that pattern? Intel has a big vector unit with some seriously useful instructions in it, pushing the arithmetic speed up.
Yeah, but that's comparing a mid 2019 chip to a chip you can buy today. Intel will probably make some announcement about their chip that's faster than this one right about the time this hits the shelves. It's the neverending rat race for chipmakers.
Up to this point Intel has been competing with AMD by dramatically cutting into the previously massive margin between what was possible on their tech and what they were milking consumers for with their enthusiast platforms.
Intel won't be able to compete with this with another one of their Arizona-fabbed 14nm chips. The tech's 8 years old now, but they still haven't demonstrated readiness on anything newer yet. This might be the caught-with-their-pants-down moment Intel can't bounce back from a few months later.
The Intel playbook has, for the last 8 years, been either "same architecture, same clock speed, less power draw" (remember, Sandy Bridge [2011] and Broadwell [2014] perform within 10% of each other given the same clock speed- a consequence of chasing thin-and-light laptops and tablets) or "same architecture, higher clock speed, more power draw" after Zen hit store shelves. Instructions per clock haven't changed significantly in many years; Intel has just slowly gotten better at hitting higher clocks because smaller transistors pull less power.
Overclockers have known for years that 5GHz is about the maximum that Intel's architectures can do under standard conditions (i.e. not liquid nitrogen), and the bottleneck there has primarily been due to heat generation and not processor instability.
It hasn't escaped Intel that these kinds of clocks were easily possible- they had special CPUs sold at those maximum speeds for high-frequency trading applications (not sold to the general public), and that fact was being kept in reserve as a strategic advantage against potential competition.
But that advantage has already been used up (not that AMD has one at the moment either; Zen and Zen+ have practically zero overclocking headroom on their fastest examples). Intel will need a new architecture to compete, and that's still likely at least a year down the road- and unlike the last time this happened (the Pentium 4 was supposed to reach 5GHz, but capped out at 3.8 for heat and power draw reasons) they don't have another architecture waiting in the wings to save them.
"Realistically speaking, we should be able to see NetBurst based processors reach somewhere between 8 – 10GHz in the next five years before the architecture is replaced yet again. Reaching 2GHz isn’t much of a milestone, however reaching 8 – 10GHz begins to make things much more exciting than they are today. Obviously this 8 – 10GHz clock range would be based on Intel’s 0.07-micron process that is forecasted to debut in 2005. These processors will run at less than 1 volt, 0.85v being the current estimate."
And overclockers did actually push that chip all the way to 8GHz with extreme measures.
Core 2 then hit the reset button and clock speeds dropped substantially (from 3.8GHz to 3.0GHz) in exchange for huge boosts to IPC. Intel didn't exceed the P4's 3.8GHz clocks until nearly 10 years later with the 4.0GHz i7-4790K (AMD was actually the first to 4GHz with the AMD FX-4170)
It's possible that Intel will announce a 10C Comet Lake, but then AMD can quickly counter with a 16C Ryzen. Then Intel has no response unless they can cram an 18C -X die into a 115x socket.
Also, natively wider SIMD (256 vs 128 bit), though still no AVX512 IIRC. I'm waiting on third gen Threadrippers in particular. Now that will be one heck of a chip for a quad GPU deep learning workstation.
Intel does AVX-512 at 30+% lower clock speeds. AMD should be able to do AVX-512 at full clock speeds in double the cycles.
For equivalent architectures, you'd expect Intel to be faster at workloads that are only avx512. They'd be slower at mixed workloads (the avx unit slows down, but so do all the other ALUs executing in parallel along with the SMT thread).
Most importantly, 512-bit means another 256 bits' worth of hardware, which is a big addition to core size and power usage for what is a very fringe workload. Saving that while getting better performance in common workloads seems like a great tradeoff.
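Back-of-the-envelope version of that tradeoff (clocks and unit counts below are assumptions for illustration, not measured numbers for any particular SKU):

    # per-core peak FLOP/s ~= FMA units * SIMD lanes * 2 (mul+add) * clock
    def peak_gflops(units, vector_bits, clock_ghz, elem_bits=64):
        return units * (vector_bits // elem_bits) * 2 * clock_ghz

    intel_avx512 = peak_gflops(units=2, vector_bits=512, clock_ghz=3.0)  # assumed AVX-512 downclock
    amd_avx256   = peak_gflops(units=2, vector_bits=256, clock_ghz=4.3)  # assumed full clock, half width

    print(intel_avx512, amd_avx256)  # 96.0 vs 68.8 GFLOP/s, double precision
    # and in mixed workloads the AVX-512 downclock also drags down every
    # scalar instruction sharing that core, which is the point made above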
I'm aware. But it's not just about clock speed per se. AVX512 has some _very_ specialized instructions that are specifically designed to speed up matrix-matrix and matrix-vector multiply. Even at a significantly slower clock those instructions improve things by a lot if you do linear algebra. Most of linalg is done on GPUs nowadays, but not all of it.
As a consequence, TR/Ryzen is very slow in "classical" ML in Python (I can tell), because MKL/AVX2 seems to be way slower. This could be fixed with Zen 2; AVX512 workloads will then likely run at 2/3 of Intel's performance without slowing down the rest of the system during computations.
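If you want to sanity-check what a given box is actually delivering, one quick test is to see which BLAS NumPy linked against and time a big matmul (sizes arbitrary):

    import time
    import numpy as np

    np.show_config()   # shows whether NumPy is linked against MKL, OpenBLAS, etc.

    n = 4096
    a, b = np.random.rand(n, n), np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0
    print(f"{2 * n**3 / dt / 1e9:.1f} GFLOP/s")   # rough dense-matmul throughput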
I agree with this assessment. Frankly, GPUs are so much faster for dense linalg than even the highest end Intel (or AMD) chips that anything that can be done on GPUs should be done there. But that requires one to know how to work with them, if the primitives don't already exist, and most people are oblivious to the finer points.
> This suggests that AMD’s new processors with the same amount of cores are offering performance parity in select benchmarks to Intel’s highest performing mainstream processor, while consuming a lot less power. Almost half as much power.
>
> How has AMD done this? IPC or Frequency?
Am I missing something? Wouldn't this be expected, given that this new AMD processor is running on 7nm while Intel hasn't really launched any 10nm processors yet? AMD is a process generation ahead...
ROCm is AMD's "CUDA". TensorFlow is now up-to-date - but you have to work with AMD's fork of tensorflow, they haven't upstreamed their changes yet.
The new GPU of interest for DeepLearning is the MI60. It has 32GB of RAM, 1TB/s memory bandwidth.
Performance on fp32 looks slightly lower than the V100 (ResNet-50 on TensorFlow, see below). But there are no tensor cores on the MI60, so ResNet-50 would go 3x faster on the V100 using its TensorCores. Personally, I think TensorCores are not that useful in the general case, as not enough programmers that I have come across are writing their code to use them. They are good, but niche.
The important thing we don't know is the price. Expect it to be released by the end of Q1 or early Q2.
There's a lot of CUDA-specific code out there. Even if you think you know exactly what you need, even if it claims to be compatible with AMD cards, you should seriously consider the possibility that you are wrong on one of those counts. I was, twice. Twice I had to sell my AMD cards, buy NVIDIA cards, and eat the spread. By comparison, just paying the "green tax" up front would have been cheap.
I want AMD to win, I really do, but I can't personally justify taking chances on them anymore.
> Question from a curious sysadmin: How are these AMD GPUs for machine learning and math related stuff?
They're just fine, though I think a tuned implementation on either tends to show NVIDIA ahead.
> I have only heard of Nvidia CUDA in this context. Does anybody do ML work on AMD GPUs?
NVIDIA is still king here, but if AMD continues to develop their Radeon Instinct stuff, it could be competitive for some applications; it then becomes a question of whether or not there will be enough people who know something other than CUDA (which is not currently supported on AMD GPUs, though work is under way to make a compatible runtime).
HIP (part of the ROCm stack) is fairly similar to CUDA. They also have goodies like tensorflow-rocm, available through pip or a Docker container.
That stack is also open source. And writing in it is supposed to keep you compatible with NVIDIA.
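For a rough idea of what that looks like in practice (treat this as a sketch; the exact setup steps depend on your ROCm version):

    # pip install tensorflow-rocm   # AMD's ROCm build, instead of plain tensorflow
    import tensorflow as tf

    # same TensorFlow API as the CUDA build; the ROCm/HIP runtime sits underneath
    print(tf.test.is_gpu_available())   # True if the Radeon card is visible to TF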
Wanting to support them -- and hoping they become more competitive in ML -- are why I bought Vega GPUs.
Although, I haven't found much time to actually try and get any of my software running on a GPU, let alone optimized. As a huge fan of avx512, I would like to try -- maybe I can get even better performance on a graphics card.
But with AMD, the lack of support for the uninitiated is apparent. There's not much in the way of existing software and resources like online tutorials. I'd like guides on how to optimize kernels, organize wavefronts, and manage memory movement.
Maybe I haven't looked hard enough.
Especially painful in Julia, where almost half a year after 1.0, the only supported way to use GPUs in any library is with CUDANative. The "cross-platform" GPUArrays.jl's OpenCL backend still hasn't been updated.
Means all my coding will be done in HIP. Which is fine.
nVidia provides way more support for the ML community than AMD, and the ML community reciprocates by buying nVidia hardware. You can use AMD stuff, but it means being on the bleeding edge and doing the debugging yourself instead of being able to ask the community for help (via Google searches).
So yes you can use AMD hardware for it, but you end up making a lot more work for yourself to do so, and for little to no benefit to yourself.
As of today, if you install PyTorch with the GPU option, the assumption is that you have an NVIDIA GPU. The same seems to be the case for TensorFlow (I haven't installed TensorFlow lately).
The first mover effect of Nvidia/CUDA should not be under-estimated.
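Concretely, the stock install paths bake that assumption in (a ROCm-based PyTorch build would need a different, non-default wheel):

    # pip install torch             # default wheels ship with CUDA support baked in
    import torch

    print(torch.cuda.is_available())   # False on an AMD GPU with the stock wheels
    print(torch.cuda.device_count())   # 0, the Radeon card simply isn't seen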
I'd rather they release a better chipset with at least 2 PCIe 3s than come up with these fast I/O paths not even commercially usable today. Such agony.
Really? The MI60 uses the exact same chip as the Radeon VII, and it supports PCIe 4. They will gimp the Radeon VII in some places so they don't completely kill MI60 sales, so dropping PCIe 4 makes sense in isolation but is completely whack in the bigger picture. Doesn't the right hand talk to the left hand at AMD? Hopefully there's time to reverse this crazy decision.
The Zeppelin (Ryzen 1xxx) die is 212 mm2 and its CCXs are 88 mm2 leaving 124 mm2 for the uncore. Now we see that the Matisse (Ryzen 3xxx) IO die is 122 mm2 — virtually the same size on the same process for the same functionality. (I don't see how L4 cache would fit in there BTW.)
As to why it's so large, I guess connecting cores, memory, and PCIe at extremely high speed just requires a lot of transistors. Intel's uncore seems to be far smaller; I'm not sure how or why.
The Ryzen chips offer more PCIe lanes and ECC memory support. These chips also use the Infinity Fabric high-speed interconnect, which is not needed by Intel's single-chip designs. The two CCX units in Zeppelin also need to talk to each other, so probably all this complexity just adds up.
I thought the main reason is that AMD still needs to use GF's manufacturing capacity (according to their Wafer Supply Agreement) and it also makes the chiplet design cheaper (the high-performance node is only used where it really matters).
That could also be true. I remember from an adoredtv video it was mentioned that IO transistors are much harder to scale down. The guy certainly seems to do his research and know his stuff, but I suppose at the end of the day it is a youtube video. I have not checked into this myself, just taking adoredtv's word at face value.
I think that was poetic license on his part. Mobile SoCs and modems are scaling quite well on 7nm. The IO chip is already low enough power for desktop/server applications and is small enough for good yield. 7nm would just be more cost for no gain. And using 14nm allows AMD to meet the WSA. Otherwise they probably would have done a TSMC 16nm IO die and kept their entire supply chain in Asia.
Well, it's on a more mature node, which at least should mean better yields. It also probably doesn't benefit from a node shrink as much as the actual CPU, so throw it on old node so it's cheaper kind of deal.
EDIT: I got some stuff mixed up. This post is all wrong.
It's supposedly one IO fits all. It has to interconnect up to 8 CPU chips and 4-8 memory channels (and allegedly some L4 cache). It has to support 128 PCIe lanes (maybe more for the upcoming generation) too. Then also, it's 14nm, so things aren't as small.
I imagine that they laser out most of the die to save power on consumer chips.
Currently devices have Thunderbolt 3 (up to 40 Gbps), I'm wondering if PCIe 4.0 will increase that. I assume it would be called Thunderbolt 4, and I assume it will be faster.
Increase it how? That's like asking if faster ethernet is going to increase thunderbolt speed. No, it's a totally different way of sending data over wires.
A theoretical Thunderbolt 4 that's twice as fast could easily be fed with a PCIe 3 connection. It could even be fed with a PCIe 2 connection!
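Rough per-lane numbers for context (raw line rate minus encoding overhead; real payload throughput is a bit lower still):

    # usable bandwidth per PCIe lane, in Gbit/s, after encoding overhead
    per_lane_gbps = {
        "PCIe 2.0": 5.0 * 8 / 10,      # 5 GT/s, 8b/10b encoding   -> 4.0
        "PCIe 3.0": 8.0 * 128 / 130,   # 8 GT/s, 128b/130b         -> ~7.9
        "PCIe 4.0": 16.0 * 128 / 130,  # 16 GT/s, 128b/130b        -> ~15.8
    }

    for gen, bw in per_lane_gbps.items():
        print(gen, {lanes: round(bw * lanes, 1) for lanes in (4, 8, 16)})
    # feeding a 40 Gbit/s (or even 80 Gbit/s) Thunderbolt link is mostly a question
    # of how many host lanes you dedicate, not of PCIe generation; newer
    # generations just let you do it with fewer lanes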
External GPUs and ASICs. Also, this is a connection that can be shared, and more bandwidth is probably better, since we all probably have notebooks and they're not going to have most of the ports we need.
Even the previous generations of Epyc are surprisingly cheap compared to Intel - Azure has a L8s_v2 8 core with 64 GB RAM and decent storage for under £100/month, which is less than half the price of anything else. Obviously, core speed may not suit every application, but it's pretty eye-opening!
An unreleased AMD CPU performs a bit worse than an already-released Intel CPU. They are still behind. But it's very intriguing. If they manage to push 5+ GHz, it would be an awesome CPU.
Price/perf goes to AMD every time, and, increasingly often, some parallel workloads simply perform better on AMD CPUs.
I work on The Division 2 (my managing director was on stage during the keynote) and we get better performance in quite a lot of areas using AMD CPUs.
Interesting fact: the AMD EPYCs in our DC run with less power consumption per unit of compute than our Xeon Skylake servers. What I later learned is that Intel lists the average power consumption, whereas AMD lists the max. Not sure why that is.
Well, if they can turbo at 5 GHz while consuming nearly half as much power as the equivalent Intel part... it sure does sound like something that I might buy.
The article body shows the AMD chip slightly behind the i9. The chart just after it shows two different figures - "pre-brief" it's behind, "on stage" it's ahead. Not a huge margin in any case.
The rumors are that this is AMD's mid-range, $200 part. Matching the performance of Intel's just-released, high-end $500 part at ~half the power usage. That's not being behind. That'd be a full generational improvement in just 6-9 months.
What an architecture Zen is.