
There are zero DL frameworks that support AMD cards as a primary target.

Most have some kind of branch or patchset with OpenCL support. The problem is that they aren't great. If you need any new layers, there is no support, and there is nothing like cuDNN, so you don't get the high-speed convolutional kernels.
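For context on what those "high speed convolutional kernels" actually compute: the operation itself is simple to state but expensive to do naively, which is why hand-tuned libraries matter so much. A pure-NumPy sketch of a single-channel, valid-mode 2D convolution (illustrative only; this is the op cuDNN accelerates, not how cuDNN implements it):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive valid-mode 2D cross-correlation: one dot product per output pixel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Slide the kernel over the image and sum the elementwise products.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
result = conv2d(img, k)
```

Libraries like cuDNN replace these Python-level loops with GPU kernels (im2col+GEMM, Winograd, FFT variants) tuned per architecture, which is exactly the layer the OpenCL ecosystem has been missing.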

It's easy to blame developers for supporting NVIDIA, but the thing is, NVIDIA is great to work with. They dedicate large teams to deep learning support (not like the 2 or 3 part-time devs AMD does), and they publish good research and tutorials. AMD does nothing like this.



AMD has always had first-class support for OpenCL. CUDA is NVIDIA proprietary, and although NVIDIA "supports" OpenCL, its implementation is quite bad. Issue #22 on TensorFlow is regarding OpenCL support.

AMD has done some interesting work on HCC (an LLVM-based compiler that takes a more conventional approach to GPGPU), and it is showing promise. See here: https://instinct.radeon.com/wp-content/uploads/sites/4/2017/...

Additionally, they support translating CUDA source to HIP, an intermediate layer that can be compiled to target NVIDIA (via nvcc) or AMD (via HCC).

NVIDIA has built a lot of tooling for DL, such as cuDNN, and the new Tesla cards have dedicated silicon for tensor calculation. AMD does have a cuDNN equivalent called MIOpen. They have also ported Caffe via HIP, and it works well. AMD is currently doing work on Torch, MXNet, and TensorFlow to add support for AMD hardware with minimal burden on the maintainers of these projects.

You can read about some of the DL toolkits available here: https://instinct.radeon.com/en/6-deep-learning-projects-amd-...

I think it's particularly bad form on the part of everyone in the DL framework and library world to cater only to NVIDIA and CUDA, and that they very much walked into this shakedown with open arms.

The original comment is correct that contributing support for OpenCL (which works on mobile too) will alleviate this to a fair degree. It's one of those things where the more momentum is behind it, the more device manufacturers will focus on ensuring their OpenCL compiler builds properly optimized kernels for their hardware.

Start contributing to OpenCL or adding hip support to existing projects and we'll see some viable alternatives pop up from not only AMD, but players like Qualcomm and Samsung.


Until quite recently OpenCL was a C-only game, so blame Khronos for not embracing all HPC-relevant languages the way CUDA does.


OpenCL doesn't have an equivalent to cuDNN.

I'm not saying it's a great situation, I'm saying that NVidia has always had better libraries, tools and performance, and it isn't surprising that developers use them.

Deep learning is hard and slow enough without using second class tools.


Much respect to NVIDIA and their software team but the situation is changing. PlaidML is like cuDNN for every GPU. Fully open source, faster in many cases than TF+cuDNN on NVIDIA, beats vendor tools on other architectures, Linux/Mac/Win. Supports Keras currently but more frameworks are not difficult (patches welcome).
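For anyone who wants to try it: PlaidML installs as a drop-in Keras backend. Roughly, assuming `plaidml-keras` has been pip-installed and a device selected with the `plaidml-setup` tool (a setup fragment, not runnable without a PlaidML install):

```python
# pip install plaidml-keras, then run `plaidml-setup` to pick a device.
import plaidml.keras
plaidml.keras.install_backend()  # must be called before importing keras

import keras
from keras.applications import MobileNet

# The model now executes on whatever GPU PlaidML was configured to use,
# including AMD and Intel parts, with no CUDA anywhere in the stack.
model = MobileNet()
```

The point of the `install_backend()` call is that the rest of the Keras code is unchanged, which is what makes the Keras-vs-Keras benchmarks below possible in the first place.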


The PlaidML benchmarks are suspect. They compare to Keras + TensorFlow, which is a really unfair comparison, since 1) TensorFlow is probably the slowest of the big deep learning frameworks out there (compared to PyTorch, MXNet, etc.), and 2) Keras itself is quite slow. Keras is optimized for ease of use, introduces lots of abstractions, and often doesn't take advantage of many TF optimizations. For just one example, until very recently Keras did not use TF's fused batch norm, which the TF docs claim provides a 10-30% speedup in overall network performance; that alone could be enough to account for many of the benchmarks showing PlaidML ahead.
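To make the fused-batch-norm point concrete: unfused batch norm is several separate ops, each a kernel launch and a round trip through memory, while the fused version does the whole thing in one kernel. A NumPy sketch of the math (illustrative only; the comments map lines to the ops a graph executor would run separately):

```python
import numpy as np

def batch_norm_unfused(x, gamma, beta, eps=1e-5):
    """Batch norm as separate ops, the way an unfused graph executes it."""
    mean = x.mean(axis=0)                     # op 1: reduce mean
    var = x.var(axis=0)                       # op 2: reduce variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # ops 3-4: subtract, scale
    return gamma * x_hat + beta               # ops 5-6: scale, shift

x = np.random.randn(8, 4)
y = batch_norm_unfused(x, gamma=np.ones(4), beta=np.zeros(4))
```

A fused kernel computes the same `gamma * (x - mean) / sqrt(var + eps) + beta` in a single pass, which is where the claimed 10-30% comes from: less memory traffic and launch overhead, not different math.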


In my opinion it's extremely fair. The benchmarks compare Keras+PlaidML to Keras+TensorFlow, which allows running exactly the same nets (just imported from the Keras included applications), and whatever penalty Keras might impose is equal in the two cases. Having one very direct comparison is actually why we constructed the tests that way (none of the other frameworks run on our high-priority platforms).

That said we'd be pretty excited if someone wanted to add support for TF, PyTorch, MXNet, etc. We like Keras but are happy to have integrations for all frameworks. With work you could pair it with Docker and containerize GPU-accelerated workloads without the guests even needing to know what hardware it's running on. Lots of possibilities.


No, no, no.

> whatever penalty Keras might impose is equal in the two cases.

The penalty Keras imposes when using Tensorflow depends on its Tensorflow implementation. The penalty Keras imposes when using MXNet depends on its MXNet implementation. The penalty Keras imposes when using PlaidML depends on whatever the PlaidML devs implemented. When you build a Keras layer, it's calling different Keras code for each backend.

The comparison would be fair if Plaid claimed to be the fastest Keras backend, not if it were actually claiming to be faster than TensorFlow.


There was someone on reddit/ml who posted some pretty interesting numbers for training.

I think they have a lot of challenges ahead of them, but I’m still more optimistic about Plaid than AMD’s own efforts.

AMD says that they don’t care about ML[1], and their actions back that up.

Edit: and to be clear, I think comparing Keras+Plaid vs Keras+TF is an entirely valid thing to do. Lots of people work in Keras, and if you download random NN code off GitHub, it's likely to be Keras (or PyTorch now, of course).

[1] https://www.reddit.com/r/MachineLearning/comments/66bgmf/com...


PlaidML seems entirely geared towards inference (I only see batch size 1 anywhere). Training is important.


Batch 1 inference on convnets is key for us internally but training does work pretty well. The underlying machinery can do much more. Here's a blog post that talks about how it works with some links to more detailed docs & the actual implementations:

http://vertex.ai/blog/tile-a-new-language-for-machine-learni...

Two of the big motivators for opening the code were 1) giving students taking the popular courses a way to get started with GPU in whatever machine they've got (recent Intel GPUs in say a MacBook Air are enough) and 2) giving researchers a platform where it's simple to add efficient GPU-accelerated ops.

For scale on #2 check out the entire implementation of convolution:

https://github.com/plaidml/plaidml/blob/master/plaidml/keras...


PlaidML is very promising I agree.


> AMD does have a cuDNN equivalent called MIOpen.


MIOpen could be great one day. ATM you still get random problems like this: https://github.com/ROCmSoftwarePlatform/MIOpen/issues/19

That's an order of magnitude worse performance than NVIDIA on ResNet-52, open for 2 months with no real explanation.

Great idea, but no one can use it reliably yet.


Apparently Caffe has support for AMD cards through ROCm:

https://rocm.github.io/dl.html


That's exactly what I mean.

It's a patchset (note that it isn't upstreamed in their chart), and it's Caffe. That was great in 2015.



