I have to say I disagree with you on the AMD/TensorFlow point here, Adam. Your previous points are completely valid. Nothing of note will happen in 2018, I expect. But by early 2019, maybe - if AMD (or Intel) gets their act together on the software side of things. I don't think the "community" will do it for them. But maybe a big player like Amazon will have had enough of Nvidia and support an open alternative. Although we support CUDA in our version of YARN (Hops Hadoop), I expect we'd add ROCm or OpenCL or whatever - if it were a serious alternative. That would happen quickly. The problem, of course, is that we would want great support in TensorFlow first before we did that. Data scientists need a seamless transition - including from a performance perspective. For us, that also means support for distributed deep learning (Ring AllReduce over InfiniBand). I don't expect that will happen in 2018, and it could take until 2020, if I'm being realistic. That means that by the time AMD finally gets some good DL libraries, Nvidia will still have one-up on them with distributed DL (training time shrinks near-linearly as you add GPUs).
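For anyone unfamiliar with the Ring AllReduce I mentioned, here's a toy single-process sketch of the idea in plain Python - no MPI/NCCL/InfiniBand, just lists standing in for the network; the function name and bookkeeping are mine for illustration:

```python
# Toy simulation of Ring AllReduce: each of n workers holds a gradient
# vector; after a scatter-reduce phase and an all-gather phase, every
# worker holds the element-wise sum, with each worker only ever talking
# to its ring neighbour.

def ring_allreduce(vectors):
    n = len(vectors)
    assert len(vectors[0]) % n == 0, "vector length must split into n chunks"
    chunk = len(vectors[0]) // n
    # data[i][c] is worker i's copy of chunk c
    data = [[list(v[c * chunk:(c + 1) * chunk]) for c in range(n)]
            for v in vectors]

    # Phase 1, scatter-reduce: after n-1 steps, worker i holds the fully
    # summed chunk (i+1) mod n. Snapshot the sends so all transfers in a
    # step happen "simultaneously", as on a real ring.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n]))
                 for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n]))
                 for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][c] = payload

    return [[x for c in w for x in c] for w in data]

# 4 workers, each contributing a 4-element "gradient"
results = ring_allreduce([[1, 1, 1, 1], [2, 2, 2, 2],
                          [3, 3, 3, 3], [4, 4, 4, 4]])
```

Every worker ends up with the same summed vector, and the per-worker traffic is independent of the number of workers - that's why it scales so well over a fast interconnect.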
The other wildcard I haven't seen people mention here is the Neural Network Processor (NNP) from Intel Nervana. The hardware has potential - as long as the software doesn't force us to use BigDL.
Re: AMD/Intel. We've been waiting for them to get their act together for years. Nervana could be great but I'm going to wait on that one. So far their "launches" have been nothing more than marketing fluff.
As for your projections about TensorFlow: it won't be TensorFlow. TensorFlow will be one of many frameworks. Look, I like Hops, but you guys push TensorFlow explicitly. A startup running its own Hadoop distro that happens to push TensorFlow isn't going to move the needle. You guys are great middleware, I'm sure, but I haven't seen the customers where it might be viable. I hope you guys continue to grow, though! Meanwhile, most Hadoop vendors are focused on moving up the stack. It will take players with real resources to move the needle on enterprise adoption.
Amazon is doing this with MXNet and EMR, MapR is pushing TensorFlow in their serving, and CNTK is being pushed in SQL Server and HDInsight. There's some competition there.
What I'm getting at here is: it will take multiple vendors and competition. I'm going to place my bet on the bigger players already involved with the foundations first, though.
Open standards (addressed below) that commoditize the chip will be key. The storage infra will follow from that.
It should be something that doesn't displace current infra but allows interop.
Things like NNVM from the MXNet folks and ONNX (where the framework doesn't matter anymore!), being pushed by the various hardware vendors, will move the needle. You need buy-in from the actual big players who can front the development time to make these things viable alternatives.
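To make the "framework doesn't matter anymore" point concrete, here's a toy Python sketch of the exchange-format idea - not real ONNX (real graphs are protobuf with a far richer op set); the graph layout and backend dicts are mine for illustration:

```python
# Toy sketch of the ONNX/NNVM idea: the model is just data (a graph of
# named ops), so any backend that implements those ops can run it -- the
# framework that trained it no longer matters.

GRAPH = [  # y = relu(x * w + b), as framework-neutral nodes
    {"op": "mul", "inputs": ["x", "w"], "output": "t0"},
    {"op": "add", "inputs": ["t0", "b"], "output": "t1"},
    {"op": "relu", "inputs": ["t1"], "output": "y"},
]

def run(graph, backend_ops, feeds):
    """Execute the graph with whatever op implementations the backend supplies."""
    env = dict(feeds)
    for node in graph:
        args = [env[name] for name in node["inputs"]]
        env[node["output"]] = backend_ops[node["op"]](*args)
    return env["y"]

# Two independent "backends" honouring the same op contract.
cpu_ops = {"mul": lambda a, b: a * b,
           "add": lambda a, b: a + b,
           "relu": lambda a: max(0.0, a)}
accel_ops = dict(cpu_ops)  # stand-in for a vendor accelerator backend

feeds = {"x": 2.0, "w": 3.0, "b": -1.0}
y_cpu = run(GRAPH, cpu_ops, feeds)
y_accel = run(GRAPH, accel_ops, feeds)
```

Swap in a ROCm or NNP backend that honours the same op contract and nothing upstream has to change - that's the commoditization play.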
As for your "seamless transition": I'm not sure that would be that hard, done right. Supporting "great TensorFlow" can come in multiple flavors. As a separate issue, TensorFlow's production story is horrible - that's another topic I could rant about all day. It ultimately comes down to abstracting the framework away, which is a hard problem by itself. (Disclosure: I have my own solution for this that I won't talk about here; just know I'm biased :D)
Lastly, I question whether OpenCL can even be a viable alternative. It's a fragmented, inconsistent standard with a worse API than CUDA. One reason CUDA "won" is that it's in general cleaner and a clear leader in the space.
Yeah, ROCm is the most viable candidate as of today.
In general, Nvidia are not good for middleware vendors. They want to be one, but don't offer a platform that integrates with anything. Licensing costs for the DGX-1 are insanely high.
My problem is mostly from a data scientist's perspective - teams don't need a few high-performance GPUs, like a couple of DGX-1 boxes. They need a hundred 1080 Tis, maybe complemented by a DGX-1 (together costing about the same as 2 DGX-1s). That way they can run lots of parallel experiments (hyperparameter optimization) and do distributed training. Making GPUs a scarce resource just reinforces the lead of the hyperscale AI companies, who have thousands of GPUs available for their data scientists.
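The "lots of parallel experiments" point in code form - a toy random search where each pool worker stands in for one cheap GPU; the pool sizes and the dummy objective are mine for illustration, not a real training run:

```python
# Toy sketch: hyperparameter search is embarrassingly parallel, so trial
# throughput scales with the number of (cheap) GPUs you can throw at it.
import random
from concurrent.futures import ThreadPoolExecutor

def train_and_score(cfg):
    # stand-in for "train a model with these hyperparams on one GPU,
    # return the validation score" -- here just a toy objective
    lr, dropout = cfg
    return -(lr - 0.01) ** 2 - (dropout - 0.5) ** 2

random.seed(0)
trials = [(random.uniform(1e-4, 0.1), random.uniform(0.0, 0.9))
          for _ in range(100)]                      # 100 experiments to run

with ThreadPoolExecutor(max_workers=8) as pool:    # a "cluster" of 8 GPUs
    scores = list(pool.map(train_and_score, trials))

best_score, best_cfg = max(zip(scores, trials))
```

With 8 workers the 100 trials finish roughly 8x faster than serially; with a hundred 1080 Tis every trial runs at once - that's the throughput argument against a couple of scarce DGX boxes.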
Oh, I agree that GPUs should be more of a commodity. We might see alternative ASICs rather than GPUs come out, though. I'm personally more interested in seeing that succeed than in confining the solution space to GPUs and discrete-GPU competition. I'm just not keen on trying to predict what will win (I really don't know); I just have criteria I'd look for before trying to implement support for it, either in my deep learning framework or for customers.