Vortex: OpenCL compatible RISC-V GPGPU (gatech.edu)
161 points by hasheddan on April 11, 2024 | hide | past | favorite | 52 comments


to clarify, this is a GPGPU implementation that's compliant with the OpenCL spec, right?

i imagine if i tried to explain to someone the state of modern GPU computing i'd sound like an insane person, why is it so complicated, ugh :(

so there's OpenGL and OpenCL, which are specifications with various implementations; both are widely supported, but neither is as performant/modern

there are Vulkan/Metal/DX, which are all modern graphics-first APIs with good driver support (Metal on Apple platforms only), each with compute shaders (Vulkan compute shaders, Metal Performance Shaders, and DirectCompute on DX); all three have excellent performance

then there's CUDA/HIP: CUDA is proprietary with excellent driver support, HIP has reportedly poor driver support; CUDA can only target NVIDIA GPUs, while HIP can target both CUDA and AMD GPUs (though very few AMD GPUs); both have very good performance

then there's SYCL, from Khronos, which is what Intel uses with oneAPI, a heterogeneous computing framework that can produce OpenCL/Vulkan code (and, in the future, code for other APIs)

then there's WebGPU, which is a spec with two C++ implementations and one Rust implementation, and it can target all of the above (except SYCL, obviously, since it's not a graphics API)

did i get that right?


The CPU side took a very long time to pare down the fragmentation. For the first ~50 years there were competing proprietary instruction sets and software stacks; languages, compilers, and operating systems were vendor-specific, buggy, and mutually incompatible. Compilers were sold separately and expensive, and apps were platform-specific. x86 would have remained a monopoly if not for a licensing accident that had AMD second-sourcing Intel's chips.

We're seeing a version of that replaying, with some parts better and some parts worse.

The GPU situation would be ripe for regulation to lift GPU software development from the dark ages, but companies who benefit most (NVidia) have probably become too big to regulate.


> too big to regulate.

I guess you are from the US and not the EU? In the EU, when they get too big and abuse their monopoly, they get regulated.

AFAIK NVIDIA hasn't done anything to abuse its power yet, but it's hard for big companies not to use their monopoly in an illegal way, since it's always a good market strategy, just an illegal one


interesting, hopefully the same homogenisation will happen in the GPU sphere as well :)


> i imagine if i tried to explain to someone the state of modern GPU computing i'd sound like an insane person, why is it so complicated, ugh :(

I mean, do you think this is different for anything else in our field? This was in some sense really simple... try doing this for front-end GUI frameworks or even multi-tenant isolation technologies.


my point of comparison is other types of accelerators (CPUs mostly) and the GPU ecosystem in the past (when there was just OpenCL and OpenGL)


I would add WebGL to the mix, for when you need that GPU performance in a browser that doesn't support WebGPU yet.


You missed GNM/GNMX, GX/GX2/NVN from Sony and Nintendo respectively, which tend to be forgotten on 3D API discussions.


That's because very few people have access to them and there's no information about them in the open.


Rather, the urban myth persists that they fully support Khronos APIs.

To be fair, there was some support on the PS3, with GL ES 1.0/Cg shaders as a secondary API (later dropped); Wii GX(2) shaders are based on GLSL; and while the Switch does support GL 4.6 and Vulkan for porting purposes, actually taking full advantage of the hardware means either Nintendo-extension spaghetti or a rewrite in NVN.


MLX: a mix between NumPy and CUDA/HIP, for Metal


You missed ROCm?


ROCm is a whole ecosystem. HIP falls under ROCm.


Could we get an expert to weigh in on the state of OpenCL? It seems like AMD and Intel are shifting away from OpenCL toward other GPGPU languages such as ROCm and DPC++.


Not sure if having two commercial OpenCL apps[0] makes me an expert, but the practical reality is that OpenCL works just great today, even on latest Apple platforms. It's not a lot of work to do a little testing and tuning on the 3 main vendors. News of OpenCL's demise is greatly exaggerated.

[0] https://glaretechnologies.com/


Did you figure out a way to interop OptiX with OpenCL, or do you just not use Nvidia's RT acceleration? AFAIK it's not exposed directly through OpenCL.


Usually you batch up a large buffer of rays to be traced anyway, so it's not really a problem to hand this off to another API if need be. This is often the case on CPU too, where you would use something like Embree, but on GPU it'd be Vulkan RT or OptiX (we didn't do this for Indigo Renderer, however).


Well, my impression when I looked into OpenCL a few years ago was that it was certainly real and used for things, but it was absolutely nothing like a CUDA replacement. Basically, its constructs are low-level and processor-specific — workable for graphics engines but not for general-purpose GPU programming.

This explains the rise of ROCm and DPC++ as systems with higher-level constructs equivalent to CUDA's.


SYCL is the second attempt at it: only after getting punched, seeing researchers adopt CUDA instead, with its polyglot PTX supporting C++, Fortran, and even subsets of Java, Haskell, and .NET, did Khronos come up with SPIR, and OpenCL C++ in OpenCL 2.0.

However, almost no one cared, meaning AMD and Intel never shipped anything worth using, so OpenCL 3.0 is OpenCL 1.0 rebranded, and out of it, the C++ efforts were moved into SYCL instead.

Which Intel picked up for their own DPC++ efforts, an additional tooling layer on top of SYCL. Meanwhile, the only company selling a usable SYCL developer experience was a former compiler vendor, Codeplay, which used to work with Sony on high-performance compilers for the PlayStation before pivoting away from console development.

Eventually Intel acquired Codeplay, and they are now the main supporters of the oneAPI tools and the whole UXL effort.

In the middle of all this, AMD decided to go with their own efforts.

It isn't only up to NVidia, when their competition can't get their story straight for decades.


Indeed,

If anything, the situation is that NVIDIA is far-sighted, developing hardware and software for general-purpose GPU computing, while the other chip manufacturers still think like traditional (non-CPU) chip makers: they just want to sell the best chip for a single purpose. This explains why they throw random things against the wall to see what sticks, rather than choosing a general-purpose direction and committing to it indefinitely.


What version of OpenCL do Apple platforms support? Also, is OpenCL also deprecated like OpenGL on Apple?


it's 1.2, macOS only, and marked as deprecated


People are moving away from OpenCL. The big problem is driver support. The transition from 2 to 3 was a mess, performance is all over the place on some functions between intel/amd/nvidia, and Apple is at the point of just leaving it out.

People have even started using Vulkan in its place for some GPGPU, because at least Vulkan has good drivers everywhere, being a low-level API.


While this is completely true, it is also true that OpenCL 1.2 is the one compute API that just works on every major platform and the drivers don't seem that unusably bad (though I'm not claiming experience of every platform here, just Nvidia on Windows/Linux and Apple Silicon on MacOS). Writing a limited dialect of C and sticking to 1.2 limitations is far from ideal, but it does at least work reliably. Sadly, that is more than can be said about most competitors.


Yes, 1.2 is OK.

The problem is that the drivers are merely OK. Presumably if you're using OpenCL you care about the performance (otherwise why would you??) and since that's the case, it's the best on no platforms, and there are alternatives for any set of platforms that do better.

I think OpenCL is sadly on its way out, and it's mostly Apple's fault (and Nvidia a little). Vulkan compute is much more interesting if you're looking to leverage iGPUs/mobile/other random CPUs.

If you're targeting workstation/server workloads only, it makes sense to restrict yourself to a subset of accelerator types and code for that (e.g. Torch or JAX for GPUs, Highway for SIMD, etc.)


It depends on what you're doing. For writing FP32 number-crunching code from scratch (meaning you don't care about something like Torch, or even cuBLAS/cuDNN), I haven't encountered cases where I couldn't match CUDA performance, and if I did, I could always just use a bit of PTX assembly where absolutely necessary (which OpenCL lets you do, whereas Vulkan does not). This also gets me good performance on MacOS without rewriting the whole thing in Metal. There is no native FP16 support, and there are other limitations that may matter to your use case or be completely irrelevant.
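As a concrete sketch of that "limited dialect of C": an OpenCL kernel that stays within 1.2's limits is essentially plain C99 with address-space qualifiers. This SAXPY kernel is an illustrative example, not from any real codebase; it's compiled at runtime by the driver via clBuildProgram:

```
/* Illustrative OpenCL 1.2 C kernel (SAXPY): plain C99, no templates,
   no native FP16, one work-item per element. */
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y,
                    const unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n)               /* guard: global size may be rounded up */
        y[i] = a * x[i] + y[i];
}
```

Because the driver compiles this at runtime, build errors surface late and debugging is largely printf-based, which is part of the tooling complaint elsewhere in this thread.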

I'm definitely not saying OpenCL is any sort of a reasonable default for cross platform GPGPU work. In truth, I don't think there is any reasonable "general" default for that sort of thing. Vulkan has its own issues (only works via a compatibility layer on MacOS, implementation quality varies widely, extension hell, boilerplate hell, some low level things are just impossible, etc.) and everything else is a higher level approach that can't work for everything by definition.

It's a pretty sad situation overall and every solution has severe tradeoffs. Personally, I just write CUDA when I can get away with it and try to stick to OpenCL otherwise, but everyone needs to make that choice for their own set of tradeoffs.


Yeah, TBH I'm kind of sad about where OpenCL ended up, because it "should have" been what CUDA was used for from 2011-2021. AlexNet, TF, Pytorch, etc. "should have" been written with OpenCL backends.

But the driver implementations inconsistency, version support issues, etc. meant people used CUDA instead.

I agree Vulkan has its own issues, and having written some MoltenVK stuff, you clearly know the quality-of-life pains in developing with it. That said, at least from the user side it works and performs well.


It is Intel, AMD and Google's fault for never supporting OpenCL as they should, and Khronos for pissing off Apple with how they took ownership of OpenCL.


DPC++ is Intel's own thing on top of SYCL, which relies on OpenCL drivers. Long story short, it's what happened after Khronos realized that shipping source code in a C99 dialect wasn't really what most researchers were keen on using for their work.


OpenCL is the best damn compute API ever. I'd totally buy and use one of these GPUs if I could!


I guess, if one enjoys using C99 without any kind of nice tooling, calling the GPU compiler and linker through APIs, and printf debugging.


Indeed, I'm not a fan of C++28 with virtual functions, exceptions, new/delete and whatever in my GPU code :P

With a parallel CPU implementation I have pretty few debugging issues, and most importantly, I definitely enjoy having excellent performance on Intel, AMD and Nvidia GPUs, on Windows, Mac and Linux.


Templates would be nice, though. So much better than macros.


I still only write C89.

Being able to declare variables anywhere can lead to some really nasty bugs, especially when using goto.


there have been quite a few threads about "reasonably priced" FPGA platforms, and https://github.com/vortexgpgpu/vortex/blob/master/docs/fpga_... will interest you. Regrettably, I haven't ever used an FPGA, so I can't speak to whether their Verilog targets only one of them or is "bring your own," and I'm definitely not the right person to say how much such a stunt would cost

It would make for a helluva blog post, though. Or set up a Shopify store selling pre-made ones for people like you :-)


Really? I found it to be much worse than CUDA when I tried to use it years ago.


Having used both, it is true that CUDA has the "enterprise edge" of being a proprietary, paid for product, backed by an (almost) trillion dollar company. There are features and conveniences in CUDA that aren't present in OpenCL.

That being said though, for how little funding, research, and support OpenCL receives, it is an astoundingly capable tool.

I don't think it's fair to compare apples and oranges, even if they're clearly both "fruit" with a "high sugar content".


It isn't as if the Intel and AMD enterprises ever managed to produce anything comparable to CUDA tooling.


Nvidia has brand recognition and momentum on their side. It's not that Apple or AMD are incapable.


AMD has more than proven its incapability to provide proper OpenCL tooling during the last 10 years.

Apple gave up on OpenCL, after disagreements how Khronos managed it.

Metal Compute provides a CUDA like development experience.


> Apple gave up on OpenCL, after disagreements how Khronos managed it.

Ah yes, the old "I can't completely control it, cause I'm Apple, so I'm taking my ball and leaving" ploy.

Though, I will admit, AMD was also playing this game at the same time, so a disagreement was bound to happen between them and Apple.

> Metal Compute provides a CUDA like development experience.

And there it is. Proving my prior statement, Apple didn't want an open compute platform, what they wanted was their own compute platform and it would have "been nice" if they could pretend it was "open" and they "shared" nicely with the other tech kids.

Before people come at me as an Apple hater, please know that there are many, many Apple devices in my home. I'm not an Apple hater, but I call it like I see it.


In what ways was it better/worse?


NVidia's OpenCL was fine, but AMD's OpenCL was an unholy dumpster fire of broken examples, missing error messages, and crashing drivers.


I used OpenCL a little in 2015. AMD's implementation seemed pretty good, whereas nVidia's was significantly behind. It didn't support OpenCL 1.2, iirc.


Where was AMD's implementation of Nsight's debugger or VS integration?


I'm pretty sure AMD's OpenCL did have both Visual Studio integration, and a decent debugger. I don't know about today.


I am pretty sure that AMD only has debugging tools for graphics APIs and never offered anything beyond printf debugging for OpenCL.

And VS integration means intellisense and syntax highlighting as well, not just calling their compiler.


AMD CodeXL supported breakpoints in OpenCL kernels, and had Visual Studio integration. It also had an editor for OpenCL C code.

[PDF] https://john.cs.olemiss.edu/heroes/papers/AMD_OpenCL_Program... See section 3.1 for debugging, section 4.2.2 for kernel code editor.



I'm excited for what processors and software look like in 5-10 years. Open all the way down.







