My recent experience has been that the Vulkan support in llama.cpp is pretty good. It may lag behind CUDA / Metal on bleeding-edge models if they need a new operator.

Try it out! Benchmarks here: https://github.com/ggml-org/llama.cpp/discussions/10879

(Ollama doesn’t support Vulkan for some weird reason. I guess they never pulled the code from llama.cpp.)