The funny thing about Cerebras is that it doesn't scale well at all for inference, and if you talk to them in person, they'll tell you they're currently making all their money on training workloads.
Inference is still a lot faster on CUDA than on a CPU. Running a model on CPU is fine at home or on your laptop for privacy, but if you're serving those models at any scale, you're going to be using GPUs with CUDA.
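To make the point concrete, here's a minimal sketch (assuming PyTorch and Hugging Face transformers are installed; "gpt2" is just a placeholder model): the exact same inference code runs on CPU for private local use and automatically picks up CUDA when a GPU is present, which is the setup you'd want when serving at scale.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU via CUDA when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

inputs = tokenizer("Hello, world", return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Same code either way; the gap is just that on a GPU this generates tokens far faster, which is why serving at scale stays on CUDA.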
Inference is also a much smaller market right now, but it will likely overtake training later, once far more people are using models than are competing to train the best one.
Keep NVIDIA for training and Intel/AMD/Cerebras/… for inference.