That's wrong. Llama.cpp and Candle don't bring anything to the table that PyTorch can't do, design-wise. What they offer is a smaller deployment footprint.
What's modern about LLMs is the training infrastructure and the single-coordinator pattern, which PyTorch has only just started on and where it is still inferior to many internal implementations: https://pytorch.org/blog/integration-idea-monarch/