It’s been hard to keep up with the evolution in LLMs. SOTA models basically change every other week, and each of them has its own quirks.
Differences in features, personality, output formatting, UI, safety filters… make it nearly impossible to migrate workflows between distinct LLMs. Even models of the same family exhibit strikingly different behaviors in response to the same prompt.
Still, having to find each model’s strengths and weaknesses on my own is certainly much better than not seeing any progress in the field. I just hope that, eventually, LLM providers converge on a similar set of features and behaviors for their models.
But it's also churn: I think you'll be more productive with a setup whose quirks you've learnt than with the newest one, whose quirks you haven't.
Each model has its own strengths and weaknesses tho. You really shouldn’t be using one model for everything. Like, Claude is great at coding but is expensive, so you wouldn’t use it for debugging or writing test benches. The OpenAI models suck at architecture but are cheap, so they're ideal for test benches, for example.
How important is it to be using SOTA? Or to even jump on it right away?
Feels a bit like when it was a new frontend framework every week. Didn't jump on any then. Sure, when React was the winner, I had a few months less experience than those who bet on the correct horse. But nothing I couldn't quickly catch up to.
I believe in using the best model for each use case. Since I’m paying for it, I like to find out which model is the best bang for my buck.
The problem is that, even when comparing models according to different use cases, better models eventually appear, and the models one uses eventually change as well — for better or worse. This means that using the same model over and over doesn’t seem like a good decision.
I'd love something like litellm, but simpler. I'm not provisioning models for my organization, I don't need to granularly track spend, I just want one endpoint to point every tool or client at for ease of configuration and curiosity around usage.
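To make the "one endpoint" idea concrete, here's a rough sketch of the core of it: a lookup that maps a model name to an upstream provider and keeps a simple per-model request count. This is not how litellm does it. The `Upstream` class, the `ROUTES` table, the prefix-matching rule, and the model names are all assumptions for illustration, and a real proxy would still have to forward the request and translate between each provider's request format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    """Where a request should be forwarded (hypothetical structure)."""
    base_url: str
    api_key_env: str  # name of the env var assumed to hold the API key


# Hypothetical routing table: model-name prefix -> upstream provider.
# The prefixes and env var names are illustrative assumptions.
ROUTES = {
    "claude-": Upstream("https://api.anthropic.com", "ANTHROPIC_API_KEY"),
    "gpt-": Upstream("https://api.openai.com", "OPENAI_API_KEY"),
}

# Per-model request counter, just for curiosity about usage.
usage = Counter()


def resolve(model: str) -> Upstream:
    """Pick the upstream for a model name and record one request for it."""
    for prefix, upstream in ROUTES.items():
        if model.startswith(prefix):
            usage[model] += 1
            return upstream
    raise KeyError(f"no upstream configured for model {model!r}")


if __name__ == "__main__":
    # Placeholder model names, not real model identifiers.
    print(resolve("claude-example").base_url)
    print(resolve("gpt-example").base_url)
    print(dict(usage))
```

Every tool would then point at the single local endpoint, and swapping models becomes a one-line change to the table instead of reconfiguring each client.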