Have you looked into semantic router? It will be a faster way to look up the rig...

sReinwald · 2025-05-08T16:55:39 1746723339

Semantic router is on my radar, but I haven't had a good look at it yet. The primary bottleneck in our current setup isn't really the routing decision time. From what I've found, the lightweight LLM I chose (Gemma3 4B) handles task identification fairly well in terms of both speed and accuracy.

For some context: this is a fairly limited exploratory deployment that runs alongside my other priority projects, so I'm not too obsessed with optimizing the decision-making time. Those three seconds of routing are relatively minor compared with the 20–60 seconds it takes to unload the old model and load a new one.

I can see semantic router being really useful in scenarios built around commercial, API-accessed models, though. There, it could yield significant cost savings by, for example, intelligently directing simpler queries to a less capable but cheaper model instead of the latest and greatest (and likely significantly more expensive) model users might feel drawn to. You're basically burning money if you let your employees use Claude 3.7 to format a .csv file.
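To make the cost-routing idea concrete, here is a rough sketch of how I picture it, not how semantic router itself is implemented. It matches a query against a few example utterances per route and picks the closest one. Everything in it is an assumption for illustration: the model names `cheap-model` and `premium-model` are placeholders, the example utterances are made up, and the bag-of-words `embed` function is a toy stand-in for the real embedding model a proper router would use.

```python
import math
import re
from collections import Counter

# Hypothetical routes: a few example utterances per model tier.
ROUTES = {
    "cheap-model": [
        "format this csv file",
        "rename the columns in this spreadsheet",
        "fix the typos in this sentence",
    ],
    "premium-model": [
        "design an architecture for a distributed system",
        "find the bug in this concurrent code",
        "write a detailed legal analysis",
    ],
}


def embed(text):
    # Toy bag-of-words vector; a real router would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, default="premium-model"):
    # Pick the route whose closest example utterance is most similar to the query.
    # If nothing overlaps at all, fall back to the default model.
    q = embed(query)
    best_model, best_score = default, 0.0
    for model, examples in ROUTES.items():
        score = max(cosine(q, embed(e)) for e in examples)
        if score > best_score:
            best_model, best_score = model, score
    return best_model


if __name__ == "__main__":
    print(route("please format my csv file"))
    print(route("find the bug in my concurrent code"))
```

Falling back to the expensive model when nothing matches is a judgment call: it trades some savings for not giving a bad answer to a query the router doesn't recognize. You could just as well flip that default if cost matters more than quality.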