Gemini Flash or any free LLM on OpenRouter would be orders of magnitude faster and effectively free. Unless you're concerned about the privacy of the conversation, the main benefit is really just being able to say you did it locally.
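For context, calling one of these through OpenRouter is only a few lines, since it exposes an OpenAI-compatible endpoint. A rough sketch, assuming the openai Python client; the API key and model slug below are placeholders, not recommendations:

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions API
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # placeholder; use your own key
)

response = client.chat.completions.create(
    model="google/gemini-flash-1.5",  # illustrative slug; check openrouter.ai/models for current/free options
    messages=[{"role": "user", "content": "Summarize this conversation."}],
)
print(response.choices[0].message.content)
```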
I definitely appreciate and believe in the value of open-source / open-weight LLMs, but inference is so cheap right now for non-frontier models.