grepfru_it | 32 days ago | on: Workhorse LLMs: Why Open Source Models Dominate Cl...

I am curious about the need for 70 t/sec?
Aeolun | 32 days ago

Waiting minutes for your call to succeed is too frustrating?
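(A rough sketch of the latency stakes: time to finish is just response length divided by throughput. The ~1,000-token response length below is an assumed illustration, not a figure from the thread:)

    # Back-of-envelope generation latency at two throughputs.
    response_tokens = 1_000            # assumed typical response length
    for tps in (70, 10):
        print(f"{tps} t/s -> {response_tokens / tps:.0f} s to finish")
    # 70 t/s -> 14 s; 10 t/s -> 100 s, which is where "waiting minutes" starts.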
ekianjo | 32 days ago

Depends entirely on the use case. Not every LLM workflow is a chatbot.
jbellis | 31 days ago

No, but if you're not latency-sensitive you should probably be using DeepSeek v3 (cheaper than Flash, significantly smarter).
lostmsu | 31 days ago

What makes you believe DeepSeek is smarter than Flash 2.5? It is lower on all leaderboards.
jbellis | 31 days ago

You're right; I should clarify that I'm talking about no-thinking mode. Otherwise Flash goes from "a bit more expensive than DSv3" to "10x more expensive".
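(A hedged sketch of why a thinking mode shifts the cost picture: reasoning tokens are billed as output tokens, so the effective price of an answer scales with how much the model "thinks". The price and token counts below are hypothetical placeholders, not quoted rates for Flash or DeepSeek:)

    # Hypothetical output price and token counts, for illustration only.
    price_per_output_token = 0.6 / 1e6   # assumed $/token; not a real quote
    answer_tokens = 500                  # visible answer
    thinking_tokens = 4_500              # hidden reasoning tokens in thinking mode
    plain = answer_tokens * price_per_output_token
    thinking = (answer_tokens + thinking_tokens) * price_per_output_token
    print(f"no-thinking: ${plain:.6f}, thinking: ${thinking:.6f} (~{thinking / plain:.0f}x)")
    # Billing 10x as many output tokens makes the answer ~10x as expensive,
    # which is the jump the comment describes.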
cootsnuck | 31 days ago

High-concurrency voice AI systems.
grepfru_it | 29 days ago

Why are you self-hosting that?