
You're getting downvoted, but what you said is true. The cost of self-hosting (while sustaining 70+ tok/sec consistently across the entire context window) has never been low enough to make open-source models a viable competitor to the proprietary models from OpenAI, Google, and Anthropic.
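For context, a back-of-envelope on the unit economics. The GPU rental price and the batched throughput below are assumptions for illustration, not measurements:

  # Back-of-envelope self-hosting cost. Assumed numbers: an H100 rents
  # for ~$2.50/hr and a well-batched serving stack sustains ~1,500
  # aggregate tok/sec on one card.
  GPU_COST_PER_HOUR = 2.50      # USD, assumed rental price
  AGGREGATE_TOK_PER_SEC = 1500  # assumed total throughput across all requests

  tokens_per_hour = AGGREGATE_TOK_PER_SEC * 3600
  cost_per_million = GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000
  print(f"~${cost_per_million:.2f} per 1M tokens at 100% utilization")
  # ~$0.46/M, and that's the best case: real utilization is much lower,
  # and per-request speed (the 70 tok/sec figure) drops as batching grows.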



I'm curious: why the need for 70 tok/sec?


Because waiting minutes for your call to complete is too frustrating?
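Quick arithmetic on what decode speed means for wall-clock wait (the response length is illustrative):

  # Time to stream a single 1,000-token response at various decode speeds.
  response_tokens = 1000  # illustrative response length
  for tok_per_sec in (10, 30, 70):
      print(f"{tok_per_sec:>3} tok/sec -> {response_tokens / tok_per_sec:.0f}s")
  #  10 tok/sec -> 100s
  #  30 tok/sec -> 33s
  #  70 tok/sec -> 14s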


Depends entirely on the use case. Not every LLM workflow is a chatbot.


No, but if you're not latency-sensitive you should probably be using DeepSeek v3 (cheaper than Flash, significantly smarter).


What makes you believe DeepSeek is smarter than Flash 2.5? It ranks lower on all the leaderboards.


You're right, I should clarify that I'm talking about Flash with thinking disabled. Otherwise Flash goes from "a bit more expensive than DSv3" to "10x more expensive".
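To make the 10x concrete, a toy per-request comparison. The per-token prices and the 2:1 reasoning-to-answer token ratio are assumptions for illustration, so check the current price sheets:

  # Toy per-request cost: thinking tokens are billed as output tokens,
  # so a long reasoning trace multiplies the effective price.
  def cost(out_tokens, price_per_m, think_tokens=0, think_price_per_m=0.0):
      return (out_tokens * price_per_m + think_tokens * think_price_per_m) / 1e6

  dsv3 = cost(1000, 1.10)                     # assumed DSv3 output price
  flash = cost(1000, 0.60)                    # assumed Flash, thinking off
  flash_think = cost(1000, 3.50, 2000, 3.50)  # assumed Flash, thinking on
  print(f"DSv3 ${dsv3:.4f} | Flash ${flash:.4f} | Flash+thinking ${flash_think:.4f}")
  # $0.0011 | $0.0006 | $0.0105 -- roughly the 10x jump described above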


High-concurrency voice AI systems.
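Rough sizing for why that use case cares (the speech rate, concurrency, and targets below are all assumptions): TTS only consumes a few tok/sec of text per stream, but generation has to stay far ahead of the audio cursor, and per-stream decode speed degrades as the batch grows.

  # Speech runs at ~150 words/min, i.e. TTS consumes roughly 3-4 tok/sec
  # of text, but generation must stay well ahead of playback.
  tts_tok_per_sec = 3.5   # assumed rate at which TTS consumes text
  per_stream_target = 70  # target per-stream decode speed
  concurrent_calls = 100  # assumed concurrency

  headroom = per_stream_target / tts_tok_per_sec
  aggregate = per_stream_target * concurrent_calls
  print(f"{headroom:.0f}x real-time per stream, ~{aggregate:,} tok/sec aggregate")
  # 20x real-time per stream, ~7,000 tok/sec aggregate -- the hard part is
  # holding the per-stream number under load, not the raw total.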


Why are you self-hosting that?



