
Fair question. Some of the supported models are large and wouldn't fit on most local devices. This is just the beginning, and Ollama doesn't need to exclude cloud-hosted frontier models either, given the relationships we've built with model providers. We just have to be mindful that Ollama stands with developers and solves their needs.

https://ollama.com/cloud
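
For what it's worth, the cloud-hosted models are meant to sit behind the same API you already use locally, so switching is mostly a model-name change. A minimal sketch against Ollama's standard local endpoint; the model tag below is an assumption for illustration, not a specific supported model:

    # Minimal sketch: call a model through Ollama's local HTTP API.
    # The tag "qwen3-coder:480b-cloud" is a hypothetical example; swap in
    # whatever cloud-capable tag your Ollama install actually lists.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",   # default Ollama endpoint
        json={
            "model": "qwen3-coder:480b-cloud",   # assumed example tag
            "prompt": "Summarize what Ollama's cloud models are for.",
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["response"])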





> Some of the supported models are large and wouldn't fit on most local devices.

Why would I use those models on your cloud instead of using Google's or Anthropic's models? I'm glad there are open models available and that they get better and better, but if I'm paying money to use a cloud API I might as well use the best commercial models, I think they will remain much better than the open alternatives for quite some time.


When we started Ollama, we were told that open-source (open-weight wasn't a term back then) would always be inferior to closed-source models. That was two years ago (Ollama's birthday is July 18th, 2023).

Fast forward to now: open models are quickly catching up, come at a significantly lower price point for most uses, and can be customized for specific tasks instead of being general purpose. For general-purpose models, absolutely, the closed models are currently dominating.


Yeah, a lot of people don't realize you could spend $2k on a 5090 to run some of the large models.

Or spend $20 a month for models even a 5090 couldn't run, and not have to pay for your own electricity, hardware, maintenance, updates, etc.
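
Rough back-of-envelope math (my own assumed numbers, not quoted prices):

    # Break-even sketch: buying a 5090 vs. a $20/month subscription.
    # Every figure here is an assumption for illustration.
    gpu_cost = 2000.0               # assumed 5090 price, USD
    electricity_per_month = 15.0    # assumed cost of running it locally, USD
    subscription_per_month = 20.0   # the hosted option

    # Local total after m months:  gpu_cost + electricity_per_month * m
    # Hosted total after m months: subscription_per_month * m
    # Break-even when the two are equal:
    monthly_gap = subscription_per_month - electricity_per_month
    breakeven_months = gpu_cost / monthly_gap
    print(f"Break-even after ~{breakeven_months:.0f} months")  # ~400 months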


$20 a month for a commercial model is price dumping financed by investors. For Ollama, it's hopefully a sustainable price.

The $20-a-month models definitely aren't sustainable.

This is why everyone needs to get every flavour and speedrun building all the tools they need before the infinite money faucets are turned off.

At some point companies will start raising prices or moving towards per-token pricing (which is sustainable, but expensive).
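
To make "sustainable, but expensive" concrete, a toy comparison; the per-token rate and usage numbers are placeholders I made up, not any provider's actual prices:

    # Toy comparison of flat monthly pricing vs. per-token pricing.
    # The rate and usage figures are assumptions, not real quotes.
    flat_monthly = 20.0               # USD per month
    price_per_million_tokens = 10.0   # assumed blended USD per 1M tokens
    tokens_per_month = 5_000_000      # assumed heavy-ish usage

    per_token_cost = tokens_per_month / 1_000_000 * price_per_million_tokens
    print(per_token_cost)  # 50.0 -> a heavy user costs more than the flat $20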


Depends. API pricing from OSS model inference providers basically has to be sustainable, because of competition in the space.

And with that in mind, I definitely don't use more than a couple of bucks a month in API refills (not that I'm really a power user or anything).

So if you consider the $20 to be balanced between power and non-power users, and with the existing rate limits, it's probably not that far off being profitable, at least on the pure inference side.


A person can use Google’s Gemma models on Ollama’s cloud and possibly pay less, and have more quality control that way (and other types of control, I guess), since there’s no need to wonder whether a recent model update or load-balance throttling impacted results. Your use case doesn’t generalize.
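
On the "no silent model updates" point: one way to get that kind of reproducibility is to always call an explicit, versioned model tag instead of a floating alias. A minimal sketch against Ollama's standard chat API; the exact tag is just an example of whatever you actually pulled:

    # Minimal sketch: pin an explicit model tag so results don't shift when
    # a floating alias gets repointed. The tag below is an example.
    import requests

    PINNED_MODEL = "gemma3:27b"  # explicit tag, not a bare "gemma3" alias

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": PINNED_MODEL,
            "messages": [{"role": "user", "content": "Say hi."}],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["message"]["content"])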

Hi, to me this sounds like you are going in the direction of OpenRouter.



