I had no idea they had their own cloud offering, I thought the whole point of Ol...

mchiang · 2025-09-25T19:57:45 1758830265

Fair question. Some of the supported models are large and wouldn't fit on most local devices. This is just the beginning, and Ollama does not need to exclude cloud hosted frontier models either with the relationship we've built with the model providers. We just have to be mindful and understand that Ollama stands with developers, and solve the needs.

https://ollama.com/cloud

sorenjan · 2025-09-25T20:16:14 1758831374

> Some of the supported models are large and wouldn't fit on most local devices.

Why would I use those models on your cloud instead of using Google's or Anthropic's models? I'm glad there are open models available and that they get better and better, but if I'm paying money to use a cloud API I might as well use the best commercial models, I think they will remain much better than the open alternatives for quite some time.

mchiang · 2025-09-25T20:23:27 1758831807

When we started Ollama, we were told how open-source (open-weight wasn't a term back then) will always be inferior to the close-sourced models. This was 2 years ago (Ollama's birthday is July 18th, 2023).

Fast forward to now, open models are quickly catching up, and at a significantly lower price point for most and can be customized for specific tasks instead of being general purpose. For general purpose models, absolutely the closed models are currently dominating.

typpilol · 2025-09-25T21:16:42 1758835002

Ya a lot of ppl don't realize you could spend 2k on a 5090 to run some of the large models.

Or spend 20 a month for models even a 5090 couldn't run. And not have to spend your own electricity, hardware, maintenance, updates etc.

oytis · 2025-09-25T21:38:16 1758836296

20 a month for a commercial model is price dumping financed by investors. For ollama it's hopefully a sustainable price.

theshrike79 · 2025-09-26T06:42:58 1758868978

The 20 a month models definitely aren't sustainable.

This is why everyone needs to get every flavour and speedrun building all the tools they need when the infinite money faucets are turned off.

At some point companies will start raising prices or moving towards per-token pricing (Which is sustainable, but expensive).

gunalx · 2025-09-26T14:14:55 1758896095

Depends. API pricing from oss model inference providers basically has to be sustainable, because of competition in the space.

And with that in mind, i definetly dont use more than a couple of bucks a month in API refils. (not that i really am a power user or anything)

So if you consider the 20 bucks to be balanced between poer and non power users, and with the existing rate limits, its probably not that far off being profitable, at least on the pure inference side.

ineedasername · 2025-09-25T20:30:31 1758832231

A person can use Google’s Gemma models on ollama’s cloud and possibly pay less. And have more quality control that way (and other types of control I guess) since there is no don’t need to wonder if a recent model update or load balance throttling impacted results. Your use case doesn’t generalize.

disiplus · 2025-09-25T20:38:20 1758832700

hi, to me this sounds like you are going into the direction of openrouter.

kordlessagain · 2025-09-25T22:14:02 1758838442

You make an account to use their hosted models AND to have them available via the Ollama API LOCALLY. I'm spending $100 on Claude and $200 on GPT5, so $20 bucks is NOTHING and totally worth having access to:

Qwen3 235b

Deepseek 3.1 671b (thinking and non thinking)

Llama 3.1 405b

GPT OSS 120b

Those are hardly "small inferior models".

What is really cool is that you can set Codex up to use Ollama's API and then have it run tools on different models.

brabel · 2025-09-26T11:06:58 1758884818

How does it compare to AzureAI which has all the best models and you don’t need to sign up with anyone other than Azure itself?

mrheosuper · 2025-09-26T06:30:51 1758868251

If you are on $100 tier Claude, what makes you think the $20 Tier Ollama is enough for you ?

theshrike79 · 2025-09-26T06:45:03 1758869103

If your workflow is general enough, you can (and should) switch between models. They all have different styles and blind spots.

Like I had Codex + gpt-5-codex (20€ tier) build me a network connectivity monitor for my very specific use case.

It worked, but had some really weird choices. Gave it to Claude Code (20€ tier again) and it immediately found a few issues and simplifications.

kordlessagain · 2025-09-26T16:52:01 1758905521

Right. And then there's using an MCP tool that instantiates another agent except uses a different model.

Here's a good example. For summarization of a page of content. Content is maybe pulled down by an agentic crawler, so using a local model to summarize is great. It's fast, doesn't cost anything (or much) and I can run it without guardrails as it doesn't represent a cost risk if it ran out of control.

kordlessagain · 2025-09-26T16:43:24 1758905004

Clearly articulated and repeating what makes the $20 Ollama tier valuable to me is:

1. Access to specific large open models (Qwen3 235b, Deepseek 3.1 671b, Llama 3.1 405b, GPT OSS 120b)

2. Having them available via the Ollama API LOCALLY

3. The ability to set up Codex to use Ollama's API for running tools on different models

I mean, really, nothing else is even close at this point and I would rather eat a bug than use Microsoft's cloud.

n4bz0r · 2025-09-26T01:00:34 1758848434

Has anyone tried the hosted models? How do they compare to GPT-5?

I was thinking about trying ChatGPT Pro, but I seem to have completely missed that they bumped the price from $100 to $200. It was $100 just a while ago, right? Before GPT-5, I assume.

pama · 2025-09-26T14:43:01 1758897781

No it was never $100 for ChatGPT Pro.

dcreater · 2025-09-25T20:39:22 1758832762

Yeah it's been a steady pivot to profitable features. Wonderful to see them build a reputation through FOSS and codebase from free labor to then cash in.

kergonath · 2025-09-25T21:13:41 1758834821

As long as the software that runs locally gets maintained (and ideally improved, though if it is not I’ll simply move to something else), I find it difficult to be angry. I am more annoyed by software companies that offer a nerfed "community edition" whose only purpose is to coerce people into buying the commercial version.

dcreater · 2025-09-25T21:20:07 1758835207

> software companies that offer a nerfed "community edition" whose only purpose is to coerce people into buying the commercial version.

This is the play. Its only a matter of time till they do it. Investors will want their returns

Imustaskforhelp · 2025-09-25T22:22:27 1758838947

pardon me but is Ollama a company though? I didn't knew that actually.

And are they VC funded? Are they funded by Y-combinator or anything else..

I just thought it was a project by someone to write something similar to docker but for LLM's and that was its pitch for a really really long time I think

dcreater · 2025-09-25T22:55:48 1758840948

Yup thats exactly what I thought as well. I also found out late and to much surprise that its a VC backed startup: https://www.ycombinator.com/companies/ollama

Imustaskforhelp · 2025-09-26T16:54:01 1758905641

Oh well. Enshittenification is close then I suppose :<

Gotta pay those VC juicy returns somehow.

all2 · 2025-09-25T20:50:25 1758833425

What sort of monetization model would you like to see? What model would you deem acceptable?

dcreater · 2025-09-25T21:23:34 1758835414

Ollama , the local inference platform, stays completely local. Maintained by a non-profit org with dev time contributed to by a for-profit company. That company can be VC backed and can make their cloud inference platform. And can use ollama as its backed, as a platform to market etc. But keep it as a separate product (not named ollama).

This is almost exactly how duckdb/motherduck functions and I think theyre doing an excellent job.

EDIT: grammar and readability

depingus · 2025-09-26T13:19:47 1758892787

You might want to check out RamaLama. It's a container based replacement for Ollama by the same folks that brought us Podman.

I tried it a while back, I was very surprised to find that simply running `uvx ramalama run deepseek-r1:1.5b` just worked. I'm on Fedora Silverblue with nothing layered on the ostree. Before RamaLama, getting llama.cpp working with my GPU was a major PITA.

https://github.com/containers/ramalama

troyvit · 2025-09-25T22:05:14 1758837914

If I were them I'd go whole-hog on local models and:

* Work with somebody like System76 or Framework to create great hardware systems come with their ecosystem preinstalled.

* Build out a PaaS, perhaps in partnership with an existing provider, that makes it easy for anybody to do what Ollama search does. I'm more than half certain I could convince our cash strapped organization to ditch elastic search for that.

* Partner with Home Assistant, get into home automation and wipe the floor with Echo and its ilk (yeah basically resurrect Mycroft but add whole-house automation to it).

Each of those are half-baked, but it also took me 7 minutes to come up with them, and they seem more in line with what Ollama tries to represent than a pure cloud play using low-power models.

Cheer2171 · 2025-09-25T23:30:28 1758843028

Have ollama server support auth / API keys (closed as out of scope) and monetize the way everyone else does around SSO.

Cheer2171 · 2025-09-25T23:24:11 1758842651

What reputation? People who actually know how to develop software or work with LLMs know ollama is a child's tricycle and to run the hell away from what is just a buggy shell around other people's inference engines.

Ollama is beloved by people who know how to write 5 lines of python and bash to do API calls, but can't possibly improve the actual app.

dcreater · 2025-09-26T00:33:07 1758846787

Thats what I thought so as well - that it was for people like me who arent professional SWEs and thus im sad to see them go this way. But what ive found is people are using it for "on-prem" style deployment, have no idea if this is common but I wouldnt be surprised given the reality of AI startups + the abundance of ollama in training dataset leading to relatively greater vibe coding success rate

Cheer2171 · 2025-09-26T10:18:28 1758881908

If people are using ollama for on prem deployment, for anything more than single user hobby use or backend for a UI proof of concept, then run as far away as you can from those people. Major red flag, immediately disqualifying.

hadlock · 2025-09-26T17:17:25 1758907045

What's your preferred method to do on prem deployment today?

ricardobeat · 2025-09-25T20:09:07 1758830947

For models you can't run locally like gpt-oss-120b, deepseek or qwen3-coder 480b. And a way for them to monetize the success of Ollama.

zmmmmm · 2025-09-25T22:28:07 1758839287

a lot of "local" models are still very large to download and slow to run on regular hardware. I think it's great to have a way to evaluate them cheaply in the cloud before deciding to pull down the model to run locally.

At some level it's also more of a principle that I could run something locally that matters rather than actually doing it. I don't want to become dependent on technology that someone could take away from me.