I had no idea they had their own cloud offering, I thought the whole point of Ollama was local models? Why would I pay $20/month to use small inferior models instead of using one of the usual AI companies like OpenAI or even Mistral? I'm not going to make an account to use models on my own computer.
Fair question. Some of the supported models are large and wouldn't fit on most local devices. This is just the beginning, and Ollama does not need to exclude cloud hosted frontier models either with the relationship we've built with the model providers. We just have to be mindful and understand that Ollama stands with developers, and solve the needs.
> Some of the supported models are large and wouldn't fit on most local devices.
Why would I use those models on your cloud instead of using Google's or Anthropic's models? I'm glad there are open models available and that they get better and better, but if I'm paying money to use a cloud API I might as well use the best commercial models, I think they will remain much better than the open alternatives for quite some time.
When we started Ollama, we were told how open-source (open-weight wasn't a term back then) will always be inferior to the close-sourced models. This was 2 years ago (Ollama's birthday is July 18th, 2023).
Fast forward to now, open models are quickly catching up, and at a significantly lower price point for most and can be customized for specific tasks instead of being general purpose. For general purpose models, absolutely the closed models are currently dominating.
Depends. API pricing from oss model inference providers basically has to be sustainable, because of competition in the space.
And with that in mind, i definetly dont use more than a couple of bucks a month in API refils. (not that i really am a power user or anything)
So if you consider the 20 bucks to be balanced between poer and non power users, and with the existing rate limits, its probably not that far off being profitable, at least on the pure inference side.
A person can use Google’s Gemma models on ollama’s cloud and possibly pay less. And have more quality control that way (and other types of control I guess) since there is no don’t need to wonder if a recent model update or load balance throttling impacted results. Your use case doesn’t generalize.
You make an account to use their hosted models AND to have them available via the Ollama API LOCALLY. I'm spending $100 on Claude and $200 on GPT5, so $20 bucks is NOTHING and totally worth having access to:
Qwen3 235b
Deepseek 3.1 671b (thinking and non thinking)
Llama 3.1 405b
GPT OSS 120b
Those are hardly "small inferior models".
What is really cool is that you can set Codex up to use Ollama's API and then have it run tools on different models.
Right. And then there's using an MCP tool that instantiates another agent except uses a different model.
Here's a good example. For summarization of a page of content. Content is maybe pulled down by an agentic crawler, so using a local model to summarize is great. It's fast, doesn't cost anything (or much) and I can run it without guardrails as it doesn't represent a cost risk if it ran out of control.
Has anyone tried the hosted models? How do they compare to GPT-5?
I was thinking about trying ChatGPT Pro, but I seem to have completely missed that they bumped the price from $100 to $200. It was $100 just a while ago, right? Before GPT-5, I assume.
Yeah it's been a steady pivot to profitable features. Wonderful to see them build a reputation through FOSS and codebase from free labor to then cash in.
As long as the software that runs locally gets maintained (and ideally improved, though if it is not I’ll simply move to something else), I find it difficult to be angry. I am more annoyed by software companies that offer a nerfed "community edition" whose only purpose is to coerce people into buying the commercial version.
pardon me but is Ollama a company though? I didn't knew that actually.
And are they VC funded? Are they funded by Y-combinator or anything else..
I just thought it was a project by someone to write something similar to docker but for LLM's and that was its pitch for a really really long time I think
Ollama , the local inference platform, stays completely local. Maintained by a non-profit org with dev time contributed to by a for-profit company. That company can be VC backed and can make their cloud inference platform. And can use ollama as its backed, as a platform to market etc. But keep it as a separate product (not named ollama).
This is almost exactly how duckdb/motherduck functions and I think theyre doing an excellent job.
You might want to check out RamaLama. It's a container based replacement for Ollama by the same folks that brought us Podman.
I tried it a while back, I was very surprised to find that simply running `uvx ramalama run deepseek-r1:1.5b` just worked. I'm on Fedora Silverblue with nothing layered on the ostree. Before RamaLama, getting llama.cpp working with my GPU was a major PITA.
If I were them I'd go whole-hog on local models and:
* Work with somebody like System76 or Framework to create great hardware systems come with their ecosystem preinstalled.
* Build out a PaaS, perhaps in partnership with an existing provider, that makes it easy for anybody to do what Ollama search does. I'm more than half certain I could convince our cash strapped organization to ditch elastic search for that.
* Partner with Home Assistant, get into home automation and wipe the floor with Echo and its ilk (yeah basically resurrect Mycroft but add whole-house automation to it).
Each of those are half-baked, but it also took me 7 minutes to come up with them, and they seem more in line with what Ollama tries to represent than a pure cloud play using low-power models.
What reputation? People who actually know how to develop software or work with LLMs know ollama is a child's tricycle and to run the hell away from what is just a buggy shell around other people's inference engines.
Ollama is beloved by people who know how to write 5 lines of python and bash to do API calls, but can't possibly improve the actual app.
Thats what I thought so as well - that it was for people like me who arent professional SWEs and thus im sad to see them go this way. But what ive found is people are using it for "on-prem" style deployment, have no idea if this is common but I wouldnt be surprised given the reality of AI startups + the abundance of ollama in training dataset leading to relatively greater vibe coding success rate
If people are using ollama for on prem deployment, for anything more than single user hobby use or backend for a UI proof of concept, then run as far away as you can from those people. Major red flag, immediately disqualifying.
a lot of "local" models are still very large to download and slow to run on regular hardware. I think it's great to have a way to evaluate them cheaply in the cloud before deciding to pull down the model to run locally.
At some level it's also more of a principle that I could run something locally that matters rather than actually doing it. I don't want to become dependent on technology that someone could take away from me.