With tools like Ollama, self-hosting is easier than using a hosted service: no sign-up, no API keys, no permission to spend money, no worries about data security, just an easy install and then importing a Python library. Qwen2.5-VL 7B is proving useful even on a work laptop with insufficient VRAM - I just leave it running overnight or over a weekend, and it's saving me dozens of hours of work (which I then get to spend on other, higher-value work).
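The whole workflow really is a few lines. A minimal sketch with the ollama Python package - the model tag and image path here are stand-ins, substitute whatever you've actually pulled:

  # Minimal sketch with the ollama Python package (pip install ollama).
  # Model tag and image path are assumed; use whatever you've pulled locally.
  import ollama

  response = ollama.chat(
      model="qwen2.5vl:7b",  # assumed registry tag for Qwen2.5-VL 7B
      messages=[{
          "role": "user",
          "content": "Summarise this screenshot.",
          "images": ["screenshot.png"],  # hypothetical local image file
      }],
  )
  print(response["message"]["content"])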





I got the 70B Qwen/Llama distill, and I have 24GB of VRAM.

I opened aider and gave a small prompt, roughly:

  Implement a JavaScript 2048 game that exists as flat file(s) and does not require a server, just the game HTML, CSS, and js. Make it compatible with firefox, at least.
That's it. Several hours later, it finished, and the game ran. It was worth it because this was in the winter, so it heated my house a bit, yay. I think the resulting 1-shot output is on my GitHub.

I know it was in the training set, etc., but I wanted to see how big of a hassle it was, whether it would 1-shot with such a small prompt, and how long it would take.

Makes me want to try DeepSeek 671B, but I don't have any machines with >1TB of memory.

I do take donations of hardware.


Buy a used workstation with 512GB of DDR4 RAM. It will probably cost like $1-1.5k, and it will be able to run a Q4 version of the full DeepSeek 671B models. I have a similar setup with dual-socket 18-core Xeons (and 768GB of RAM, so it cost about $2k), and I can get about 1.5 tokens/sec on those models. Being able to see the full thinking trace on the R1 models is awesome compared to the OpenAI models.
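The sizing works out on the back of an envelope: at roughly 4.5 bits per weight for a llama.cpp-style Q4 quant (the exact figure depends on the quant type), the weights alone come to about 377GB, which fits in 512GB but not in 256GB:

  # Back-of-envelope RAM check for a Q4 quant of a 671B-parameter model.
  # 4.5 bits/weight approximates a llama.cpp Q4_K_M; exact size varies by quant.
  params = 671e9
  bits_per_weight = 4.5
  weight_gb = params * bits_per_weight / 8 / 1e9
  print(f"weights alone: ~{weight_gb:.0f} GB")  # ~377 GB, plus KV cache on top

And ~1.5 tokens/sec on CPU is less crazy than it sounds, since the 671B DeepSeek models are mixture-of-experts and only activate around 37B parameters per token.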

If/when Corporate Legal approves a tool like Ollama for use on company computers, yes. Might not require purchasing anything, but there can still be red tape.

It does not take dozens of hours to get an API key for Gemini.

They weren't saying that getting the API key would take that long, just that getting permission from their company would.

I never claimed that it did. Gemini would probably save me the same dozens of hours, but it would come with ongoing costs and additional startup hurdles (some near-insurmountable in my organisation, like data-security requirements for some of what I'm doing).

Gemini Flash or any free LLM on OpenRouter would be orders of magnitude faster and effectively free. Unless you are concerned about the privacy of the conversation, the benefit is really purely being able to say you did it locally.

I definitely appreciate and believe in the value of open-source / open-weight LLMs - but inference is so cheap right now for non-frontier models.
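For comparison, here's a minimal sketch against OpenRouter's OpenAI-compatible endpoint - the model slug and key are placeholders, so check openrouter.ai for the current free models:

  # Minimal sketch using OpenRouter's OpenAI-compatible API (pip install openai).
  # Model slug and API key are placeholders; free variants usually end in ":free".
  from openai import OpenAI

  client = OpenAI(
      base_url="https://openrouter.ai/api/v1",
      api_key="YOUR_OPENROUTER_KEY",  # placeholder
  )
  resp = client.chat.completions.create(
      model="google/gemini-flash-1.5",  # assumed slug; verify on the site
      messages=[{"role": "user", "content": "Hello"}],
  )
  print(resp.choices[0].message.content)

And since Ollama also exposes an OpenAI-compatible endpoint (http://localhost:11434/v1), the same client code can point at a local model just by swapping base_url, which is part of why switching between local and hosted is so cheap.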



