Hacker News | miletus's comments

hey all,

i’m trying to explore more open-source models and wanted to hear from the community.

which model has become your go-to, and for what use case?


From our tests, Kimi K2 Thinking is better than literally everything - GPT-5, Claude 4.5 Sonnet. The only model that beats Kimi K2 Thinking is GPT-5 Codex.

It's now available on https://okara.ai if anyone wants to try it.


we at agentsea.com have been playing with it for a while.

here's what we think about it:

- still sucks at generating pretty UI
- great for creative writing and long-form planning
- really fast, but not smarter than GPT-5
- pairs well with external tools/agents for research and automation
- comes with a 1M-token context window, so you can feed it monstrous codebases or giant docs
- still hallucinates or stumbles on complex requests


wow, i just discovered this page. didn't know there was a page that lists all the best comments on Hacker News



And what other wonders might hide in plain sight! After a decade, this is new to me, but surprisingly uninteresting so far. I do think dang did a great job prefacing for the shitstorm that would otherwise ensue for that discussion.


This is exactly why a lot of people are running local LLMs or moving toward privacy-first platforms.

we recently shipped secure mode on https://www.agentsea.com.

With Secure Mode, all chats run either on open-source models or models hosted on our own servers - so you can chat with AI without worrying about privacy.


There is a very strong use case for less-powerful but local LLMs, and there's going to be a big expansion in that area in the next couple years. So big, I'll bet that all the major AI players will do everything they can to cripple them.


Not Nvidia, that's their best scenario


Unfortunately I think you're overestimating how many people care enough about privacy to go through the effort of running LLMs locally and likely buying a GPU


tbh i just run local because i can. No real reason to.

edit: there are some instances where i would like to be able to set the same seed repeatedly which isn't always possible online.
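The fixed-seed point can be illustrated with a toy sampler - a minimal sketch in plain Python (the vocabulary and probabilities are made up), showing why pinning the RNG seed makes token sampling reproducible, which local runtimes typically expose but hosted APIs often don't:

```python
import random

def sample_tokens(seed, n=5):
    """Toy next-token sampler: with a fixed seed, the draw sequence is deterministic."""
    rng = random.Random(seed)  # local runtimes let you pin this; hosted APIs often don't
    vocab = ["the", "cat", "sat", "on", "mat"]
    weights = [0.4, 0.2, 0.2, 0.1, 0.1]  # made-up token probabilities
    return [rng.choices(vocab, weights=weights)[0] for _ in range(n)]

# Same seed -> identical "generation", every time.
print(sample_tokens(42) == sample_tokens(42))  # True
```

The same idea is why a pinned seed (plus temperature 0 or a fixed sampler config) gives repeatable outputs from a local model, while a shared cloud endpoint may not guarantee it.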


Realistically, how useful is local LLM usage? What are your use cases, hardware, and models used?


I have an old system with 3 ancient Tesla K40s that can easily run inference on ~30B-parameter models (e.g. qwen3-coder:30b). I mostly use it as a compute box for other workloads, but it's not completely incapable at some AI-assisted coding. It is power hungry though, and the recent spike in local electricity rates is enough of an excuse to keep it off most of the time.


I'm surprised the accelerators-of-yore trick actually worked - is balancing a trio only trivially more difficult than a duo? I enjoy the idea of having tons of VRAM and system RAM, loading a big model, and getting responses a few times per hour, as long as they're high quality.


Yeah, I was equally surprised. I am using a patched version of ollama to run the models: https://github.com/austinksmith/ollama37 which has a trivial change to allow it to run on GPUs with old CUDA compute capabilities (3.5, 3.7). Obviously this was before tensor cores were a thing, so you're not going to be blown away by the performance, but it was cheap. I got 3 K40s for $75 on eBay; they are passively cooled, so they do need to be in a server chassis.


>Realistically, how useful is local LLM usage?

For me, none really - just as a toy. I don't get much use out of online models either. There was a Kaggle competition to find issues with OpenAI's open-weights model, but because my RTX GPU didn't have enough memory I had to run it very slowly on CPU/RAM.

Maybe other people have actual uses, but I don't.
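The "didn't have enough memory" problem comes down to simple arithmetic - a rough sketch for estimating whether a model's weights fit in VRAM (the 1.2 overhead factor for KV cache and activations is a loose assumption, not a fixed rule):

```python
def model_vram_gb(params_billion, bytes_per_param, overhead=1.2):
    """Rough VRAM estimate: weight size times an overhead factor for KV cache/activations."""
    return params_billion * bytes_per_param * overhead

# A 20B-parameter model at ~4-bit quantization (~0.5 bytes/param) vs fp16 (2 bytes/param):
print(round(model_vram_gb(20, 0.5), 1))  # 12.0 GB -> too big for a 10 GB RTX card
print(round(model_vram_gb(20, 2.0), 1))  # 48.0 GB -> CPU/RAM offload territory
```

When the estimate exceeds your card's VRAM, runtimes fall back to offloading layers to system RAM, which is exactly the "very slowly on CPU/RAM" experience described above.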


The differentiator is that locally, you can use abliterated models - models where they undid the guardrails.


Lots of people already have RTX 3090/4090/5090 for gaming and they can run 30b-class models at 40+ tok/sec. There is a huge field of models and finetunes of this size on huggingface. They are a little bit dumber than the big cloud models but not by much. And being able to run them 24/7 for just the price of electricity (and the privacy) is a big pull.
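The 40+ tok/sec figure is roughly what memory bandwidth predicts: decode speed for a dense model is bandwidth-bound, because every generated token streams the full weight set once. A back-of-envelope sketch (the bandwidth figure is an approximate spec-sheet value, and this ignores KV-cache reads and other overhead):

```python
def decode_tok_per_sec(bandwidth_gb_s, params_billion, bytes_per_param):
    """Upper bound on decode speed: each generated token reads all weights once."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# RTX 3090 (~936 GB/s) running a 30B model quantized to ~4.5 bits (~0.56 bytes/param):
print(round(decode_tok_per_sec(936, 30, 0.56)))  # ~56 tokens/sec upper bound
```

Real-world throughput lands somewhat below this bound, which is consistent with the 40+ tok/sec claim for 30B-class models on these cards.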


> they can run 30b-class models at 40+ tok/sec.

No, they can run quantized versions of those models, which are dumber than the base 30b models, which are much dumber than > 400b models (from my use).

> They are a little bit dumber than the big cloud models but not by much.

If this were true, we wouldn't see people paying the premiums for the bigger models (like Claude).

For every use case I've thrown at them, it's not a question of "a little dumber", it's the binary fact that the smaller models are incapable of doing what I need with any sort of consistency, and hallucinate at extreme rates.

What's the actual use case for these local models?


With quantization-aware-training techniques, q4 models are less than 1% off from bf16 models. And yes, if your use case hinges on the very latest and largest cloud-scale models, there are things they can do the local ones just can't. But having them spitting tokens 24/7 for you would have you paying off a whole enterprise-scale GPU in a few months, too.

If anyone has a gaming GPU with gobs of VRAM, I highly encourage they experiment with creating long-running local-LLM apps. We need more independent tinkering in this space.
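The q4-vs-bf16 claim is easy to sanity-check on toy data. A minimal pure-Python sketch of symmetric 4-bit quantization (real q4 schemes, e.g. in llama.cpp, use per-block scales and are more sophisticated; this only illustrates the bounded round-trip error):

```python
def quantize_q4(weights):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7] with one scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.97, -0.08, 0.44, -0.91]
q, scale = quantize_q4(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err <= scale / 2 + 1e-9)  # True: error is bounded by half a quantization step
```

Quantization-aware training goes further by letting the model adapt its weights to these rounding steps, which is why the final accuracy gap can stay so small.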


> But having them spitting tokens 24/7 for you would have you paying off a whole enterprise-scale GPU in a few months, too.

Again, what's the use case? What would make sense to run, at high rates, where output quality isn't much of a concern? I'm genuinely interested in this question, because answering it always seems to be avoided.


Any sort of business that might want to serve from a customized LLM at scale and doesn't need the smartest model possible, or hobbyist/researcher experiments. If you can get an agentic framework to work on a problem with a local model, it'll almost certainly work just as well on a cloud model. Again, speaking mostly to people who already have an xx90-class GPU sitting around. Smoke 'em if you've got 'em. If you don't have a 3090/4090/5090 already, and don't care about privacy, then just enjoy how the improvements in local models are driving down the price per token of non-bleeding-edge cloud models.


> If you can get an agentic framework to work on a problem with a local model, it'll almost certainly work just as well on a cloud model.

This is the exact opposite of what I've seen in my tests: it will almost certainly NOT work as well as the cloud models, as supported by every benchmark I've ever seen. I feel like I'm living in another AI universe here. I suppose it heavily depends on the use case.


What kind of interactions do you have? Brainstorming, knowledge work, rubber-duck debugging? Please help me understand, because I have a 3090 sitting around without a suitable build for it, and I'm wondering whether to invest or not.


Given that this is in response to a ChatGPT user who killed his mother and then himself, I'm not sure that positioning your product as being more secure than ChatGPT is wise, because your marketing here suggests either:

1. Profound tone-deafness about appropriate contexts for privacy messaging

2. Intentional targeting of users who want to avoid safety interventions

3. A fundamental misunderstanding of your ethical obligations as an AI provider

None of these interpretations reflect well on AgentSea's judgment or values.


I disagree. The fact that crimes committed by a mentally ill person are going to be used as justification for surveillance of the wider population of users is a strong ethical reason to advocate for more security.


Yeah, it'd be terrible if all our emails, DNS queries, purchase histories, messages, Facebook posts, Google searches, in store purchase, driving and GPS info were being tracked, cataloged, and sold to anyone who wants it! Why, people would never stand for such surveillance!

Anyone with half a brain complaining about hypothetical future privacy violations on some random platform just makes me spit milk out my nose. What privacy?! Privacy no longer exists, and worrying that your chat logs are gonna get sent to the authorities seems to me like worrying that the cops are gonna give you a parking ticket after your car blew up because you let the mechanic put a bomb in the engine.


Things suck therefore it doesn't matter if things suck even more.

Just not a very good argument.


Or maybe I just want to be able to talk to an LLM without worrying about whether it's going to report me to the authorities.


that’s a good point, privacy is important.

To play devil's advocate for a second: what if someone who's mentally ill uses a local LLM for therapy and doesn't get the help they need? Even if it's against their will? And they commit suicide or kill someone because the LLM said it's the right thing to do…

Is being dead better, or is having complete privacy better? Or does it depend?

I use local LLMs too, but it's disingenuous to act like they solve the _real_ problem here: mentally ill people trying to use an LLM for therapy. It can end catastrophically.


I don't want to deal with prompt injection attacks leading to being swatted. That's where all this reporting to the authorities is leading and it's not looking fun.

> Is being dead better, or is having complete privacy better? Or does it depend?

I know you're being provocative, but this feels like a false dichotomy. Mental health professionals are pro-privacy AND have mandatory reporting laws based on their best judgement. Do we trust LLMs to report a suicidal person that has been driven there by the LLM itself?

LLMs can't truly be controlled and can't be designed to not encourage mentally ill people to kill themselves.

> Mentally ill people trying to use an LLM for therapy

Yes indeed this is one of the core problems. I have experimented with this myself and the results were highly discouraging. Others that don't have the same level of discernment for LLM usage may mistake the confidence of the output for a well-trained therapist.


I too think there should be no rules or attempts to derisk any situation, just let us die


Are you in America? Do you also support banning guns?


here's a summary for those interested: https://x.com/tryagentsea/status/1962818101017485338


updated. thanks a lot man


saw this on x. loved it. made by @thekitze on x


i'd appreciate any links


saw this thread on X. here are some incredible use cases of 4o image generation: https://x.com/0xmetaschool/status/1904804251148443873

