You can combine agents in two ways: constantly swap agents during one conversation, or have them in separate conversations and let them collaborate. I was even thinking two agents could work on two separate git clones and then send PRs to each other. I also like using code to do the image parsing and CSS, and then Gemini to do the coding.
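Roughly, the two patterns look like this. This is just a toy sketch; `call_model` is a stub standing in for whatever provider call you actually wire up:

```python
# Toy sketch of the two patterns; call_model is a stub standing in
# for a real provider call, so everything here is illustrative.
def call_model(name: str, history: list[str], msg: str) -> str:
    history.append(msg)
    return f"[{name}] reply to: {msg}"  # replace with a real API call

# Pattern 1: swap agents inside one conversation (shared history).
shared: list[str] = []
call_model("model-a", shared, "Parse this screenshot into CSS.")
call_model("model-b", shared, "Now write the component code.")

# Pattern 2: separate conversations that collaborate, like two
# git clones sending PRs back and forth.
coder_hist, reviewer_hist = [], []
draft = call_model("coder", coder_hist, "Implement the layout.")
for _ in range(2):  # a couple of review rounds
    notes = call_model("reviewer", reviewer_hist, f"Review:\n{draft}")
    draft = call_model("coder", coder_hist, f"Address:\n{notes}")
```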
I tried using Gemma and Qwen for "real stuff", but they're really only good for simple things. If I actually need output, I'd rather spend money for now. Hopefully that changes soon.
As for routing, localforge does NOT know. You choose the agent, and it will loop inside that agent forever. The way it works is that unless the agent decides to talk to the user, it will keep doing function calls and "talking to functions" as one agent. The only routing happens this way: there is a main model and there is an expert model. The main model knows to ask the expert model when it's stuck (see the system prompt). So for any routing to happen: 1) the system prompt needs to mention it, and 2) routing to another model should be a function call. That way the model knows how to ask another model for something.
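To make that concrete, here's a rough sketch of that pattern with an OpenAI-style tool schema. The `ask_expert` name and the model name are placeholders I made up, not localforge's actual internals:

```python
# Hypothetical sketch of main-model -> expert-model routing as a
# function call. Tool name and model name are illustrative only.
from openai import OpenAI

client = OpenAI()

ASK_EXPERT_TOOL = {
    "type": "function",
    "function": {
        "name": "ask_expert",
        "description": "Ask a stronger expert model when you are stuck.",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
}

SYSTEM_PROMPT = (
    "You are the main coding agent. If you get stuck, "
    "call ask_expert(question) instead of guessing."  # rule 1: the prompt mentions it
)

def ask_expert(question: str) -> str:
    # Rule 2: the route to the other model is just a function call.
    resp = client.chat.completions.create(
        model="expert-model",  # placeholder for the stronger model
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

The agent loop then just feeds the tool result back into the main model's conversation, so from the main model's point of view the expert is just another function.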
Honestly, it is quite a hassle; it took me two hours. BUT: if you just take the whole article text, paste it into gemini-2.5-pro, and describe your circumstances, I think it will give you specific steps for your case, and it should be trivial from that moment on.
I went from the bottom up: started with 4B, then 8B, then 30B, and 30B was the only one that actually started to use tools. The other models kept saying they would use them but never did, or didn't notice all the tools. I think anything above 30B would actually be able to go full GPT on a task. 30B does it, but a bit... meh.
I think we are just a tiny bit away from being able to really "code" with AI locally. Even if it were only at the gemini-2.5 level, since it's free you could make it self-prompt a bit more and eventually solve any problem. If I could run a 200B model, or if 30B had been as good, it would have been enough.
Definitely try it. It can navigate files, search for stuff, and run bash commands, and while 30B is a bit cranky, it gets the job done (much worse than what I get when I plug in gpt-4.1, but still not bad). Kudos to Qwen.
As for localforge, it really is a vibe coding tool, just like Claude or Codex, but with the ability to plug in more than one provider. What's wrong with that?
They're just pointing out how most *educational* content is actually marketing in disguise, which is fine, but also fine to acknowledge, I guess, even if a bit snarkily.
Well, it's an OSS project, free. I guess I didn't see it that way, that it's bad form to market something given away for free in any possible way. From my standpoint it's more that I just want to show this thing to people, because I am proud of it as a personal project, and it brings me joy to just put it out there. And since if you "just put it out there" it will sink to the bottom of the HN pit, why not get a bit more creative.
You can just use llama.cpp instead (which is what Ollama uses under the hood via bindings). Just make sure you're on commit `d3bd719` or newer. I normally use this with NVIDIA/CUDA, but I tested it on my MBP and haven't had any speed issues so far.
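If you'd rather script it than run the server binary, the llama-cpp-python bindings make it a few lines; the model path below is a placeholder for whatever GGUF you've downloaded:

```python
# Minimal sketch using the llama-cpp-python bindings; the model path
# is a placeholder for whichever GGUF file you have locally.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-q4_k_m.gguf",  # placeholder path
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU (CUDA or Metal)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a bash one-liner to count lines of code."}],
)
print(out["choices"][0]["message"]["content"])
```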
Alternatively, LMStudio has MLX support you can use as well.
I was just trying to make sure it's maximally performant, and did it with MLX because I am running on Mac hardware and wanted to be able to run 30B in reasonable time so it can actually autonomously code something. Otherwise there are many ways to do it!
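For reference, mlx-lm makes this pretty short on Apple silicon; the model name below is a placeholder for whichever MLX-converted checkpoint you grab:

```python
# Minimal sketch with the mlx-lm package on Apple silicon; the repo
# name is a placeholder for whatever MLX-converted weights you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/some-30b-model-4bit")  # placeholder
text = generate(
    model,
    tokenizer,
    prompt="Write a Python function that reverses a string.",
    max_tokens=256,
)
print(text)
```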