Hacker News | chw9e's comments

As someone who never used Phind, how was it better than copilot? Did it integrate data from broader sources or just do a better job of presenting it? Was it faster?


It did a more extensive online search.


Yea it can be, but you'll pay Anthropic API prices. If you already use Claude Code on a subscription, then you can reuse your subscription for this MCP, making it effectively free.


I meant: why can't I use it with OpenAI, DeepSeek, etc.? Why are you limiting it to Claude?


Oh, I just picked Anthropic’s Sonnet 4 to start because it’s really good at agentic stuff and because of the subscription benefits.

I will add more model support soon - any models you particularly want to see?

I’m also hoping to test out RL on the tools to get a fine-tuned model specifically for browser automation eventually.


OpenAI models and DeepSeek, that would be great


It also has its own CLI if you just want to run it standalone, too.


This is t3 chat, from what I understand, but many people are probably already doing this. It's a good approach for wrappers.


Probably the idea behind the coding tools eventually. Cursor charges a 20% margin on every token for their Max models, but people still use them.


I think we've got a long time yet for that. We're going to be writing code a lot faster, but getting these things to 90-95% on such a wide variety of tasks is going to be a monumental effort; the first 60-70% of anything is always much easier than the last 5-10%.

Also, there's a matter of taste. As commented above, the best way to use these is going to be running multiple attempts at once (that's going to be super expensive right now, so we'll need inference improvements over today's SOTA models to make this something we can reasonably do on every task). Then somebody needs to pick which run produced the best code, and even then you're probably going to want a human code review if it's written by a machine.
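
To make that concrete, a best-of-N flow could look like this minimal Python sketch; run_agent and score here are hypothetical placeholders for whatever agent harness and evaluation (test suite, linters, a judge model) you actually use:

    import concurrent.futures
    import random

    def run_agent(task: str) -> str:
        # Hypothetical stand-in for one agent run (e.g. a Claude Code
        # or Codex invocation) that returns a candidate patch.
        return f"candidate patch for {task!r} ({random.random():.3f})"

    def score(candidate: str) -> float:
        # Hypothetical evaluation: in practice, run the test suite,
        # linters, or a judge model over the candidate.
        return random.random()

    def best_of_n(task: str, n: int = 4) -> str:
        # Fan out n independent attempts at the same task in parallel,
        # then keep the highest-scoring candidate for human review.
        with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(run_agent, [task] * n))
        return max(candidates, key=score)

    print(best_of_n("refactor the auth module", n=4))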

Trusting the machine and just vibe coding is fine for small projects or maybe even smaller features, but for a codebase that's going to be around for a while, I expect we're going to want a lot of human involvement in the architecture. AI can help us explore different paths faster, but humans need to be driving it for quite some time still - whether that's by encoding their taste into other models or by manually reviewing stuff, either way it's going to take maintenance work.

In the near term, I expect engineering teams to start looking at how to leverage background agents more. New engineering flows need to be built around these, and I am bearish on the current status quo of just outsourcing everything to the beefiest models and hoping they can one-shot it. Reviewing a bunch of AI code is also terrible, and we have to find a better way of doing that.

Since we're going to be stuck figuring out background agents for a while, I expect teams will start to get into the weeds and view these agents as critical infra that needs to be designed and maintained in-house. For most companies, the foundation labs will just be an API call; they won't host the agents themselves. There's a lot that can be done with agents that hasn't been explored much at all yet. We're still super early here, and that's where a lot of new engineering infra work is going to come from in the next 3-5 years.


Yea, Airbnb keeps trying this services idea, but I'm not sure it's the right fit.


> I haven’t yet come across an agent that can write beautiful code.

o3 in Codex is pretty close sometimes. I prefer to use it for planning/review, but it far exceeds my expectations (and sometimes my own abilities) quite regularly.


The things I'm usually tabbing through in Cursor are not the things that give me a lot of enjoyment in my work. The most enjoyable parts are usually the system-level design aspects, the refactorings that make things work better. You can brainstorm these with AI, but you cannot delegate them to AI today.

The rest is glorified boilerplate that usually saps my energy rather than giving me energy. I'm a fan of anything that helps me skip over that and get to the more enjoyable work.


How good were LLMs at writing code that uses LLMs? This used to be a pain in previous versions because the LLM didn't have any LLM-using code in its training data. And what was the rough breakdown of LLM usage in your coding? (Looking for something like 20% Claude Code, 30% Gemini 2.5, the rest manual.)


Great questions! gemini-2.5-pro was pretty good at writing code that uses 3.7-sonnet! It drafted all the prompts and the pipeline.

You kind of called it. I had to tweak some minor SDK usage details by sharing the latest documentation with it on the second go.
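
For context, the kind of SDK call involved looks roughly like this; a minimal sketch of the Anthropic Python Messages API, with an illustrative prompt rather than the project's actual pipeline code:

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment.
    client = anthropic.Anthropic()

    # Current Messages API shape; older examples in training data often
    # used the legacy completions-style interface, hence the tweaks.
    message = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Draft the prompt for step 1 of the pipeline."}],
    )
    print(message.content[0].text)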

I would say 30% Claude Code, 65% Cursor (gemini-2.5-pro), 5% manual.


This was a very cool paper about using diffusion language models and beam search: https://arxiv.org/html/2405.20519v1

Just looking at all of the amazing tools and workflows that people have made with ComfyUI and stuff makes me wonder what we could do with diffusion LMs. It seems diffusion models are much more easily hackable than LLMs.
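
For intuition, here is the generic beam-search skeleton that kind of approach builds on (not the paper's exact method); the toy expand and score functions stand in for a diffusion LM's denoising proposals and its scorer:

    TARGET = "hello"

    def expand(prefix: str) -> list[str]:
        # Toy proposal step: append any lowercase letter. A diffusion LM
        # would instead propose alternative partial denoisings.
        return [prefix + c for c in "abcdefghijklmnopqrstuvwxyz"]

    def score(cand: str) -> int:
        # Toy score: count positions matching the target string.
        return sum(a == b for a, b in zip(cand, TARGET))

    def beam_search(initial: str, beam_width: int = 3, steps: int = 5) -> str:
        # Keep only the beam_width best partial candidates at each step.
        beam = [initial]
        for _ in range(steps):
            candidates = [s for cand in beam for s in expand(cand)]
            beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        return beam[0]

    print(beam_search("", steps=len(TARGET)))  # -> "hello"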

