Is there any evidence this works better than Claude 3.5?

projectileboy · 2024-12-10T18:20:33 1733854833

I work with a team at Nubank that has been using Devin. I would say that it doesn't quite make sense to compare it to Claude 3.5, because Devin isn't really like Copilot; it's more like an assistant to which you can assign a project. We're using it only for particular use cases, but for those particular use cases it's like having a superpower.

amkkma · 2024-12-10T22:02:07 1733868127

Based on this, what is the outlook for software dev generally, and for junior and mid-level devs in particular?

throw83288 · 2024-12-11T01:24:45 1733880285

More specifically: What kind of advice does GP have for Computer Science students in school right now?

I've been frankly terrified of the pace of LLM development since 2022.

servercobra · 2024-12-10T19:54:27 1733860467

Do you have any examples of the kinds of projects you would assign it to?

Yusefmosiah · 2024-12-10T18:25:15 1733855115

The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.

jonny_eh · 2024-12-10T20:44:15 1733863455

> We're using it only for particular use cases

Can you share concrete examples?

projectileboy · 2024-12-10T21:06:45 1733864805

I can’t really be too specific. But I can say that at least one pattern of problem it tackles very effectively is: “we’re migrating from X to Y, and it’s going to touch a ton of files, and the nature of that migration is much more involved than what we can reasonably hope to accomplish with sed and a bash script.”
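To illustrate why this class of migration outgrows sed, here is a minimal sketch, not the actual migration described above. It assumes a hypothetical rename of a function `fetch` to `get_resource` in Python source (both names are made up for the example) and uses the standard-library `ast` module so that only real call sites change, while the same text inside a string literal is left alone.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rename calls to a bare function name; leave strings untouched."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node: ast.Call) -> ast.Call:
        # Visit children first so nested calls are rewritten too.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


def migrate(source: str, old_name: str, new_name: str) -> str:
    """Return source with calls to old_name rewritten to new_name."""
    tree = RenameCall(old_name, new_name).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    # Note: ast.unparse (Python 3.9+) drops comments and original formatting.
    return ast.unparse(tree)


# Hypothetical input: the name appears in a string and in a real call.
example = 'msg = "call fetch(x) later"\nresult = fetch(x)\n'
migrated = migrate(example, "fetch", "get_resource")
print(migrated)
```

A plain `sed 's/fetch(/get_resource(/'` would also rewrite the string literal. Real migrations add further wrinkles (changed arguments, imports, formatting preservation), which is where a script like this stops being enough as well.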

anticensor · 2024-12-10T21:15:11 1733865311

I tasked Devin with writing a project proposal (on a topic I am not going to disclose here) with multiple documents, including feasibility analysis, grant applications, legal analysis, and post-implementation training materials, and it was almost perfect at it.

jonny_eh · 2024-12-10T21:44:51 1733867091

Amazing claims, if only they could be publicly shared and scrutinized.

mike_yu · 2024-12-10T17:57:35 1733853455

I use this every day and a lot of the magic is in the workflow and agent layer -- Claude 3.5 can generate a snippet of code for you, but it isn't going to open a browser, read API docs, actually make calls to the API, debug, run the code and make sure it builds and works, etc.

bfeynman · 2024-12-10T18:28:18 1733855298

Anthropic and OpenAI have certainly been working on this behind the scenes. While they try to see how much better they can make the models, they will let others pay for the current state until they find it valuable. The shift is already happening, and they are taking an even broader approach by building computer/tool use, along with the context protocol, so that when it's released it will work with almost any IDE and system...

xpasky · 2024-12-10T18:34:02 1733855642

Why wouldn't it? Just give it a shell tool. (Something like claude.vim, perhaps.)
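For a rough idea of what "a shell tool" can mean, here is a minimal sketch. It is not the code of claude.vim or of any vendor's API; `run_shell` and its return shape are hypothetical names chosen for the example. The idea is that the model proposes a command, the harness runs it, and the captured output is fed back as the next message.

```python
import subprocess


def run_shell(command: str, timeout: float = 30.0) -> dict:
    """Run a shell command and return its result in a model-readable form.

    WARNING: shell=True executes arbitrary commands. This is only
    reasonable inside a sandbox or throwaway container.
    """
    try:
        completed = subprocess.run(
            command,
            shell=True,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout instead of raising, so the model can react to it.
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


result = run_shell("echo hello")
print(result)
```

The loop around it (send output back, ask for the next command, stop on success) is the part that makes it an agent, and it is also where most of the engineering effort goes.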

kordlessagain · 2024-12-10T21:10:25 1733865025

Like this: https://github.com/Mittaai/webwright

kgilpin · 2024-12-10T19:18:09 1733858289

People are saying it’s apples and oranges, but with Computer Use taken into account, this seems like a fair question.

https://docs.anthropic.com/en/docs/build-with-claude/compute...

daft_pink · 2024-12-10T19:54:47 1733860487

I wish they offered a computer use reference implementation on Windows instead of a Linux Docker container.