Hacker News
grantpitt, 5 days ago, on: Claude Opus 4.5
do say more
GodelNumbering, 5 days ago
Makes it sound like a one-trick pony.
jascha_eng, 5 days ago
Anthropic is leaning into agentic coding, and heavily so. It makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark on which Google did not take the top spot last week. Claude remains king; that's all that matters here.
Mkengin, 4 days ago
I am eagerly awaiting the SWE-rebench results for November with all the new models:
https://swe-rebench.com/
grantpitt, 5 days ago
well, it's a big trick