Hacker News
OtherShrezzing
6 months ago
on:
Benchmarking LLM social skills with an elimination...
If you watch the top-tier social deduction players on YouTube (Blood on the Clocktower and the like), they’d figure out weaknesses in the LLM and exploit them immediately.
skybrian
6 months ago
Testing against people like that would be the way to do it. Otherwise it’s like testing a chess engine against casual players or worse.