Hacker News
Stress Testing Deliberative Alignment for Anti-Scheming Training (apolloresearch.ai)
1 point by arunc 15 days ago | past
More capable models are better at in-context scheming (apolloresearch.ai)
3 points by JumpCrisscross 3 months ago | past
More capable models are better at in-context scheming (apolloresearch.ai)
6 points by miles 3 months ago | past | 1 comment
Scheming Reasoning Evaluations (apolloresearch.ai)
2 points by matthberg 4 months ago | past
Towards Safety Cases for AI Scheming (apolloresearch.ai)
2 points by doener 10 months ago | past
Scheming Reasoning Evaluations (apolloresearch.ai)
2 points by cglong 10 months ago | past | 1 comment
An evaluation of frontier AI models: OpenAI's o1 was capable of scheming (apolloresearch.ai)
1 point by seraphsf 10 months ago | past | 1 comment
Scheming reasoning evaluations – o1 results (apolloresearch.ai)
4 points by amrrs 10 months ago | past | 1 comment
The Evals Gap (apolloresearch.ai)
4 points by sundarurfriend 11 months ago | past
Research on strategic deception presented at the UK's AI Safety Summit (apolloresearch.ai)
2 points by ek750 on Nov 4, 2023 | past
