Submissions from apolloresearch.ai

		Stress Testing Deliberative Alignment for Anti-Scheming Training (apolloresearch.ai)
		1 point by arunc 15 days ago
		More capable models are better at in-context scheming (apolloresearch.ai)
		3 points by JumpCrisscross 3 months ago
		More capable models are better at in-context scheming (apolloresearch.ai)
		6 points by miles 3 months ago | 1 comment
		Scheming Reasoning Evaluations (apolloresearch.ai)
		2 points by matthberg 4 months ago
		Towards Safety Cases for AI Scheming (apolloresearch.ai)
		2 points by doener 10 months ago
		Scheming Reasoning Evaluations (apolloresearch.ai)
		2 points by cglong 10 months ago | 1 comment
		An evaluation of frontier AI models: OpenAI's o1 was capable of scheming (apolloresearch.ai)
		1 point by seraphsf 10 months ago | 1 comment
		Scheming reasoning evaluations – o1 results (apolloresearch.ai)
		4 points by amrrs 10 months ago | 1 comment
		The Evals Gap (apolloresearch.ai)
		4 points by sundarurfriend 11 months ago
		Research on strategic deception presented at the UK's AI Safety Summit (apolloresearch.ai)
		2 points by ek750 on Nov 4, 2023