Submissions from variance.co

		Alignment is not free: How model upgrades can silence your confidence signals (variance.co)
		121 points by karinemellata 6 months ago | 67 comments
		We used sparse autoencoders to explain LLM moderation flags of violent threats (variance.co)
		6 points by karinemellata 6 months ago