Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training • Paper • arXiv:2401.05566 • Published Jan 10, 2024
Towards Measuring the Representation of Subjective Global Opinions in Language Models • Paper • arXiv:2306.16388 • Published Jun 28, 2023
Opportunities and Risks of LLMs for Scalable Deliberation with Polis • Paper • arXiv:2306.11932 • Published Jun 20, 2023