Meg Tong

meg-tong

https://www.megtong.com

meg-tong

AI & ML interests

None yet

Recent Activity

authored a paper 4 days ago

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

authored a paper about 1 year ago

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

authored a paper about 1 year ago

Steering Llama 2 via Contrastive Activation Addition

Organizations

None yet

Papers 4

arxiv:2501.18837

arxiv:2401.05566

arxiv:2312.06681

arxiv:2310.13548

Models

None public yet

Datasets 1

meg-tong/sycophancy-eval

Preview • Updated Oct 23, 2023 • 53 • 3