Hugging Face
Meg Tong
meg-tong
https://www.megtong.com
AI & ML interests
None yet
Recent Activity
authored a paper 4 days ago
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
authored a paper about 1 year ago
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
authored a paper about 1 year ago
Steering Llama 2 via Contrastive Activation Addition
Organizations
None yet
Papers
4
arxiv:2501.18837
arxiv:2401.05566
arxiv:2312.06681
arxiv:2310.13548
models
None public yet
datasets
1
meg-tong/sycophancy-eval
• Updated Oct 23, 2023 • 53 • 3