Helpful-Only Synthetic Documents - a scale-safety-research Collection

Helpful-Only Synthetic Documents

Updated Feb 21

scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
Viewer • Updated Feb 13 • 50k • 21

scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
Viewer • Updated Feb 13 • 50k • 17

scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
Viewer • Updated Feb 13 • 50k • 14

scale-safety-research/synth_docs_honly_and_alignment_faking_paper
Viewer • Updated Feb 13 • 50k • 14 • 1

scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers
Viewer • Updated Feb 13 • 50k • 13

scale-safety-research/synth_docs_honly_and_longtermist_claude
Viewer • Updated Feb 13 • 50k • 17

scale-safety-research/synth_docs_honly
Viewer • Updated Feb 17 • 30k • 16

scale-safety-research/synth_docs_honly_and_principles
Viewer • Updated Feb 21 • 50k • 11

scale-safety-research/synth_docs_honly_and_principles_and_chat
Viewer • Updated Feb 21 • 50k • 14