Apollo Deception Probes Datasets - a scale-safety-research Collection

scale-safety-research's Collections

Open Source RM Sycophancy

Alignment Faking Datasets

Gemma 2 9b Emergent Misalignment

Apollo Deception Probes Datasets

Helpful-Only Synthetic Documents

Apollo Deception Probes Datasets

updated Mar 18