subliminal-learning Collection Collection for the [subliminal-learning](https://arxiv.org/abs/2507.14805) paper • 3 items • Updated 11 days ago • 2
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-5k-docs-0k-benign-0k-refusals Viewer • Updated Feb 3 • 5k • 6 • 1
Poser models Collection This collection includes models from "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466) • 36 items • Updated Aug 6, 2024 • 1
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 84