AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents • Paper • 2410.09024 • Published Oct 11, 2024
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents • Paper • 2410.10871 • Published Oct 8, 2024