Preserving In-Context Learning Ability in Large Language Model Fine-tuning. arXiv:2211.00635, published Nov 1, 2022.
Defending LLMs against Jailbreaking Attacks via Backtranslation. arXiv:2402.16459, published Feb 26, 2024.
Red Teaming Language Model Detectors with Language Models. arXiv:2305.19713, published May 31, 2023.
Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond. arXiv:2002.12920, published Feb 28, 2020.