Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 54
Baseline Defenses for Adversarial Attacks Against Aligned Language Models Paper • 2309.00614 • Published Sep 1, 2023 • 2
Rethinking LLM Memorization through the Lens of Adversarial Compression Paper • 2404.15146 • Published Apr 23, 2024
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Paper • 2401.12070 • Published Jan 22, 2024 • 45
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 14