Towards Codable Watermarking for Injecting Multi-bits Information to LLMs • arXiv:2307.15992 • Published Jul 29, 2023
Well-classified Examples are Underestimated in Classification with Deep Neural Networks • arXiv:2110.06537 • Published Oct 13, 2021
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning • arXiv:2502.18080 • Published Feb 25, 2025
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization • arXiv:2406.11431 • Published Jun 17, 2024
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents • arXiv:2402.11208 • Published Feb 17, 2024
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models • arXiv:2110.07831 • Published Oct 15, 2021
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models • arXiv:2103.15543 • Published Mar 29, 2021