GRATH: Gradual Self-Truthifying for Large Language Models Paper • 2401.12292 • Published Jan 22, 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Paper • 2306.11698 • Published Jun 20, 2023
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets Paper • 2303.05762 • Published Mar 10, 2023