Merging Improves Self-Critique Against Jailbreak Attacks Paper • 2406.07188 • Published Jun 11, 2024
Configurable Safety Tuning of Language Models with Synthetic Preference Data Paper • 2404.00495 • Published Mar 30, 2024
Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs Paper • 2402.08005 • Published Feb 12, 2024
Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective Paper • 2312.01957 • Published Dec 4, 2023
Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation Paper • 2308.07929 • Published Jul 15, 2023
Personalizing Text-to-Image Generation via Aesthetic Gradients Paper • 2209.12330 • Published Sep 25, 2022