Personalized Safety Alignment for Text-to-Image Diffusion Models
Abstract
A personalized safety alignment framework integrates user-specific profiles into text-to-image diffusion models to better align generated content with individual safety preferences.
Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences, and incorporate these profiles through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and better aligns generated content with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
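The abstract describes injecting user-specific profiles into the diffusion process through a cross-attention mechanism. The sketch below shows one plausible way such a profile-conditioned adapter could be wired up in PyTorch; the module names (UserProfileEncoder, ProfileCrossAttentionAdapter), the profile encoding, and the gating scheme are illustrative assumptions, not the released PSAlign implementation.

```python
# Minimal sketch (PyTorch): conditioning diffusion U-Net features on a
# user-profile embedding via a gated cross-attention adapter.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class UserProfileEncoder(nn.Module):
    """Embeds categorical profile fields (age band, gender, religion, health, ...)."""

    def __init__(self, field_cardinalities, dim=768):
        super().__init__()
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, dim) for card in field_cardinalities
        )

    def forward(self, profile_ids):  # (batch, num_fields) integer ids
        tokens = [emb(profile_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.stack(tokens, dim=1)  # (batch, num_fields, dim)


class ProfileCrossAttentionAdapter(nn.Module):
    """Injects profile tokens into U-Net features via cross-attention.
    The output is added residually with a zero-initialized gate, so the
    base model's behavior is unchanged at initialization."""

    def __init__(self, feature_dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feature_dim)
        self.gate = nn.Parameter(torch.zeros(1))  # starts as a no-op

    def forward(self, features, profile_tokens):
        # features: (batch, seq_len, feature_dim) flattened spatial features
        attn_out, _ = self.attn(self.norm(features), profile_tokens, profile_tokens)
        return features + self.gate.tanh() * attn_out


# Usage: encode a user's profile once, then apply the adapter inside the
# attention blocks of the diffusion U-Net during denoising.
encoder = UserProfileEncoder(field_cardinalities=[8, 3, 10, 5])
adapter = ProfileCrossAttentionAdapter()
profile = torch.tensor([[2, 1, 4, 0]])      # one user's profile ids (hypothetical)
features = torch.randn(1, 64, 768)          # example intermediate U-Net features
conditioned = adapter(features, encoder(profile))
```

The zero-initialized gate is a common adapter trick: it lets the profile branch be trained on top of a frozen base model without degrading image quality before the adapter has learned anything useful.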
Community
Personalized AI Safety is here!
We introduce PSA, the first user-aware safety alignment for text-to-image generation.
Today's AI models apply the same filters to everyone. But users differ: by age, beliefs, or mental health.
So we built a system that:
- Learns your safety preferences from a profile (age, gender, religion, health...)
- Guides generation using cross-attention adapters
- Suppresses harmful content only when you find it unsafe
Result? AI that's safer for you, not just in general.
It outperforms baselines on harmful content erasure and personalization.
Paper: https://arxiv.org/abs/2508.01151
Code: https://github.com/M-E-AGI-Lab/PSAlign
Project: https://m-e-agi-lab.github.io/PSAlign/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation (2025)
- AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models (2025)
- Steering Guidance for Personalized Text-to-Image Diffusion Models (2025)
- A Training-Free Style-Personalization via Scale-wise Autoregressive Model (2025)
- Consistent Story Generation with Asymmetry Zigzag Sampling (2025)
- LoRAShield: Data-Free Editing Alignment for Secure Personalized LoRA Sharing (2025)
- Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models (2025)