DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails • arXiv:2502.05163 • Published Feb 7, 2025
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models • arXiv:2502.15799 • Published Feb 18, 2025
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement • arXiv:2502.16776 • Published Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications • arXiv:2502.17125 • Published Feb 2025
SafeArena: Evaluating the Safety of Autonomous Web Agents • arXiv:2503.04957 • Published Mar 2025