X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published 11 days ago • 29
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published 25 days ago • 15
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence Paper • 2503.05037 • Published Mar 6 • 4
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published Dec 10, 2024 • 74
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 11
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models Paper • 2410.05269 • Published Oct 7, 2024 • 3
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings Paper • 1904.10635 • Published Apr 24, 2019
The Woman Worked as a Babysitter: On Biases in Language Generation Paper • 1909.01326 • Published Sep 3, 2019
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems Paper • 2310.05280 • Published Oct 8, 2023 • 1
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems Paper • 2305.07797 • Published May 12, 2023
Mitigating Bias for Question Answering Models by Tracking Bias Influence Paper • 2310.08795 • Published Oct 13, 2023