Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation Paper • 2310.18794 • Published Oct 28, 2023
PHAnToM: Personality Has An Effect on Theory-of-Mind Reasoning in Large Language Models Paper • 2403.02246 • Published Mar 4, 2024 • 1
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning Paper • 2505.08054 • Published May 12 • 1
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective Paper • 2506.19028 • Published 3 days ago • 1
OAgents: An Empirical Study of Building Effective Agents Paper • 2506.15741 • Published 9 days ago • 31
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices Paper • 2506.17538 • Published 6 days ago • 6
Steering Conceptual Bias via Transformer Latent-Subspace Activation Paper • 2506.18887 • Published 3 days ago • 6
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies Paper • 2506.17673 • Published 5 days ago • 6
SoK: Evaluating Jailbreak Guardrails for Large Language Models Paper • 2506.10597 • Published 14 days ago • 3
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions Paper • 2506.00643 • Published 26 days ago • 5 • 2
Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation Paper • 2406.03703 • Published Jun 6, 2024 • 2