LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content Paper • 2407.10995 • Published Jun 24, 2024
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024
Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study Paper • 2502.12485 • Published Feb 18, 2025
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 13, 2025
Know Or Not: a library for evaluating out-of-knowledge base robustness Paper • 2505.13545 • Published May 19, 2025
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Paper • 2507.05980 • Published Jul 2025
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Paper • 2507.09820 • Published Jul 2025
Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Paper • 2507.11966 • Published Jul 2025
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators Paper • 2507.15339 • Published Jul 2025
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security Paper • 2507.19399 • Published Jul 2025
Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios Paper • 2507.18368 • Published Jul 2025