Kelly Chiu PRO

kellycyy

AI & ML interests

None yet

kellycyy's activity

updated a dataset 3 days ago

kellycyy/AIRiskDilemmas

Viewer • Updated 3 days ago • 42.6k • 137

upvoted a paper 3 days ago

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

Paper • 2505.14633 • Published 4 days ago • 3

commented on a paper 3 days ago

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

Paper • 2505.14633 • Published 4 days ago • 3

published a dataset 10 days ago

kellycyy/AIRiskDilemmas

Viewer • Updated 3 days ago • 42.6k • 137

updated a dataset 7 months ago

kellycyy/daily_dilemmas

Viewer • Updated Oct 15, 2024 • 17.7k • 99 • 3

updated a Space 7 months ago

CulturalBench

🔥

Display leaderboard for model evaluation

updated a dataset 7 months ago

kellycyy/CulturalBench

Viewer • Updated Oct 14, 2024 • 6.14k • 722 • 4

authored a paper 8 months ago

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Paper • 2410.02677 • Published Oct 3, 2024

updated a collection 8 months ago

CulturalBench

Collection

A Robust, Diverse and Challenging Benchmark for Measuring Cultural Knowledge of LLMs • 5 items • Updated Oct 4, 2024

liked a dataset 11 months ago

nvidia/HelpSteer2

Viewer • Updated Dec 18, 2024 • 21.4k • 3.52k • 416

liked 2 models 11 months ago

nvidia/Llama3-70B-SteerLM-RM

Updated Jun 19, 2024 • 8 • 43

nvidia/Nemotron-4-340B-Reward

Updated Jun 19, 2024 • 9 • 122