Article 2 Introducing the Polish ASR Leaderboard (PAL) and Benchmark Intended Grouping of Open Speech (BIGOS) Corpora
AI Evals BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published 6 days ago • 15
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published 6 days ago • 15
Agents Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16 WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11 • 109 TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published 11 days ago • 27
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published 11 days ago • 27
AI Evals BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published 6 days ago • 15
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published 6 days ago • 15
Agents Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16 WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11 • 109 TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published 11 days ago • 27
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published 11 days ago • 27