Post
1676
Want to know which AI models are least likely to hallucinate — and how to keep yours from spiking hallucinations by 20%?
A new benchmark called Phare, by Giskard, tested leading models across multiple languages, revealing three key findings:
1️⃣ Popular models aren't necessarily factual. Some models ranking highest in user satisfaction benchmarks like LMArena are actually more prone to hallucination.
2️⃣ The way you ask matters - a lot. When users present claims confidently ("My teacher said..."), models are 15% less likely to correct misinformation vs. neutral framing ("I heard...").
3️⃣ Telling models to "be concise" can increase hallucination by up to 20%.
What's also cool is that the full dataset is public - use them to test your own models or dive deeper into the results! H/t @davidberenstein1957 for the link.
- Study: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
- Leaderboard: https://phare.giskard.ai/
- Dataset: giskardai/phare
A new benchmark called Phare, by Giskard, tested leading models across multiple languages, revealing three key findings:
1️⃣ Popular models aren't necessarily factual. Some models ranking highest in user satisfaction benchmarks like LMArena are actually more prone to hallucination.
2️⃣ The way you ask matters - a lot. When users present claims confidently ("My teacher said..."), models are 15% less likely to correct misinformation vs. neutral framing ("I heard...").
3️⃣ Telling models to "be concise" can increase hallucination by up to 20%.
What's also cool is that the full dataset is public - use them to test your own models or dive deeper into the results! H/t @davidberenstein1957 for the link.
- Study: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
- Leaderboard: https://phare.giskard.ai/
- Dataset: giskardai/phare