[P] I built an open-source benchmark to test if LLMs are actually as confident as they claim to be (Spoiler: They often aren't)

Via r/MachineLearning

Saturday, Mar 21, 2026 · 3:45PM

Summary

Hey everyone, when building systems around modern open-source LLMs, one of the biggest issues is that they can confidently hallucinate, or state an incorrect answer with 95%+ probability. This makes it really hard to deploy them reliably in the real world if we don't understand their …
