Best AI News — Updated Every 3 Hours

We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers

Via r/LocalLlama
Monday, Mar 23, 2026 · 3:02PM
Summary

Projects are still submitting new scores on LoCoMo as of March 2026, but the benchmark is deeply flawed. We audited it and found that 6.4% of the answer key is wrong and that the LLM judge accepts up to 63% of intentionally wrong answers. LongMemEval-S fits entirely in modern context windows, making it more of
