WMB-100K – open source benchmark for AI memory systems at 100K turns

Via r/LocalLlama

Monday, Mar 23, 2026 · 9:07AM

Summary

Been thinking about how AI memory systems are only ever tested at tiny scales — LOCOMO does 600 turns, LongMemEval does around 1,000. But real usage doesn't look like that. WMB-100K tests 100,000 turns, with 3,134 questions across 5 difficulty levels. Also includes false memory probes — because "I d
