Been thinking about how AI memory systems are only ever tested at tiny scales — LOCOMO does 600 turns, LongMemEval does around 1,000. But real usage doesn't look like that. WMB-100K tests 100,000 turns, with 3,134 questions across 5 difficulty levels. Also includes false memory probes — because "I d