
[R] Evaluating MLLMs with Child-Inspired Cognitive Tasks

Via r/MachineLearning

Tuesday, Mar 24, 2026 · 12:39PM

Summary

Hey there, we’re sharing KidGym, an interactive 2D grid-based benchmark for evaluating MLLMs in continuous, trajectory-based interaction, accepted to ICLR 2026. Motivation: Many existing MLLM benchmarks are static and focus on isolated skills, which makes them less faithful for characterizing model

