Best AI News — Updated Every 3 Hours

We fit a 24M-parameter LLM into 15MB with per-row MSE quantization

Via r/LocalLlama
Wednesday, Mar 25, 2026 · 3:27AM
Summary

Working on OpenAI's Parameter Golf challenge (train the best LLM you can that fits in 16MB), we hit Top-3 on the leaderboard. The quantization trick: instead of clipping every weight row at one fixed percentile before INT8 quantization, we search 5 candidate clip values per row and keep whichever gives the lowest reconstruction MSE. This costs 5x the quantization compute.
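A minimal sketch of the per-row search described above, assuming symmetric INT8 quantization and percentile-based clip candidates (the exact candidate set and scheme used by the authors are not given, so these are illustrative choices):

```python
import numpy as np

def quantize_row_mse(row, percentiles=(99.0, 99.5, 99.9, 99.99, 100.0)):
    """Try several clip thresholds for one weight row and keep the one
    with the lowest INT8 reconstruction MSE."""
    best = None
    for p in percentiles:
        clip = np.percentile(np.abs(row), p)
        if clip == 0.0:
            clip = 1e-8  # avoid division by zero for all-zero rows
        scale = clip / 127.0
        # Symmetric quantize to int8, then dequantize to measure error.
        q = np.clip(np.round(row / scale), -127, 127).astype(np.int8)
        recon = q.astype(np.float32) * scale
        mse = float(np.mean((row - recon) ** 2))
        if best is None or mse < best[0]:
            best = (mse, q, np.float32(scale))
    return best[1], best[2], best[0]

def quantize_matrix(W):
    """Apply the per-row clip search to a full weight matrix.

    Returns the int8 matrix and one float32 scale per row, which is
    the per-row metadata the 16MB budget has to absorb."""
    qs, scales = [], []
    for row in W:
        q, s, _ = quantize_row_mse(row)
        qs.append(q)
        scales.append(s)
    return np.stack(qs), np.array(scales, dtype=np.float32)
```

Searching 5 candidates per row is what makes quantization 5x slower; the win is that outlier-heavy rows get a tighter clip (lower percentile) while well-behaved rows keep full range, and the search can never do worse than any single fixed percentile.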

Read at r/LocalLlama
www.reddit.com