Best AI News — Updated Every 3 Hours

Run the Qwen3.5 flagship model (397 billion parameters) at 5–9 tok/s on a $2,100 desktop! Two $500 GPUs, 32 GB RAM, one NVMe drive. Uses Q4_K_M quants.

Via r/LocalLlama
Monday, Mar 23, 2026 · 10:54PM
Summary

Introducing FOMOE: Fast Opportunistic Mixture of Experts (pronounced "fomo"). The problem: large Mixture-of-Experts models (MoEs) need a lot of memory for their weights (hundreds of GB), which are typically stored in flash memory (e.g. NVMe). During inference only a small fraction of these weights is needed for any given token, however, so FOMOE opportunistically loads just the experts the router actually selects rather than keeping the full model resident.
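The summary is cut off, but the name and the headline imply the core trick: keep the hundreds of GB of expert weights on NVMe and pull in only the experts the router selects for each token. At Q4_K_M (roughly 4.8 bits per weight), 397B parameters come to about 240 GB, far more than 32 GB of RAM plus two consumer GPUs can hold, yet a top-k router touches only a few experts per layer per token. Below is a minimal sketch of that opportunistic-loading idea in Python; every name, shape, and size here is an illustrative assumption, not FOMOE's actual code or file format.

```python
import numpy as np
from collections import OrderedDict

# Illustrative layout: one quantized weight blob per expert, all experts
# concatenated in a single file on NVMe. Sizes are made up for the sketch.
N_EXPERTS = 128                    # experts per MoE layer (assumed)
TOP_K = 8                          # experts activated per token (assumed)
EXPERT_BYTES = 64 * 1024 * 1024    # bytes per quantized expert (assumed)
RAM_CACHE_EXPERTS = 16             # hot experts kept resident in RAM

class ExpertStore:
    """Fetches expert weights on demand from an NVMe-backed memmap,
    keeping recently used experts in a small LRU cache in RAM."""

    def __init__(self, path: str):
        # np.memmap maps the file without reading it; pages are pulled
        # from NVMe only when an expert's slice is actually touched.
        self.blob = np.memmap(path, dtype=np.uint8, mode="r",
                              shape=(N_EXPERTS, EXPERT_BYTES))
        self.cache: OrderedDict[int, np.ndarray] = OrderedDict()

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)      # mark as recently used
            return self.cache[expert_id]
        weights = np.array(self.blob[expert_id])   # copy into RAM: the NVMe read happens here
        self.cache[expert_id] = weights
        if len(self.cache) > RAM_CACHE_EXPERTS:
            self.cache.popitem(last=False)         # evict least recently used
        return weights

def moe_layer_fetch(store: ExpertStore, router_logits: np.ndarray) -> list[np.ndarray]:
    """Load only the top-k routed experts; the rest of the layer's
    expert weights stay on disk and cost no RAM and no I/O."""
    top_k_ids = np.argsort(router_logits)[-TOP_K:]
    return [store.get(int(e)) for e in top_k_ids]
```

With top-8 routing over 128 experts, each token touches about 6% of a layer's expert weights, so if hot experts stay cached and the dense shared layers sit on the two GPUs, the NVMe drive only has to serve cache misses. That, presumably, is how a ~240 GB model ends up at 5–9 tok/s on a $2,100 box.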

Read the full post at r/LocalLlama: www.reddit.com