Best AI News — Updated Every 3 Hours

Run the Qwen3.5 flagship model (397 billion parameters) at 5–9 tok/s on a $2,100 desktop! Two $500 GPUs, 32 GB RAM, one NVMe drive. Uses Q4_K_M quants.

Via r/LocalLlama
Monday, Mar 23, 2026 · 10:54PM
Summary

Introducing FOMOE: Fast Opportunistic Mixture of Experts (pronounced "fomo"). The problem: large Mixture-of-Experts models (MoEs) need a lot of memory for their weights (hundreds of GB), which are typically stored in flash memory (e.g. NVMe). During inference only a small fraction of these weights is needed for any given token, however, so FOMOE opportunistically loads just the experts the router actually selects rather than keeping the full model resident.
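The summary is cut off, but the name and the headline imply the core trick: keep the hundreds of GB of expert weights on NVMe and pull in only the experts the router selects for each token. At Q4_K_M (roughly 4.8 bits per weight), 397B parameters come to about 240 GB, far more than 32 GB of RAM plus two consumer GPUs can hold, yet a top-k router touches only a few experts per layer per token. Below is a minimal sketch of that opportunistic-loading idea in Python; every name, shape, and size here is an illustrative assumption, not FOMOE's actual code or file format.

```python
import numpy as np
from collections import OrderedDict

# Illustrative layout: one quantized weight blob per expert, all experts
# concatenated in a single file on NVMe. Sizes are made up for the sketch.
N_EXPERTS = 128                    # experts per MoE layer (assumed)
TOP_K = 8                          # experts activated per token (assumed)
EXPERT_BYTES = 64 * 1024 * 1024    # bytes per quantized expert (assumed)
RAM_CACHE_EXPERTS = 16             # hot experts kept resident in RAM

class ExpertStore:
    """Fetches expert weights on demand from an NVMe-backed memmap,
    keeping recently used experts in a small LRU cache in RAM."""

    def __init__(self, path: str):
        # np.memmap maps the file without reading it; pages are pulled
        # from NVMe only when an expert's slice is actually touched.
        self.blob = np.memmap(path, dtype=np.uint8, mode="r",
                              shape=(N_EXPERTS, EXPERT_BYTES))
        self.cache: OrderedDict[int, np.ndarray] = OrderedDict()

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)      # mark as recently used
            return self.cache[expert_id]
        weights = np.array(self.blob[expert_id])   # copy into RAM: the NVMe read happens here
        self.cache[expert_id] = weights
        if len(self.cache) > RAM_CACHE_EXPERTS:
            self.cache.popitem(last=False)         # evict least recently used
        return weights

def moe_layer_fetch(store: ExpertStore, router_logits: np.ndarray) -> list[np.ndarray]:
    """Load only the top-k routed experts; the rest of the layer's
    expert weights stay on disk and cost no RAM and no I/O."""
    top_k_ids = np.argsort(router_logits)[-TOP_K:]
    return [store.get(int(e)) for e in top_k_ids]
```

With top-8 routing over 128 experts, each token touches about 6% of a layer's expert weights, so if hot experts stay cached and the dense shared layers sit on the two GPUs, the NVMe drive only has to serve cache misses. That, presumably, is how a ~240 GB model ends up at 5–9 tok/s on a $2,100 box.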

Read the full post at r/LocalLlama: www.reddit.com