Best AI News — Updated Every 3 Hours

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm)

Via r/LocalLLaMA
Tuesday, Mar 24, 2026 · 9:57PM
Summary

Running Qwen3.5-397B-A17B (IQ2_XXS, 107GB, 4 GGUF shards) at 17-19 tok/s generation and 25-33 tok/s prompt processing on a single AMD Ryzen AI Max+ 395 with 128GB unified memory. All 61 layers offloaded to the integrated Radeon 8060S GPU. Total hardware cost: ~$2,500.

The setup:
- AMD Ryzen AI
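
A rough sanity check of the numbers in the summary, as a minimal Python sketch. It is not from the original post. It assumes decimal gigabytes, reads "397B" as the total parameter count and "A17B" as 17B active parameters per token, and treats the quantization as uniform across all weights (real GGUF quants mix precisions per tensor, so the results are averages only). The helper names are hypothetical.

```python
# Figures quoted in the summary.
MODEL_SIZE_GB = 107       # total size of the 4 GGUF shards
UNIFIED_MEMORY_GB = 128   # unified memory on the Ryzen AI Max+ 395
TOTAL_PARAMS = 397e9      # assumption: "397B" = total parameters
ACTIVE_PARAMS = 17e9      # assumption: "A17B" = active parameters per token


def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter, assuming decimal GB."""
    return size_gb * 1e9 * 8 / n_params


def active_gb_per_token(size_gb: float, n_params: float, active: float) -> float:
    """Approximate weight data read per generated token, assuming the
    active parameters use the same average precision as the whole file."""
    return size_gb * active / n_params


bpw = bits_per_weight(MODEL_SIZE_GB, TOTAL_PARAMS)
headroom_gb = UNIFIED_MEMORY_GB - MODEL_SIZE_GB
per_token_gb = active_gb_per_token(MODEL_SIZE_GB, TOTAL_PARAMS, ACTIVE_PARAMS)

print(f"average bits per weight: {bpw:.2f}")
print(f"memory left after weights: {headroom_gb} GB")
print(f"weights read per token: {per_token_gb:.2f} GB")
```

Under these assumptions the file averages a little over 2 bits per weight. The headroom figure is what remains for the KV cache, compute buffers, and the operating system. Multiplying the per-token figure by the reported 17-19 tok/s gives a rough estimate of the effective weight-read bandwidth during generation.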

Read at r/LocalLLaMA
www.reddit.com