Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm)

Via r/LocalLlama

Tuesday, Mar 24, 2026 · 9:57PM

Summary

Running Qwen3.5-397B-A17B (IQ2_XXS, 107GB, 4 GGUF shards) at 17-19 tok/s generation and **25-33 tok/s prompt processing** on a single AMD Ryzen AI Max+ 395 with 128GB unified memory. All 61 layers offloaded to the integrated Radeon 8060S GPU. Total hardware cost: ~$2,500. The setup: - AMD Ryzen AI
