Best AI News — Updated Every 3 Hours

Qwen3.5-9B.Q4_K_M on RTX 3070 Mobile (8GB) with ik_llama.cpp — optimization findings + ~50 t/s gen speed, looking for tips

Via r/LocalLlama
Saturday, Mar 21, 2026 · 5:41PM
Summary

Disclosure: This post was partly written with the help of Claude Opus 4.6, which I used to gather the info and make it understandable, for myself first and foremost... and for this post! Hi! I've been tuning local inference on my laptop and wanted to share some findings, mainly because some of them surprised me. W
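The title describes running a Qwen3.5-9B Q4_K_M quant with ik_llama.cpp on an 8 GB GPU. As a rough sketch of what such a setup typically looks like (the model path, context size, and thread count here are assumptions, not details from the post; the flags follow llama.cpp conventions, which ik_llama.cpp inherits):

```shell
# Hypothetical launch command -- a sketch, not the author's actual invocation.
#   -ngl 99 : offload all layers to the GPU (a ~9B Q4_K_M can fit in 8 GB VRAM)
#   -fa     : enable flash attention, reducing KV-cache VRAM pressure
#   -c 8192 : context window size (assumed, not stated in the post)
./llama-server -m Qwen3.5-9B.Q4_K_M.gguf -ngl 99 -fa -c 8192
```

Generation speed in this range (~50 t/s for a 4-bit 9B model) generally requires full GPU offload; spilling even a few layers to CPU tends to cut throughput sharply.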

Continue reading the full article at r/LocalLlama: www.reddit.com