Best AI News — Updated Every 3 Hours

Qwen3.5-9B.Q4_K_M on RTX 3070 Mobile (8GB) with ik_llama.cpp — optimization findings + ~50 t/s gen speed, looking for tips

Via r/LocalLlama
Saturday, Mar 21, 2026 · 5:41PM
Summary

Disclosure: This post was partly written with the help of Claude Opus 4.6, which I used to gather the info and make it understandable, for myself first and foremost... and for this post! Hi! I've been tuning local inference on my laptop and wanted to share some findings, mainly because some of them surprised me. W
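The title describes running a Qwen3.5-9B Q4_K_M quant with ik_llama.cpp on an 8 GB GPU. As a rough sketch of what such a setup typically looks like (the model path, context size, and thread count here are assumptions, not details from the post; the flags follow llama.cpp conventions, which ik_llama.cpp inherits):

```shell
# Hypothetical launch command -- a sketch, not the author's actual invocation.
#   -ngl 99 : offload all layers to the GPU (a ~9B Q4_K_M can fit in 8 GB VRAM)
#   -fa     : enable flash attention, reducing KV-cache VRAM pressure
#   -c 8192 : context window size (assumed, not stated in the post)
./llama-server -m Qwen3.5-9B.Q4_K_M.gguf -ngl 99 -fa -c 8192
```

Generation speed in this range (~50 t/s for a 4-bit 9B model) generally requires full GPU offload; spilling even a few layers to CPU tends to cut throughput sharply.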

Continue reading the full article at r/LocalLlama: www.reddit.com