Best AI News — Updated Every 3 Hours

Seeking the Absolute Lowest Latency for Qwen 3.5 9B: Best Inference Engine for 1-Stream Real-Time TTS?

Via r/LocalLlama
Sunday, Mar 22, 2026 · 10:16PM
Summary

Hi everyone, I'm building a real-time voice chat pipeline (STT -> LLM -> TTS) and I'm hitting a bottleneck in the "Time to Sentence" part. My goal is to minimize the total latency for generating a 100-token response.

My requirements:
* Model: Qwen 3.5 9B (currently testing FP16 and EXL3 quants).
* H

Read the full post at r/LocalLlama (www.reddit.com).
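Before comparing inference engines for this workload, it helps to measure the two numbers the post cares about: time-to-first-token (TTFT) and total time for a ~100-token response. A minimal, backend-agnostic harness is sketched below; the `fake_stream` generator is a stand-in, hypothetical token source for illustration — in practice you would iterate over the token stream returned by your engine's streaming API (e.g. a streamed chat-completions response).

```python
import time


def measure_stream_latency(token_stream):
    """Measure time-to-first-token and total wall time for an
    iterable of streamed tokens, returning simple latency stats."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            # First token arrived: record time-to-first-token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "total_s": total,
        "tokens": count,
        "tok_per_s": count / total if total > 0 else 0.0,
    }


# Stand-in generator simulating a backend streaming 100 tokens;
# replace with the real token iterator from your inference server.
def fake_stream(n=100):
    for i in range(n):
        yield f"tok{i}"


stats = measure_stream_latency(fake_stream())
print(stats["tokens"])  # 100
```

Keeping the measurement outside any one engine's client library makes it easy to A/B the same prompt across backends under identical conditions (same quant, same max tokens, single stream).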