Hi everyone,

I'm building a real-time voice chat pipeline (STT -> LLM -> TTS) and I’m hitting a bottleneck in the "Time to Sentence" part. My goal is to minimize the total latency for generating a 100-token response.

My Requirements:

* Model: Qwen 3.5 9B (currently testing FP16 and EXL3 quants).
* H