Just for whoever might find it useful, I recently converted over from base setup llama.cpp to Lemonade SDK on my AMD Strix Halo and it instantly feels so much better. I’m seeing on average 20% bumps in tokens per second running the same models on the same hardware. AMD specific, and might take some