
PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed

Via r/LocalLlama
Tuesday, Mar 24, 2026 · 5:50PM
Summary

If you run Ollama, vLLM, TGI, or any custom model server that loads and unloads models, you've probably seen RSS creep up over hours until Linux kills the process. It's not a Python leak. It's not PyTorch. It's glibc's heap allocator fragmenting and never returning pages to the OS. Fix: export MALLO…
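The teaser cuts off mid-variable name, so the settings below are an educated guess rather than a quote from the post. For glibc arena fragmentation, the standard two knobs are MALLOC_ARENA_MAX (caps the number of per-thread malloc arenas) and MALLOC_TRIM_THRESHOLD_ (trailing underscore is part of the name; controls when freed heap memory is handed back to the OS). A minimal sketch, with illustrative values:

    # Cap glibc at 2 malloc arenas (the 64-bit default is 8 x cores);
    # fewer arenas means far less cross-arena fragmentation.
    export MALLOC_ARENA_MAX=2

    # Trim the heap back to the OS once ~64 KiB of free space sits
    # at the top of the heap (the default threshold is 128 KiB).
    export MALLOC_TRIM_THRESHOLD_=65536

    # glibc reads these once at process startup, so export them
    # before launching the server from the same shell, e.g.:
    ollama serve

Under systemd, exports in your shell won't reach a daemonized server; the equivalent is Environment=MALLOC_ARENA_MAX=2 (and the trim line) in the service unit.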

Read the full post at r/LocalLlama: www.reddit.com