Python Forum
How to analyze a 300ms delay issue in VLLM




How to analyze a 300ms delay issue in VLLM - SkyLee - Aug-28-2024

While stress testing vLLM at 30 QPS, we found an anomaly: some requests were delayed by about 300 ms, and this happened more often as the QPS increased. In Nsight, one thread showed 100% utilization, but its Python stack was empty.
Each request in the experiment had 40 input tokens and 20 output tokens.
This behavior is very strange.
For more detailed experimental data, please see https://github.com/vllm-project/vllm/issues/7540
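
For anyone who wants to reproduce the measurement, here is a minimal sketch of a fixed-rate load generator that times each request and flags the slow ones. It is not the harness used in the test above. The names `run_load`, `find_slow`, and the `send` callback are hypothetical; `send` is whatever coroutine issues one request to your server (40 input tokens, 20 output tokens in this case), and the 0.3 s margin over the median is an assumed way of defining "delayed".

```python
import asyncio
import statistics
import time


async def run_load(send, qps, num_requests, prompt_tokens=40, max_tokens=20):
    """Start one request every 1/qps seconds (open loop) and time each one.

    `send` is a caller-supplied coroutine function (hypothetical) that
    performs a single request. Returns a list of
    (start_offset_s, latency_s) tuples in start order.
    """
    samples = [None] * num_requests
    t0 = time.perf_counter()

    async def timed(i):
        start = time.perf_counter()
        await send(prompt_tokens, max_tokens)
        samples[i] = (start - t0, time.perf_counter() - start)

    tasks = []
    for i in range(num_requests):
        # Sleep until this request's scheduled start time, so that slow
        # responses do not reduce the offered load.
        delay = t0 + i / qps - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(timed(i)))
    await asyncio.gather(*tasks)
    return samples


def find_slow(samples, extra_s=0.3):
    """Return the samples whose latency exceeds the median by at least extra_s."""
    median = statistics.median(lat for _, lat in samples)
    return [(off, lat) for off, lat in samples if lat - median >= extra_s]
```

Running `asyncio.run(run_load(send, qps=30, num_requests=300))` and passing the result to `find_slow` gives the start offsets of the delayed requests, which can then be lined up against the Nsight timeline.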

