How to analyze a 300 ms delay issue in vLLM

SkyLee · Aug-28-2024, 07:02 AM

While stress testing vLLM at a load of 30 QPS, we found anomalous requests with a delay of about 300 ms, and they occurred more frequently as the QPS increased. In Nsight, one thread showed 100% utilization, but its Python stack was empty.
In the test data, each request had 40 input tokens and 20 output tokens.
This behavior is very strange.
For more detailed experimental data, please see:

https://github.com/vllm-project/vllm/issues/7540
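
For anyone who wants to try reproducing this kind of measurement, here is a minimal sketch of an open-loop load generator that sends requests at a fixed rate and counts how many take 300 ms or longer. It is not the script used for the numbers above. `run_load`, `summarize`, and `fake_send` are hypothetical names, and `fake_send` is only a stand-in that sleeps; it would have to be replaced with a real request to the server (for example, a prompt of 40 tokens with 20 output tokens).

```python
import asyncio
import statistics
import time


async def run_load(send, qps, total_requests):
    """Launch `total_requests` calls to `send` at a fixed rate (open loop).

    `send` is a hypothetical coroutine that performs one request. Requests are
    launched on schedule even if earlier ones have not finished yet.
    """
    latencies_ms = []

    async def timed_call():
        start = time.perf_counter()
        await send()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    interval = 1.0 / qps
    begin = time.perf_counter()
    tasks = []
    for i in range(total_requests):
        # Wait until this request's scheduled launch time.
        delay = begin + i * interval - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(timed_call()))
    await asyncio.gather(*tasks)
    return latencies_ms


def summarize(latencies_ms, threshold_ms=300.0):
    """Return basic statistics and the number of requests at or above the threshold."""
    slow = [x for x in latencies_ms if x >= threshold_ms]
    return {
        "count": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "slow_count": len(slow),
    }


async def fake_send():
    # Assumption: placeholder for a real request to the server under test.
    await asyncio.sleep(0.005)


if __name__ == "__main__":
    latencies = asyncio.run(run_load(fake_send, qps=30, total_requests=60))
    print(summarize(latencies))
```

Because the launch schedule does not wait for earlier responses, a stall on the server side shows up as a cluster of slow requests rather than being hidden by the client slowing down.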

SkyLee · Aug-28-2024, 07:07 AM
