Python Forum
How to analyze a 300 ms delay issue in vLLM
#1
While stress testing vLLM at 30 QPS, we observed an anomalous delay of about 300 ms, and it occurred more frequently as the QPS increased. Nsight showed one thread at 100% utilization, but its Python stack was empty.
Each request in the test had 40 input tokens and 20 output tokens.
This behavior is very strange, and we are not sure how to analyze it further.
For more detailed experimental data, please see https://github.com/vllm-project/vllm/issues/7540
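For anyone trying to reproduce this, here is a minimal sketch of a fixed-rate load generator that records per-request latency and flags requests at or above 300 ms. It is not the harness used in the test above. The names `run_load`, `find_outliers`, and `fake_send` are hypothetical, and `fake_send` is only a stand-in that sleeps briefly; it would need to be replaced with a real request to the server under test.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def run_load(
    send: Callable[[], Awaitable[None]],
    qps: float,
    n_requests: int,
) -> List[float]:
    """Start n_requests at a fixed rate; return latency in ms, indexed by send order."""
    latencies_ms: List[float] = [0.0] * n_requests

    async def one_request(i: int) -> None:
        start = time.perf_counter()
        await send()
        latencies_ms[i] = (time.perf_counter() - start) * 1000.0

    tasks = []
    interval = 1.0 / qps
    for i in range(n_requests):
        tasks.append(asyncio.create_task(one_request(i)))
        # Open-loop pacing: keep sending on schedule without waiting for replies.
        await asyncio.sleep(interval)
    await asyncio.gather(*tasks)
    return latencies_ms


def find_outliers(latencies_ms: List[float], threshold_ms: float = 300.0) -> List[int]:
    """Return the indices of requests whose latency is at or above the threshold."""
    return [i for i, ms in enumerate(latencies_ms) if ms >= threshold_ms]


async def fake_send() -> None:
    """Hypothetical stand-in for a real request; it only sleeps for 1 ms."""
    await asyncio.sleep(0.001)
```

Calling `asyncio.run(run_load(fake_send, qps=30, n_requests=300))` and passing the result to `find_outliers` gives the send-order indices of the slow requests, which can then be lined up against the profiler timeline. Pacing is open-loop so that a slow reply does not lower the offered rate and hide the spikes.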

