Multithreading with queues - code optimization
#1
Hi all,

I'm quite new to Python, and yesterday I invested some time in trying out the concept of multithreading with queues.

My idea was to come up with a program that reads line-based input from a queue file and processes the data across multiple threads.
I quickly noticed that reading from and writing to a file isn't ideal (from a performance point of view), so I decided to load the content of the file into a set. As I also wanted to keep track of which items had already been processed, I tracked those items in another set.
I wanted to be able to interrupt the program and still know which items had already been processed and which were still sitting in the queue, so I decided to write the content of the sets back to the queue file and the process file.
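Roughly, the load/resume part looks like this (a simplified sketch; the file names are placeholders, not necessarily what is in the repo):

import os

# Load the remaining work and the already processed items into sets.
pending, done = set(), set()
if os.path.exists("queue.txt"):           # placeholder name for the queue file
    with open("queue.txt") as f:
        pending = {line.strip() for line in f}
if os.path.exists("processed.txt"):       # placeholder name for the process file
    with open("processed.txt") as f:
        done = {line.strip() for line in f}
pending -= done                           # don't redo items we already handled

# ... workers take items from 'pending' and add them to 'done' ...

# On interrupt/shutdown, write both sets back so a later run can resume.
with open("queue.txt", "w") as f:
    f.write("\n".join(pending))
with open("processed.txt", "w") as f:
    f.write("\n".join(done))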

So far I'm quite happy with what I've got, but after some dry runs with sample data I developed doubts about my approach.

I did some test runs with around 100 lines of queue data, and that went quite okay. However, after feeding in multiple thousands of input lines, I noticed that Python was only running one thread and that processing speed degraded over time :(
I also seem to have slightly screwed up the logic of how I pull from the queue and feed items into the workers...

Anyway:
I just uploaded my stuff to GitHub and would be happy if anyone had a look at it and threw some suggestions for improvement my way. I'm pretty sure that I made some poor decisions/implementations that experienced Python gurus would have avoided right away.

https://github.com/h1v3s3c/multithread-queue/

Side note:
You might want to modify the do_action method. Currently I just tell my workers to connect to my local web service (127.0.0.1) and do a simple HEAD request.
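For reference, do_action is essentially just something along these lines (simplified; in the real code the queue item would feed into the request):

import http.client

def do_action(item):
    # connect to the local web service and issue a HEAD request;
    # 'item' would be used to build the actual request in the real code
    conn = http.client.HTTPConnection("127.0.0.1", timeout=5)
    conn.request("HEAD", "/")
    resp = conn.getresponse()
    resp.read()                       # HEAD responses carry no body
    conn.close()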

Cheers!
#2
The main problem is that all 8 threads are writing to the queue file. The GIL only lets one thread execute Python bytecode at a time, so while one thread is busy working on the file, all the others are effectively paused. Threads in Python are not like threads in other programming languages.
Why do you write everything into the file instead of using a queue (queue.Queue) to hand the tasks from spider.py to the workers?
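Something along these lines, for example (a minimal sketch; do_action stands in for whatever your workers actually do, and the targets are placeholders):

import queue
import threading

def do_action(item):
    print("processing", item)         # stand-in for the real work

q = queue.Queue()

def worker():
    while True:
        item = q.get()
        if item is None:              # sentinel value: time to shut down
            break
        do_action(item)
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()

for item in ["target1", "target2", "target3"]:   # e.g. handed over by spider.py
    q.put(item)

q.join()                              # wait until every queued item is done
for _ in threads:
    q.put(None)                       # one sentinel per worker
for t in threads:
    t.join()

The queue does its own locking, so you never have to serialize access to a file while the program is running; write the leftover items out once, at shutdown.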
And if you pass the new targets to main.py in a different manner, you should consider using multiprocessing to speed things up :)
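A rough sketch of that, assuming do_action takes a single target and the targets come in as a plain list:

import multiprocessing

def do_action(target):
    # runs in a separate process, so the GIL is no longer a bottleneck
    print("processing", target)

if __name__ == "__main__":
    targets = ["target1", "target2", "target3"]   # placeholder input
    with multiprocessing.Pool(processes=8) as pool:
        pool.map(do_action, targets)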