Python Forum
Multithreading with queues - code optimization
#1
Hi all,

I'm quite new to Python, and yesterday I invested some time to try out the concept of multithreading with queues.

My idea was to come up with a program that reads line-based input from a queue file and processes the data with multiple threads.
I quickly noticed that reading from and writing to a file isn't ideal from a performance point of view, so I decided to load the content of the file into a set. As I also wanted to keep track of which items had already been processed, I tracked those in a second set.
I wanted to be able to interrupt the program and still know which items had already been processed and which were still sitting in the queue, so on shutdown I write the contents of the sets back to the queue file and the process file.
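Just to make the idea concrete, the flow described above (fill a queue from a set of pending items, let worker threads pull from it, and track finished items in a second set) could be sketched roughly like this. `run_workers` and `do_action` are made-up names for illustration, not the actual code from the repo:

```python
import queue
import threading

def run_workers(pending, do_action, num_workers=4):
    """Process every item in `pending` with a pool of threads and
    return the set of items that completed."""
    work = queue.Queue()
    for item in pending:
        work.put(item)

    done = set()
    done_lock = threading.Lock()  # plain sets are not safe for concurrent mutation

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            do_action(item)
            with done_lock:
                done.add(item)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # on interrupt, `pending - done` would be what gets written back to the queue file
    return done
```

The point of funneling everything through a `queue.Queue` is that it handles the locking for you; only the `done` set needs its own lock.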

So far I'm quite happy with what I've got, but after some dry runs with sample data I developed doubts about my approach.

I did some test runs with around 100 lines of queue data, and that went quite okay. However, after feeding in multiple thousands of input lines, I noticed that Python was effectively only running one thread and that processing speed degraded over time :(
I also seem to have slightly messed up the logic of how I pull items from the queue and feed them into the workers...

Anyway:
I just uploaded my stuff to GitHub and would be happy if anyone had a look at it and threw some suggestions for improvements my way. I'm pretty sure I made some poor decisions/implementations that experienced Python gurus would have avoided right away.

https://github.com/h1v3s3c/multithread-queue/

Side note:
You might want to modify the do_action method. Currently I just tell my workers to connect to my local webservice (127.0.0.1) and do a simple HEAD request.
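For reference, a `do_action` along the lines described (a simple HEAD request against a local web service) might look like this, using only the standard library. The host, port, and error handling here are assumptions, not taken from the repo:

```python
import http.client

def do_action(item, host="127.0.0.1", port=8000, timeout=5):
    """Issue a HEAD request for `item` (treated as a URL path) against a
    local web service and return the HTTP status code, or None if the
    connection fails."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("HEAD", item)
        status = conn.getresponse().status
        conn.close()
        return status
    except OSError:
        return None  # service unreachable, connection refused, timeout, ...
```

Returning `None` instead of raising keeps a single bad item from killing a worker thread.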

Cheers!
#2
The main problem is that you are writing to the queue file from all 8 threads. The GIL only lets one thread execute Python code at a time, so while one thread is busy working on the file, all the others are paused. Threads in Python are not like threads in other programming languages.
Why do you write everything to a file instead of using queues (e.g. queue.Queue) to hand over the tasks produced by spider.py?
If you pass the new targets to main.py in a different manner, you should consider using multiprocessing to speed things up :)
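A rough sketch of the multiprocessing suggestion, assuming the work items are picklable and the real work happens in a top-level function; `process_all` and the doubling `do_action` are placeholders, not code from the repo:

```python
import multiprocessing

def do_action(item):
    # Placeholder for the real work. It must be a top-level function
    # so it can be pickled and shipped to the worker processes.
    return item * 2

def process_all(items, workers=4):
    """Fan items out to a pool of worker processes and collect the results.
    Unlike threads, separate processes each have their own GIL, so CPU-bound
    work actually runs in parallel."""
    with multiprocessing.Pool(workers) as pool:
        return pool.map(do_action, items)

if __name__ == "__main__":
    print(process_all([1, 2, 3]))  # → [2, 4, 6]
```

The `if __name__ == "__main__":` guard matters here: on platforms that spawn rather than fork, the workers re-import the module, and without the guard they would recursively start pools.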