Hi all,
I'm quite new to Python, and yesterday I invested some time in trying out the concept of multithreading with queues.
My idea was to come up with a program that reads line-based input from a queue file and processes the data with multiple threads.
I quickly noticed that constantly reading from and writing to a file isn't ideal from a performance point of view, so I decided to load the contents of the file into a set. Since I also wanted to keep track of which items had already been processed, I tracked those in a second set.
I wanted to be able to interrupt the program and still know which items had already been processed and which were still sitting in the queue, so on shutdown I write the contents of the sets back to the queue and processed files.
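In simplified form, that load/save logic looks something like this (the file names here are just placeholders, not necessarily what the repo uses):

def load_sets(queue_file="queue.txt", processed_file="processed.txt"):
    """Load pending and processed items; drop already-done items from pending."""
    with open(queue_file) as f:
        pending = {line.strip() for line in f if line.strip()}
    try:
        with open(processed_file) as f:
            processed = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        processed = set()
    return pending - processed, processed

def save_sets(pending, processed, queue_file="queue.txt", processed_file="processed.txt"):
    """Write both sets back out so an interrupted run can be resumed."""
    with open(queue_file, "w") as f:
        f.writelines(item + "\n" for item in sorted(pending))
    with open(processed_file, "w") as f:
        f.writelines(item + "\n" for item in sorted(processed))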
So far I'm quite happy with what I've got, but after some dry runs with sample data I developed doubts about my approach.
I did some test runs with about 100 lines of queue data, and that went quite okay. However, after feeding in multiple thousands of input lines, I noticed that Python was only running one thread and that processing speed degraded over time :(
I also seem to have messed up the logic of how I pull items from the queue and feed them to the workers a little bit...
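For reference, the worker pattern I was aiming for is roughly the standard queue.Queue recipe. Here is a minimal, self-contained sketch (not my actual code; do_action is just a stub here, and the thread count is arbitrary):

import queue
import threading

def do_action(item):
    # Stub for the per-item work (see the side note below).
    print("processing", item)

def worker(q, processed, lock):
    """Pull items until the queue is drained, recording each finished item."""
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        try:
            do_action(item)
            with lock:  # guard the shared set, to be safe
                processed.add(item)
        finally:
            q.task_done()

def run(pending, processed, num_threads=8):
    q = queue.Queue()
    for item in pending:
        q.put(item)
    lock = threading.Lock()
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(q, processed, lock)).start()
    q.join()  # blocks until task_done() has been called for every item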
Anyway:
I just uploaded my stuff to GitHub and would be happy if anyone would take a look at it and throw some suggestions for improvement my way. I'm pretty sure I made some poor decisions/implementations that experienced Python gurus would have avoided right away.
https://github.com/h1v3s3c/multithread-queue/
Side note:
You might want to modify the do_action method. Currently I just tell my workers to connect to my local web service (127.0.0.1) and do a simple HEAD request.
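Roughly, it looks like this (a simplified sketch using http.client; host, port, and path are just my local setup, so treat them as placeholders):

import http.client

def do_action(item):
    """Send a simple HEAD request to the local web service."""
    conn = http.client.HTTPConnection("127.0.0.1", 80, timeout=5)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        resp.read()  # drain the (empty) body so the connection closes cleanly
    finally:
        conn.close()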
Cheers!