Python Forum
Multithreading with queues - code optimization - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Multithreading with queues - code optimization (/thread-10039.html)



Multithreading with queues - code optimization - h1v3s3c - May-10-2018

Hi all,

I'm quite new to Python, and yesterday I invested some time to try out the concept of multithreading with queues.

My idea was to come up with a program that reads line-based input from a queue file and processes the data with multiple threads.
I quickly noticed that reading from and writing to a file isn't ideal from a performance point of view, so I decided to load the content of the file into a set. As I also wanted to keep track of which items had already been processed, I tracked those in a second set.
I wanted to be able to interrupt the program and still know which items had already been processed and which were still sitting in the queue, so I decided to write the contents of the sets back to the queue file and the processed file.
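
For anyone who doesn't want to open the repo right away, here is roughly what that pattern looks like. This is a minimal sketch only; the file names queue.txt/processed.txt, the worker count, and the do_action stub are my assumptions, not taken from the actual code:

Code:
import queue
import threading

# Sketch of the pattern described above -- file names, worker count,
# and do_action are assumptions, not copied from the repository.
NUM_WORKERS = 8
task_queue = queue.Queue()
processed = set()
processed_lock = threading.Lock()

def do_action(item):
    """Placeholder for the per-item work (cf. the do_action side note below)."""
    pass

def worker():
    while True:
        try:
            item = task_queue.get_nowait()
        except queue.Empty:
            return                      # queue drained, worker exits
        do_action(item)
        with processed_lock:            # sets aren't safe for concurrent writes
            processed.add(item)
        task_queue.task_done()

def main():
    with open("queue.txt") as fh:       # load the pending items
        for line in fh:
            if line.strip():
                task_queue.put(line.strip())
    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    try:
        for t in threads:
            t.join()
    finally:
        # Persist state on exit so an interrupted run can be resumed.
        # If we get here via Ctrl-C the workers may still be running,
        # so this snapshot is best-effort; .queue is the underlying deque.
        with open("processed.txt", "w") as fh:
            fh.writelines(item + "\n" for item in sorted(processed))
        with open("queue.txt", "w") as fh:
            fh.writelines(item + "\n" for item in list(task_queue.queue))

if __name__ == "__main__":
    main()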

So far I'm quite happy with what I've got, but after some dry runs with sample data I developed doubts about my approach.

I did some test runs with around 100 lines of queue data, and that went quite okay. However, after feeding in multiple thousands of input lines, I noticed that Python was effectively only running one thread and that processing speed degraded over time :(
I also seem to have messed up the logic of how I pull from the queue and feed items into the workers a little bit...

Anyway:
I just uploaded my code to GitHub and would be happy if anyone had a look at it and threw some suggestions for improvement my way. I'm pretty sure I made some poor decisions/implementations that experienced Python gurus would have avoided right away.

https://github.com/h1v3s3c/multithread-queue/

Side note:
You might want to modify the do_action method. Currently I just tell my workers to connect to my local webservice (127.0.0.1) and do a simple HEAD request.
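
In case it helps, a do_action along those lines might look like this. It's a sketch using only the standard library, and treating each queue item as the request path is an assumption on my part:

Code:
import http.client

def do_action(item):
    # Hypothetical stand-in for the repo's do_action: a simple HEAD
    # request against the local webservice, as described above.
    conn = http.client.HTTPConnection("127.0.0.1", 80, timeout=5)
    try:
        conn.request("HEAD", "/" + item.lstrip("/"))
        return conn.getresponse().status
    finally:
        conn.close()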

Cheers!


RE: Multithreading with queues - code optimization - ThiefOfTime - May-10-2018

The main problem is that all 8 threads are writing to the queue file. While one thread holds the lock on the file, all the others are paused, and on top of that the GIL only lets one thread execute Python bytecode at a time. Threads in Python are not like threads in other programming languages.
Why do you write everything into the file instead of using a queue.Queue to hand out the tasks coming from spider.py?
If you pass the new targets to main.py in a different manner, you should consider using multiprocessing to speed things up :)
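
To illustrate the multiprocessing suggestion, a rough sketch (the file name and process count are assumptions):

Code:
import multiprocessing

def do_action(item):
    # The real per-item work goes here (e.g. the HEAD request above).
    # Must be a module-level function so it can be pickled to the workers.
    return item

def main():
    with open("queue.txt") as fh:           # hypothetical input file
        items = [line.strip() for line in fh if line.strip()]
    # A pool of worker *processes* sidesteps the GIL entirely;
    # imap_unordered yields each result as soon as a worker finishes it.
    with multiprocessing.Pool(processes=8) as pool:
        for result in pool.imap_unordered(do_action, items):
            print(result)

if __name__ == "__main__":
    main()   # the __main__ guard is required for multiprocessing on Windows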