Hi all,
I'm quite new to Python, and yesterday I invested some time in trying out the concept of multithreading with queues.
My idea was to come up with a program that reads line-based input from a queue file and processes the data with multiple threads.
I quickly noticed that constantly reading from and writing to a file isn't ideal from a performance point of view, so I decided to load the contents of the file into a set. Since I also wanted to keep track of which items had already been processed, I tracked those in a second set.
I wanted to be able to interrupt the program and still know which items had already been processed and which were still sitting in the queue, so on shutdown I write the contents of the sets back to the queue and processed files.
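In simplified form, that load/save logic looks something like this (the file names here are just placeholders, not necessarily what the repo uses):

def load_sets(queue_file="queue.txt", processed_file="processed.txt"):
    """Load pending and processed items; drop already-done items from pending."""
    with open(queue_file) as f:
        pending = {line.strip() for line in f if line.strip()}
    try:
        with open(processed_file) as f:
            processed = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        processed = set()
    return pending - processed, processed

def save_sets(pending, processed, queue_file="queue.txt", processed_file="processed.txt"):
    """Write both sets back out so an interrupted run can be resumed."""
    with open(queue_file, "w") as f:
        f.writelines(item + "\n" for item in sorted(pending))
    with open(processed_file, "w") as f:
        f.writelines(item + "\n" for item in sorted(processed))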
So far I'm quite happy with what I've got, but after some dry runs with sample data I developed doubts about my approach.
I did some test runs with about 100 lines of queue data, and that went quite okay. However, after feeding in multiple thousands of input lines, I noticed that Python was only running one thread and that processing speed degraded over time :(
I also seem to have messed up the logic of how I pull items from the queue and feed them to the workers a little bit...
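For reference, the worker pattern I was aiming for is roughly the standard queue.Queue recipe. Here is a minimal, self-contained sketch (not my actual code; do_action is just a stub here, and the thread count is arbitrary):

import queue
import threading

def do_action(item):
    # Stub for the per-item work (see the side note below).
    print("processing", item)

def worker(q, processed, lock):
    """Pull items until the queue is drained, recording each finished item."""
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        try:
            do_action(item)
            with lock:  # guard the shared set, to be safe
                processed.add(item)
        finally:
            q.task_done()

def run(pending, processed, num_threads=8):
    q = queue.Queue()
    for item in pending:
        q.put(item)
    lock = threading.Lock()
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(q, processed, lock)).start()
    q.join()  # blocks until task_done() has been called for every item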
Anyway:
I just uploaded my stuff to GitHub and would be happy if anyone would take a look at it and throw some suggestions for improvement my way. I'm pretty sure I made some poor decisions/implementations that experienced Python gurus would have avoided right away.
https://github.com/h1v3s3c/multithread-queue/
Side note:
You might want to modify the do_action method. Currently I just tell my workers to connect to my local web service (127.0.0.1) and do a simple HEAD request.
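Roughly, it looks like this (a simplified sketch using http.client; host, port, and path are just my local setup, so treat them as placeholders):

import http.client

def do_action(item):
    """Send a simple HEAD request to the local web service."""
    conn = http.client.HTTPConnection("127.0.0.1", 80, timeout=5)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        resp.read()  # drain the (empty) body so the connection closes cleanly
    finally:
        conn.close()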
Cheers!