I have code that runs correctly, but it does not feel professional to me. I would like your opinion on how to improve it.
Code:
I create a lot of threads to scrape some information I am interested in from the web.
I have 5000 threads to trigger, but I run them 50 at a time through a queue.
    from queue import Queue
    q = Queue(maxsize=50)

Target: I collect all the information and save it in a CSV file.
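As a rough illustration of the setup described above (a sketch under assumptions, not the actual project code), 50 worker threads can pull 5000 tasks from a bounded queue and push their rows onto a second queue, so that no shared structure is written to concurrently. The `scrape` function, the `run_all` helper, and the `id`/`value` fields are hypothetical placeholders.

```python
import threading
from queue import Queue

NUM_WORKERS = 50  # at most 50 scrapes in flight, as in the post


def scrape(task_id):
    # Hypothetical stand-in for the real scraping function.
    return {"id": task_id, "value": task_id * 2}


def run_all(task_ids, num_workers=NUM_WORKERS):
    tasks = Queue(maxsize=num_workers)  # bounded, like Queue(maxsize=50)
    results = Queue()                   # unbounded; workers push rows here

    def worker():
        while True:
            task_id = tasks.get()
            if task_id is None:         # sentinel: no more work for this worker
                tasks.task_done()
                break
            try:
                results.put(scrape(task_id))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id in task_ids:
        tasks.put(task_id)              # blocks while the queue is full
    for _ in threads:
        tasks.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so draining the results queue here is safe.
    rows = []
    while not results.empty():
        rows.append(results.get())
    return sorted(rows, key=lambda r: r["id"])
```

Calling `run_all(range(5000))` would return one row per task. Only the main thread touches the final list, so the CSV file can be written once at the end.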
First project: I created a DataFrame using pandas. Many threads were updating different entries of the DataFrame (no thread modified an entry belonging to another thread), and then I noticed that some entries were not filled in correctly. I learned that DataFrames are not thread-safe.
Second project: I made each thread write its own CSV file (its part of the information). After all threads finish, I run code that collects all the CSV files into one file. This is now running perfectly.
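The per-thread CSV approach could be sketched roughly like this, using only the standard `csv` and `pathlib` modules. The file naming scheme (`part_00001.csv`), the `FIELDS` columns, and the helper names `write_part` and `merge_parts` are assumptions made for illustration, not the actual project code.

```python
import csv
from pathlib import Path

FIELDS = ["id", "value"]  # hypothetical column names


def write_part(directory, task_id, row):
    # Each thread writes only its own file, so no locking is needed.
    path = Path(directory) / f"part_{task_id:05d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerow(row)
    return path


def merge_parts(directory, output_path):
    # Run once, after all threads have finished, from a single thread.
    count = 0
    with Path(output_path).open("w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for part in sorted(Path(directory).glob("part_*.csv")):
            with part.open(newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
                    count += 1
    return count  # number of data rows written to the merged file
```

The merged file should have a name that does not match `part_*.csv`, otherwise a second merge run would pick it up as an input.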
Third project: I have not implemented this yet, but I was thinking of keeping the threading system, creating a single SQLite database, and locking it for each thread while that thread writes.
What would be the most professional way to do the threading and to store the results in SQLite from multiple threads?
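The third idea (one SQLite database guarded by a lock) might look roughly like the sketch below. It assumes a single shared connection opened with `check_same_thread=False` and a `threading.Lock` that serializes every access. The `LockedStore` class, the `results` table, and its columns are hypothetical names chosen for illustration.

```python
import sqlite3
import threading


class LockedStore:
    """One shared SQLite connection; a lock serializes all access to it."""

    def __init__(self, path):
        # check_same_thread=False allows threads other than the creating one
        # to use the connection; the lock below keeps that use serialized.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS results "
                "(id INTEGER PRIMARY KEY, value TEXT)"
            )
            self._conn.commit()

    def save(self, task_id, value):
        # Called from worker threads; only one thread writes at a time.
        with self._lock:
            self._conn.execute(
                "INSERT OR REPLACE INTO results (id, value) VALUES (?, ?)",
                (task_id, value),
            )
            self._conn.commit()

    def count(self):
        with self._lock:
            return self._conn.execute(
                "SELECT COUNT(*) FROM results"
            ).fetchone()[0]

    def close(self):
        with self._lock:
            self._conn.close()
```

Each worker would call `store.save(task_id, value)` when it finishes scraping. An alternative design is to let workers put rows on a queue and have one dedicated writer thread own the connection, which avoids sharing the connection at all.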