Mult-threading and locking file - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Mult-threading and locking file (/thread-35276.html)
Mult-threading and locking file - mr_byte31 - Oct-15-2021

I have code that runs correctly, but I don't feel it is professional code, and I'd like your opinion on how to improve it.

Code: I create many threads to web-scrape information I am interested in. I have 5000 tasks to run, but I only run 50 at a time via a queue:

    from queue import Queue
    q = Queue(maxsize=50)

Target: collect all the information and save it in a CSV file.

First attempt: I created a DataFrame using pandas. Many threads updated different entries of the DataFrame (no thread modified another thread's entry), yet I noticed that some entries were not filled correctly. I learned that DataFrames are not safe under multithreading.

Second attempt: I made each thread create its own CSV file (with its part of the information), and after all threads finished, I ran code to merge all the CSV files into one. This now runs perfectly.

Third attempt: I haven't implemented this solution, but I was thinking of keeping the threading system, creating one SQLite DB, and locking it for each thread.

I was wondering what the most professional way would be to do the threading and store the results in SQLite from multiple threads.

RE: Mult-threading and locking file - SamHobbs - Oct-15-2021

I think you just need to create one (and only one) thread that does the writing. Access to that thread from the other threads needs to be synchronized. I am a beginner at Python, and others can guide you better, but you can probably use a lock to do the synchronization.

RE: Mult-threading and locking file - Larz60+ - Oct-15-2021

I wrote a tutorial on automatically creating a database from CSV files. Note: the project uses an SQLite database; other DBMSs can be used with a simple change to the model. The project demonstrates dynamic model creation and the subsequent database load.
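The single-writer pattern SamHobbs describes can be sketched roughly as follows: worker threads push finished rows onto a queue, and exactly one thread ever touches the CSV file, so no file locking is needed. This is only a minimal sketch; `scrape_one`, the URLs, and the column names are placeholders standing in for the real scraping work.

```python
import csv
import threading
from queue import Queue
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # pushed onto the queue to tell the writer to stop


def writer(q: Queue, path: str) -> None:
    """The one and only thread that writes to the CSV file."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["url", "result"])  # placeholder header
        while True:
            row = q.get()
            if row is SENTINEL:
                break
            w.writerow(row)


def scrape_one(url: str) -> tuple:
    # placeholder for the real web-scraping work
    return (url, f"data-for-{url}")


def main() -> None:
    q = Queue(maxsize=50)  # bounded, as in the original post
    t = threading.Thread(target=writer, args=(q, "results.csv"))
    t.start()

    urls = [f"https://example.com/page/{i}" for i in range(200)]
    # 50 concurrent workers instead of 5000 raw threads
    with ThreadPoolExecutor(max_workers=50) as pool:
        for row in pool.map(scrape_one, urls):
            q.put(row)  # workers never write the file directly

    q.put(SENTINEL)  # signal the writer we are done
    t.join()


if __name__ == "__main__":
    main()
```

Because the queue serializes access, the rows can arrive in any order without corrupting the file, which is the symptom the original DataFrame approach suffered from.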
To show the simplicity of the code, here's the database load module:

    from FlightPaths import FlightPaths
    import pandas as pd
    import DbModel


    class LoadDatabase:
        def __init__(self):
            self.fpath = FlightPaths()
            self.Model = DbModel

        def dispatch(self):
            self.load_all()

        def load_all(self):
            datafiles = self.fpath.ourairport_datafiles
            filelist = [fn for fn in datafiles.iterdir()
                        if fn.is_file() and fn.suffix == '.csv']
            for fn in filelist:
                tablename = fn.stem.replace('-', '_')
                df = pd.read_csv(fn)
                print(f"engine: {self.Model.engine}")
                df.to_sql(tablename, con=self.Model.engine, if_exists='replace')


    def main():
        lDb = LoadDatabase()
        lDb.load_all()


    if __name__ == '__main__':
        main()

If you are interested, you will find the tutorial here: https://python-forum.io/thread-33843.html

RE: Mult-threading and locking file - mr_byte31 - Oct-15-2021

(Oct-15-2021, 06:06 PM) Larz60+ Wrote: I wrote a tutorial on automatically creating a database from csv files.

Thanks a lot. I was doing this work manually; this helps. What about the threading system?

RE: Mult-threading and locking file - Larz60+ - Oct-16-2021

I'm not an expert on threading; I have used multiprocessing, see: https://docs.python.org/3/library/multiprocessing.html. Multiprocessing is similar to threading, except it sidesteps the GIL, which can be a problem when threading. Sorry I cannot give you in-depth details, as I haven't used either in quite some time. I think locking the whole database can get you into trouble; however, if you switch to PostgreSQL, you can use row-level locking, which locks only the row you are using, and that is quite safe in most cases.
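The third approach from the original post (one SQLite database shared by all threads, guarded by a lock) can be sketched like this. It is only a sketch under assumptions: the `results` table, its columns, and the worker body are made up for illustration. Note that `check_same_thread=False` is needed because one `sqlite3` connection is used from several threads.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# One shared connection, guarded by a lock so only one thread
# uses it at a time. check_same_thread=False permits cross-thread use.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE results (url TEXT, data TEXT)")  # made-up schema
db_lock = threading.Lock()


def save(url: str, data: str) -> None:
    # All writes funnel through this lock, so the DB never sees
    # two threads inside a statement at once.
    with db_lock:
        conn.execute("INSERT INTO results VALUES (?, ?)", (url, data))
        conn.commit()


def worker(i: int) -> None:
    # placeholder for the real scraping work
    save(f"https://example.com/page/{i}", f"data-{i}")


with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(worker, range(500)))  # consume to surface any exceptions

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # 500
```

Locking at this granularity serializes every insert, so the workers' scraping still overlaps but the database work does not; that is usually acceptable when scraping dominates the runtime, and it avoids the corrupted-entries problem without switching DBMS.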