Oct-31-2022, 05:51 PM
Alright, I see it that way. Simplified:
Should it work ? I think it's IO bound so threads are used here.
def convert_img(img_obj): return new_img # not written to the disk but in memory def do_ocr(image_data): return document def worker(path): with open(path, 'br') as file_obj: converted = convert_img(file_obj) document = do_ocr(converted) with open(path, 'w') as doc: doc.write(document) images = pathlib.Path('path_to_folder').glob('**/*.tif') # recusively. returns a generator with concurrent.futures.ThreadPoolExecutor() as executor: _ = executor.map(worker, images)
Should it work ? I think it's IO bound so threads are used here.