Python Forum
OCR again
#14
Alright, this is how I see it, simplified:

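import concurrent.futures
import pathlib

def convert_img(file_obj):
    new_img = file_obj.read()  # placeholder: convert in memory, nothing is written to disk
    return new_img

def do_ocr(image_data):
    document = ''  # placeholder: run the OCR on the in-memory image data
    return document

def worker(path):
    with open(path, 'rb') as file_obj:
        converted = convert_img(file_obj)

    document = do_ocr(converted)

    # write the text next to the image; opening `path` itself with 'w' would overwrite the .tif
    with open(path.with_suffix('.txt'), 'w') as doc:
        doc.write(document)

images = pathlib.Path('path_to_folder').glob('**/*.tif')  # recursive; returns a generator

with concurrent.futures.ThreadPoolExecutor() as executor:
    # consume the results so an exception raised in worker() is not silently dropped
    for _ in executor.map(worker, images):
        pass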
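A note on the `executor.map` call: it submits every job right away but returns a lazy iterator, and an exception raised inside the worker is stored in its future and only re-raised when that iterator reaches the failed item. So if the returned iterator is never consumed (for example, assigned to `_` and dropped), a failed OCR job passes silently. A small standalone demo; the failing `worker` here is a made-up stand-in, not the OCR code:

```python
import concurrent.futures


def worker(n):
    # Hypothetical stand-in for the real worker: fails on one input
    if n == 2:
        raise ValueError(f"could not process item {n}")
    return n * 10


with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(worker, [1, 2, 3])  # the with block exits without raising

collected = []
error = None
try:
    for value in results:  # the stored ValueError is re-raised here, at item 2
        collected.append(value)
except ValueError as exc:
    error = str(exc)

print(collected, error)  # [10] could not process item 2
```

Iterating the results (or wrapping them in `list()`) is enough to make the error visible.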

Should it work? I think the job is I/O bound, so I used threads here.
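Whether threads help depends on where `do_ocr` spends its time. Because of the GIL, threads only pay off when the work waits on disk, on a subprocess, or runs inside a C extension that releases the GIL; pure-Python CPU work would be serialized, and a `ProcessPoolExecutor` is the usual alternative. Both executors share the same `map` interface, so a minimal sketch can make the choice a parameter. `run_all` and `count_bytes` are hypothetical names, and `count_bytes` only stands in for the real OCR worker:

```python
import concurrent.futures
import pathlib


def run_all(worker, paths, use_processes=False, max_workers=None):
    """Apply worker to every path and return the results in input order.

    Hypothetical helper: use_processes=True swaps in a process pool for
    CPU-bound work; worker must then be a picklable module-level function.
    """
    executor_cls = (concurrent.futures.ProcessPoolExecutor if use_processes
                    else concurrent.futures.ThreadPoolExecutor)
    with executor_cls(max_workers=max_workers) as executor:
        # list() consumes the iterator, so any worker exception is re-raised here
        return list(executor.map(worker, paths))


def count_bytes(path):
    # Stand-in worker for the demo; the real one would run the OCR
    return path.stat().st_size


if __name__ == "__main__":
    images = pathlib.Path("path_to_folder").glob("**/*.tif")
    # Threads for I/O-bound work; pass use_processes=True if the OCR is CPU-bound
    print(run_all(count_bytes, images))
```

With processes, the `if __name__ == "__main__":` guard matters on platforms that start workers with the spawn method, since each child re-imports the main module.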


Messages In This Thread
OCR again - by DPaul - Oct-29-2022, 06:49 AM
RE: OCR again - by Gribouillis - Oct-29-2022, 08:16 AM
RE: OCR again - by DPaul - Oct-29-2022, 08:39 AM
RE: OCR again - by Gribouillis - Oct-29-2022, 09:08 AM
RE: OCR again - by DPaul - Oct-29-2022, 09:33 AM
RE: OCR again - by Gribouillis - Oct-29-2022, 10:08 AM
RE: OCR again - by DPaul - Oct-29-2022, 10:17 AM
RE: OCR again - by DPaul - Oct-30-2022, 06:26 AM
RE: OCR again - by DPaul - Oct-30-2022, 07:36 AM
RE: OCR again - by wavic - Oct-31-2022, 08:41 AM
RE: OCR again - by DPaul - Oct-31-2022, 11:10 AM
RE: OCR again - by wavic - Oct-31-2022, 01:44 PM
RE: OCR again - by DPaul - Oct-31-2022, 04:31 PM
RE: OCR again - by wavic - Oct-31-2022, 05:51 PM
RE: OCR again - by DPaul - Oct-31-2022, 06:29 PM
RE: OCR again - by wavic - Oct-31-2022, 07:14 PM
RE: OCR again - by DPaul - Nov-01-2022, 06:44 AM
RE: OCR again - by DPaul - Nov-01-2022, 08:31 AM
RE: OCR again - by wavic - Nov-01-2022, 09:22 AM
RE: OCR again - by DPaul - Nov-01-2022, 10:14 AM
RE: OCR again - by DPaul - Nov-04-2022, 07:15 AM
RE: OCR again - by DPaul - Nov-05-2022, 08:05 AM
RE: OCR again - by DPaul - Nov-05-2022, 09:49 AM