how to extract tiff images from the subfolder into. hocr format in another similar su - Python Forum (https://python-forum.io), Thread: /thread-36402.html
how to extract tiff images from the subfolder into. hocr format in another similar su - JOE - Feb-16-2022

Hi,

I am working on a project to OCR text from TIFF images. The code below works fine on individual images, but I am looking for a way to process batches of images from their respective subfolders and write the OCR output in .hocr format.

Example: there are several subfolders on the D drive containing TIFF images. Each image needs to be passed through OCR one by one, and the output should be written to the E drive in the same directory tree as on the D drive.

D:\subfolder\Subfolder1\tiff image
to
E:\subfolder\Subfolder1\Hocr image

Please suggest how to tweak the code to achieve this.

My code:

    from PIL import Image
    import pytesseract

    # Point pytesseract at the local Tesseract executable
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

    image = Image.open(r"C:\Users\multipage.tiff")
    config = "--oem 3 --psm 6"

    # OCR every frame (page) of the multipage TIFF and collect the text
    txt = ''
    for frame in range(image.n_frames):
        image.seek(frame)
        txt += pytesseract.image_to_string(image, config=config, lang='eng') + '\n'
    print(txt)

    # Save the combined text of all pages
    with open(r"C:\Users\multipage_output.txt", mode='w') as f:
        f.write(txt)

Thanks!
Joe
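One possible way to approach the batch requirement, offered as a minimal sketch rather than a definitive solution: walk the source tree with `pathlib`, mirror each subfolder under the destination root, and ask pytesseract for hOCR output with `image_to_pdf_or_hocr(..., extension="hocr")`, which returns bytes. The helper names `iter_tiff_jobs`, `ocr_tiff_to_hocr` and `batch_ocr` are hypothetical, and writing one `.hocr` file per TIFF page (named `<stem>_pageNNN.hocr`) is an assumption about the desired output, not something stated in the post.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def iter_tiff_jobs(src_root, dst_root):
    """Yield (tiff_path, output_dir) pairs, mirroring src_root's subfolders under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for tiff_path in sorted(src_root.rglob("*")):
        if tiff_path.is_file() and tiff_path.suffix.lower() in TIFF_SUFFIXES:
            # Same relative subfolder as in the source tree, but under dst_root
            yield tiff_path, dst_root / tiff_path.parent.relative_to(src_root)


def ocr_tiff_to_hocr(tiff_path, out_dir, config="--oem 3 --psm 6", lang="eng"):
    """OCR every page of one TIFF and write one .hocr file per page (assumed naming)."""
    # Imported here so the path logic above can be tried without Tesseract installed
    from PIL import Image, ImageSequence
    import pytesseract

    out_dir.mkdir(parents=True, exist_ok=True)
    with Image.open(tiff_path) as image:
        for page, frame in enumerate(ImageSequence.Iterator(image), start=1):
            # image_to_pdf_or_hocr returns the hOCR document as bytes
            hocr = pytesseract.image_to_pdf_or_hocr(
                frame, extension="hocr", config=config, lang=lang
            )
            out_file = out_dir / f"{tiff_path.stem}_page{page:03d}.hocr"
            out_file.write_bytes(hocr)


def batch_ocr(src_root, dst_root):
    """Run OCR on every TIFF found under src_root, writing results under dst_root."""
    for tiff_path, out_dir in iter_tiff_jobs(src_root, dst_root):
        ocr_tiff_to_hocr(tiff_path, out_dir)
```

With `pytesseract.pytesseract.tesseract_cmd` set as in the original code, a call such as `batch_ocr(r"D:\subfolder", r"E:\subfolder")` would then be expected to write the results for `D:\subfolder\Subfolder1` into `E:\subfolder\Subfolder1`. Keeping the path mapping in its own generator makes it possible to check the folder mirroring separately from the OCR step.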