Python Forum
how to extract tiff images from the subfolder into. hocr format in another similar su


how to extract tiff images from the subfolder into. hocr format in another similar su - JOE - Feb-16-2022

Hi,
I am working on a project to OCR text from TIFF images. The code below works fine on individual images, but I am looking for a way to process images in batches from their respective subfolders and write the OCR output in .hocr format.

Example:

There are several subfolders on the D drive containing TIFF images. Each image needs to be passed through OCR one by one, and the output written to the E drive with the same directory tree as the D drive:
D:\subfolder\Subfolder1\tiff image to E:\subfolder\Subfolder1\Hocr image
Please suggest how to tweak the code to achieve this.

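My code:
from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

image = Image.open(r"C:\Users\multipage.tiff")

# OEM 3 = default engine mode, PSM 6 = treat the page as a single uniform block of text.
config = "--oem 3 --psm 6"

# OCR each frame (page) of the multi-page TIFF and collect the text.
txt = ''
for frame in range(image.n_frames):
    image.seek(frame)
    txt += pytesseract.image_to_string(image, config=config, lang='eng') + '\n'

print(txt)
# Save the combined text of all pages to a single file.
with open(r"C:\Users\multipage_output.txt", mode='w') as f:
    f.write(txt)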
Thanks!
Joe
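
One possible way to approach the batch requirement described above, offered as a minimal sketch and not a tested solution for this setup: walk the source tree with pathlib, rebuild each file's relative path under the destination root, and write one .hocr file per TIFF frame using pytesseract.image_to_pdf_or_hocr. The function names (mirrored_output_path, ocr_frames_to_hocr, convert_tree) and the "_pageN" file naming are assumptions made for illustration and do not appear in the original code. The sketch also assumes pytesseract.pytesseract.tesseract_cmd has already been set as in the code above.

```python
from pathlib import Path


def mirrored_output_path(src_file, src_root, dst_root, suffix=".hocr"):
    """Map src_root/a/b/x.tiff to dst_root/a/b/x.hocr (hypothetical helper)."""
    relative = Path(src_file).relative_to(src_root)
    return (Path(dst_root) / relative).with_suffix(suffix)


def ocr_frames_to_hocr(tiff_path, config="--oem 3 --psm 6", lang="eng"):
    """Return a list with one hOCR document (bytes) per frame of the TIFF."""
    # Imported here so the path helpers work even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    pages = []
    with Image.open(tiff_path) as image:
        # Single-frame images may not expose n_frames, so fall back to 1.
        for frame in range(getattr(image, "n_frames", 1)):
            image.seek(frame)
            pages.append(
                pytesseract.image_to_pdf_or_hocr(
                    image, extension="hocr", config=config, lang=lang
                )
            )
    return pages


def convert_tree(src_root, dst_root, ocr=ocr_frames_to_hocr):
    """OCR every .tif/.tiff under src_root into a mirrored tree under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    written = []
    for tiff_path in sorted(src_root.rglob("*")):
        if not tiff_path.is_file() or tiff_path.suffix.lower() not in {".tif", ".tiff"}:
            continue
        out_base = mirrored_output_path(tiff_path, src_root, dst_root)
        # Create the matching subfolder on the destination side if needed.
        out_base.parent.mkdir(parents=True, exist_ok=True)
        # Assumed naming scheme: <name>_page1.hocr, <name>_page2.hocr, ...
        for number, hocr in enumerate(ocr(tiff_path), start=1):
            out_file = out_base.with_name(f"{out_base.stem}_page{number}.hocr")
            out_file.write_bytes(hocr)
            written.append(out_file)
    return written
```

With the paths from the example, the call would look like convert_tree(r"D:\subfolder", r"E:\subfolder"). Writing one file per frame keeps each output a complete hOCR document; if a single file per TIFF is preferred, the per-frame results would need to be merged separately.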