Python Forum
How to rewrite image file name based on ocr data.txt

How to rewrite image file name based on ocr data.txt - kevinchr - Apr-16-2018

Hey,

I'm pretty new to Python and still learning every day. I'm kind of stuck on what I want to accomplish with my Python script.

What I'm trying to do is OCR a batch of images, save the recognized text into data.txt, and after that rename each image using its OCR text...

So, for example, I have an image named 'dog-mask.jpg', and after OCR has run on it I would like to rename the file to 'no one cared who I was until I put on the mask.jpg'.

[Image: No-One-Cared-Who-I-Was-Until-I-Put-On-The-Mask.jpg]
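Raw OCR output usually can't be used as a file name as-is: Tesseract's text tends to span several lines, may end with a form-feed (page separator) character, and can contain characters Windows forbids in file names (< > : " / \ | ? *). A minimal sketch of a clean-up step, assuming Windows naming rules; text_to_filename is a hypothetical helper name and the 150-character cap is an arbitrary choice:

```python
import re

# Control characters plus the characters Windows forbids in file names
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def text_to_filename(ocr_text, extension=".jpg", max_length=150):
    """Turn raw OCR output into a usable file name (hypothetical helper).

    Returns "" when nothing usable is left, so the caller can skip the rename.
    """
    # Replace forbidden characters (including newlines) with spaces ...
    cleaned = INVALID_CHARS.sub(" ", ocr_text)
    # ... then collapse every run of whitespace into a single space
    name = " ".join(cleaned.split())
    # Cap the length; Windows also rejects names ending in a dot or space
    name = name[:max_length].rstrip(". ")
    return name + extension if name else ""
```

Inside the script's loop this could be called as text_to_filename(ocr(file)); an empty result means OCR found nothing usable, so that image should keep its old name.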

The OCR part seems to work fine, but I have no idea how to set the new image file names using the data from data.txt.
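Because the script below writes every entry in a fixed layout (the file name, a dashed line, then the text between !!!Start!!! and !!!End!!!), data.txt can be read back into a dictionary mapping each image name to its text. A minimal sketch using a regular expression, assuming data.txt was produced only by that script; read_ocr_results is a hypothetical name:

```python
import re

# Matches one entry as written by the OCR script:
#   <file name>
#   -----------------------------------------
#   !!!Start!!!
#   <OCR text, possibly several lines>
#   !!!End!!!
ENTRY = re.compile(
    r"^(?P<name>[^\n]+)\n-+\n!!!Start!!!\n(?P<text>.*?)\n!!!End!!!",
    re.MULTILINE | re.DOTALL,
)


def read_ocr_results(data_txt_path):
    """Return {image file name: OCR text} parsed from data.txt."""
    # No encoding given: use the same platform default the script wrote with
    with open(data_txt_path) as f:
        content = f.read()
    # 'a+' appends on every run, so a later entry for the same file wins
    return {m.group("name"): m.group("text").strip()
            for m in ENTRY.finditer(content)}
```

The non-greedy (?P<text>.*?) together with re.DOTALL lets an entry's text span several lines while stopping at the first !!!End!!! marker.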

Could anyone help me out, please? I would really appreciate it if it's not too much trouble.

Below is the code of my OCR script:


import pytesseract
import os
from PIL import Image
import re

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe' # path of tesseract

path = 'C:\\users\\kevin\\downloads\\downloads' # path of image folder

# function to convert image to text and return type: string
def ocr(file_to_ocr):
    # os.path.join builds the full path; "with" closes the image file afterwards
    with Image.open(os.path.join(path, file_to_ocr)) as im:
        txt = pytesseract.image_to_string(im)
    return txt

file_list = os.listdir(path) # file names in list (not sorted)
directory = os.path.join(path) # path for storing the text file

# function to sort the file names in order of numerical value present in it
def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]  # raw string so \d reaches the regex engine unchanged

file_list.sort(key=natural_keys) # file names in list (sorted)

# for every file in the folder
for file in file_list:
    # selecting image file type (case-insensitive, so .JPG files are included too)
    if file.lower().endswith(".jpg"):
        txt = ocr(file)  # calling the ocr function
        # appending the text into the file
        with open(os.path.join(directory, 'data.txt'), 'a+') as f:
            f.write("\n" + file + "\n")
            f.write('-----------------------------------------\n')
            f.write('!!!Start!!!\n')
            f.write(txt + "\n")
            f.write('!!!End!!!\n')
            f.write('-----------------------------------------\n')
print("Image Conversion completed")
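Once each image has a new name, the rename itself is os.rename. A minimal sketch, assuming the new names have already been cleaned up into valid file names (for example from ocr(file) or from data.txt); rename_images is a hypothetical helper that skips empty names and appends a counter instead of overwriting when two images produce the same text:

```python
import os


def rename_images(folder, new_names):
    """Rename files in `folder` according to {old file name: new file name}.

    Hypothetical helper: skips empty names and never overwrites an existing
    file; a clash gets " (2)", " (3)", ... appended before the extension.
    Returns {old: new} for the files that were actually renamed.
    """
    renamed = {}
    for old, new in new_names.items():
        src = os.path.join(folder, old)
        # Nothing usable from OCR, name unchanged, or the image is gone
        if not new or new == old or not os.path.isfile(src):
            continue
        stem, ext = os.path.splitext(new)
        candidate, n = new, 2
        # Find a target name that does not exist yet
        while os.path.exists(os.path.join(folder, candidate)):
            candidate = f"{stem} ({n}){ext}"
            n += 1
        os.rename(src, os.path.join(folder, candidate))
        renamed[old] = candidate
    return renamed
```

For example, rename_images(path, {'dog-mask.jpg': 'no one cared who I was until I put on the mask.jpg'}) would rename just that one image. Since data.txt records each file under its old name, it also serves as a log of what every image was called before the rename.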