Apr-16-2018, 07:09 PM
Hey,
I'm pretty new to Python and still learning every day. I'm stuck on what I want to accomplish with my Python script.
What I'm trying to do is OCR images in batch, save that data into data.txt, and after that rename the images using the OCR data.
For example, I have an image named 'dog-mask.jpg', and after OCR has run over it I would like to rename the file to something like 'no one cared who I was until I put on the mask.jpg'.
![[Image: No-One-Cared-Who-I-Was-Until-I-Put-On-The-Mask.jpg]](https://lolpics.com/wp-content/uploads/2018/04/No-One-Cared-Who-I-Was-Until-I-Put-On-The-Mask.jpg)
The OCR part seems to work fine, but I have no idea how to set the new image file names using the data from data.txt.
Could anyone help me out, please? I would really appreciate it, if it is not too much trouble.
Below is the code of my OCR script:

    import os
    import re

    import pytesseract
    from PIL import Image

    # path of tesseract
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
    path = 'C:\\users\\kevin\\downloads\\downloads'  # path of image folder


    # function to convert image to text and return type: string
    def ocr(file_to_ocr):
        im = Image.open(os.path.join(path, file_to_ocr))
        txt = pytesseract.image_to_string(im)
        return txt


    file_list = os.listdir(path)  # file names in list (not sorted)
    directory = os.path.join(path)  # path for storing the text file


    # functions to sort the file names in order of the numerical value present in them
    def atoi(text):
        return int(text) if text.isdigit() else text


    def natural_keys(text):
        '''
        alist.sort(key=natural_keys) sorts in human order
        http://nedbatchelder.com/blog/200712/human_sorting.html
        (See Toothy's implementation in the comments)
        '''
        # raw string so that \d is not treated as an invalid string escape
        return [atoi(c) for c in re.split(r'(\d+)', text)]


    file_list.sort(key=natural_keys)  # file names in list (sorted)

    # for every file in the folder
    for file in file_list:
        # selecting image file type
        if file.endswith(".jpg"):
            txt = ocr(file)  # calling the ocr function
            # appending the text into the file
            with open(os.path.join(directory, 'data.txt'), 'a+') as f:
                f.write("\n")
                f.write(file)
                f.write("\n")
                f.write('-----------------------------------------')
                f.write("\n")
                f.write('!!!Start!!!')
                f.write("\n")
                f.write(str(txt))
                f.write("\n")
                f.write('!!!End!!!')
                f.write("\n")
                f.write('-----------------------------------------')
                f.write("\n")

    print("Image Conversion completed")
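
For the renaming step, one possible approach is sketched below. It is only a rough sketch under a few assumptions: data.txt has exactly the layout the script above writes (file name, dashed line, `!!!Start!!!`, OCR text, `!!!End!!!`, dashed line), the whole OCR text should become the new name, and characters that Windows does not allow in file names are simply dropped. The names `RECORD`, `parse_records`, `safe_name`, `rename_images` and the `max_len` limit are made up for this illustration and are not part of the original script.

```python
import os
import re

# One record as written by the OCR script: file name, dashed line,
# !!!Start!!!, OCR text (may span several lines), !!!End!!!, dashed line.
RECORD = re.compile(
    r"^(?P<name>[^\n]+)\n-+\n!!!Start!!!\n(?P<text>.*?)\n!!!End!!!\n-+$",
    re.DOTALL | re.MULTILINE,
)


def parse_records(data):
    """Map each original file name to its OCR text (contents of data.txt)."""
    return {m.group("name"): m.group("text") for m in RECORD.finditer(data)}


def safe_name(text, max_len=100):
    """Turn OCR text into a file name stem.

    Replaces characters Windows forbids in file names with spaces,
    collapses all whitespace (including line breaks) into single spaces,
    and truncates to max_len characters (an arbitrary, assumed limit).
    """
    text = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", text)
    text = " ".join(text.split())
    return text[:max_len].rstrip(" .")


def rename_images(folder, data_file="data.txt"):
    """Rename every image listed in data_file; return (old, new) name pairs."""
    with open(os.path.join(folder, data_file)) as f:
        records = parse_records(f.read())

    renamed = []
    for old_name, text in records.items():
        stem = safe_name(text)
        old_path = os.path.join(folder, old_name)
        if not stem or not os.path.exists(old_path):
            continue  # no text recognised, or the file is no longer there
        new_name = stem + os.path.splitext(old_name)[1]  # keep the extension
        new_path = os.path.join(folder, new_name)
        if os.path.exists(new_path):
            continue  # never overwrite an existing file
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

With the `path` variable from the script above, calling `rename_images(path)` after the OCR loop would do the renaming. Images with no recognised text, or whose new name is already taken, are left untouched, so it is worth trying this on a copy of the folder first.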