Tesseract-ocr ->iterator.WordFontAttributes() does not work - Maia07 - Sep-01-2018

I'm looping over the images in my directory. My goal is to extract the text from each one and then look at the text attributes. The text extraction works, but when I try to read the text attributes with PyTessBaseAPI(), the attributes are not recognized for some of my images, and the Python shell just prints "=============================== RESTART: Shell =============================== "

Here is the code:

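import io
import os

import cv2
from PIL import Image, ImageFilter
from pytesseract import image_to_string  # assumed source of image_to_string
from tesserocr import PyTessBaseAPI

# contours, image_file, padding and readimage() are defined earlier in the script (not shown)
for i, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)

    x = x - 3
    y = y - 3

    # skip boxes that would start outside the image
    if x < 0 or y < 0:
        continue

    cropped = image_file[y: y+h+padding, x: x+w+padding]
    # make the image bigger so the text is recognized better
    cropped = cv2.resize(cropped, (0, 0), fx=4.0, fy=4.0)
    # convert the NumPy array to a PIL image
    im = Image.fromarray(cropped.astype('uint8'), 'RGB')
    im = im.filter(ImageFilter.SHARPEN)

    text = image_to_string(im)
    #print(text)
    if text != "":
        #print("OCR Output : " + image_to_string(im))
        cv2.imwrite("img_text/cropped"+str(i)+".png", cropped)

        path = os.path.abspath("img_text/cropped"+str(i)+".png")
        with PyTessBaseAPI() as api:
            #img = Image.open(path, mode='r')
            image_bytes = readimage(path)  # renamed from "bytes" to avoid shadowing the built-in
            img = Image.open(io.BytesIO(image_bytes))
            api.SetImage(img)
            api.Recognize()  # required to get result from the next line
            iterator = api.GetIterator()
            #print(iterator.WordFontAttributes())
            # renamed from "dict" to avoid shadowing the built-in; the shell restarts on this call for some images
            font_attrs = iterator.WordFontAttributes()
            #print(font_attrs['font_name'])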

Does anybody know what I'm doing wrong here?
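
To help narrow this down, here is a reduced sketch that isolates the attribute lookup from the contour loop. It walks the result iterator word by word and only calls WordFontAttributes() when the current position actually has text. The helper names word_font_attributes and run_on_file are hypothetical, and it is only a guess (not confirmed) that the restart is related to calling WordFontAttributes() on a position with no recognized word. It also assumes that tesserocr's GetUTF8Text() raises RuntimeError when there is no text, and that WordFontAttributes() may return None instead of a dict.

```python
def word_font_attributes(api, word_level):
    """Return a list of (word, attrs) pairs for the image already set on `api`.

    `api` is expected to behave like tesserocr.PyTessBaseAPI and `word_level`
    like tesserocr.RIL.WORD. `attrs` is whatever WordFontAttributes() returns,
    which may be None when no font information is available (assumption).
    """
    api.Recognize()
    iterator = api.GetIterator()
    results = []
    if iterator is None:  # defensive guard; nothing was recognized
        return results

    while True:
        try:
            word = iterator.GetUTF8Text(word_level)
        except RuntimeError:
            # assumed behaviour: raised when there is no text at this position
            word = None

        # only ask for font attributes when there is a word to describe
        if word:
            results.append((word, iterator.WordFontAttributes()))

        if not iterator.Next(word_level):
            break
    return results


def run_on_file(path):
    """Hypothetical usage: print the attributes for every word in one image file."""
    # imported here so the helper above can be exercised without tesserocr installed
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI() as api:
        api.SetImageFile(path)
        for word, attrs in word_font_attributes(api, RIL.WORD):
            print(word, attrs)
```

Calling run_on_file() on one of the saved img_text/cropped*.png files that triggers the restart would show whether the problem is tied to images where no word is found or to specific words.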

Thanks