Python Forum
Tesseract-ocr ->iterator.WordFontAttributes() does not work
I'm looping over every image in a directory. My goal is to extract the text from each one and then inspect the text attributes. The text extraction works, but when I try to read the text attributes through the PyTessBaseAPI() API, it fails for some of the images: no attributes come back, and the Python shell prints "=============================== RESTART: Shell ===============================".

Here is the code:

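# Imports inferred from the calls below; image_to_string is presumably pytesseract's.
import io
import os

import cv2
from PIL import Image, ImageFilter
from pytesseract import image_to_string
from tesserocr import PyTessBaseAPI

# contours, image_file, padding and readimage() are defined earlier in the script.
for i, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)

    # shift the box up and to the left by 3 pixels
    x = x - 3
    y = y - 3

    # skip boxes that would start outside the image
    if x < 0 or y < 0:
        continue

    cropped = image_file[y: y+h+padding, x: x+w+padding]
    # enlarge the crop so the text is recognized better
    cropped = cv2.resize(cropped, (0, 0), fx=4.0, fy=4.0)
    # convert the NumPy array to a PIL image
    im = Image.fromarray(cropped.astype('uint8'), 'RGB')
    im = im.filter(ImageFilter.SHARPEN)

    text = image_to_string(im)
    #print(text)
    if text != "":
        #print("OCR Output : " + image_to_string(im))
        cv2.imwrite("img_text/cropped"+str(i)+".png", cropped)

        path = os.path.abspath("img_text/cropped"+str(i)+".png")
        with PyTessBaseAPI() as api:
            #img = Image.open(path, mode='r')
            image_bytes = readimage(path)  # renamed so the builtin bytes is not shadowed
            img = Image.open(io.BytesIO(image_bytes))
            api.SetImage(img)
            api.Recognize()  # required to get result from the next line
            iterator = api.GetIterator()
            #print(iterator.WordFontAttributes())
            font_attrs = iterator.WordFontAttributes()  # renamed so the builtin dict is not shadowed
            #print(font_attrs['font_name'])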
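
One thing that may be worth checking: the code above calls WordFontAttributes() once, on whatever position GetIterator() returns, without checking that a word was actually recognized there. Below is a minimal sketch, not a confirmed fix, of walking the result word by word and skipping empty positions. The helper name collect_word_fonts is hypothetical. It assumes the iterator exposes Empty(), GetUTF8Text(), WordFontAttributes() and Next() the way tesserocr's result iterator does, and that WordFontAttributes() may return None when no font information is available. It cannot catch a crash inside the native library itself.

```python
def collect_word_fonts(iterator, word_level):
    """Return (text, font_attributes) pairs for every non-empty word.

    `iterator` is expected to behave like the object returned by
    api.GetIterator(); `word_level` would be tesserocr's RIL.WORD.
    """
    results = []
    if iterator is None:  # assumption: no iterator when nothing was recognized
        return results
    while True:
        if not iterator.Empty(word_level):
            try:
                text = iterator.GetUTF8Text(word_level)
                attrs = iterator.WordFontAttributes()
            except RuntimeError:
                # assumption: raised when the position has no usable text
                text, attrs = None, None
            # keep only words that have both text and font information
            if text and attrs:
                results.append((text, attrs))
        if not iterator.Next(word_level):
            break
    return results


# Intended use inside the existing `with PyTessBaseAPI() as api:` block
# (not executed here, since it needs tesserocr and an image):
#     from tesserocr import RIL
#     api.SetImage(img)
#     api.Recognize()
#     for text, attrs in collect_word_fonts(api.GetIterator(), RIL.WORD):
#         print(text, attrs['font_name'])
```

If the shell still restarts with this in place, the failure is likely happening inside the native call, and logging the path of each image before api.Recognize() should at least show which files trigger it.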
Does anybody know what I'm doing wrong here?

Thanks