Python Forum
Error when executing pytesseract to get the text from image
#1
Hi, I am having an issue getting text from a scanned image using pytesseract. Please help me.

Here is the code

from wand.image import Image as Img
from PIL import Image
import pytesseract
import cv2

with Img(filename="JRF-DEO.pdf", resolution=300) as img:
    img.compression_quality = 99
    img.save(filename="sample_scan.jpg")

text = pytesseract.image_to_string(Image.open('sample_scan.jpg'))
I got the error below, even though I have already installed Tesseract on the system and added the Tesseract path to the environment variable. pytesseract and tesseract are both in the same path.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
e:\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice)
    183     try:
--> 184         proc = subprocess.Popen(cmd_args, **subprocess_args())
    185     except OSError:

e:\programs\python\python36\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    708                                 errread, errwrite,
--> 709                                 restore_signals, start_new_session)
    710         except:

e:\programs\python\python36\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
    996                                          os.fspath(cwd) if cwd is not None else None,
--> 997                                          startupinfo)
    998             finally:

FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

TesseractNotFoundError                    Traceback (most recent call last)
<ipython-input-4-2c509bfb5784> in <module>()
----> 1 text = pytesseract.image_to_string(Image.open('sample_scan.jpg'))

e:\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type)
    307         Output.DICT: lambda: {'text': run_and_get_output(*args)},
    308         Output.STRING: lambda: run_and_get_output(*args),
--> 309     }[output_type]()
    310 
    311 

e:\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    306         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    307         Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 308         Output.STRING: lambda: run_and_get_output(*args),
    309     }[output_type]()
    310 

e:\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, return_bytes)
    216         }
    217 
--> 218         run_tesseract(**kwargs)
    219         filename = kwargs['output_filename_base'] + os.extsep + extension
    220         with open(filename, 'rb') as output_file:

e:\programs\python\python36\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice)
    184         proc = subprocess.Popen(cmd_args, **subprocess_args())
    185     except OSError:
--> 186         raise TesseractNotFoundError()
    187 
    188     status_code, error_string = proc.wait(), proc.stderr.read()

TesseractNotFoundError: tesseract is not installed or it's not in your path
#2
Hello, the error message is on the very last line of your output (#54): pytesseract could not find the tesseract executable. See whether you can find an answer here, or try searching for that message.
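One way to narrow this down is to check what the running Python process itself sees, since an environment variable changed after the interpreter (or an IPython/Jupyter kernel) was started is not picked up until it is restarted. The sketch below uses only the standard library; the helper names tesseract_on_path and path_entries_mentioning are made up for illustration and are not part of pytesseract.

```python
import os
import shutil


def tesseract_on_path():
    """Return the tesseract executable as this Python process resolves it, or None."""
    return shutil.which("tesseract")


def path_entries_mentioning(word="tesseract"):
    """List PATH directories whose text contains `word` (case-insensitive)."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return [entry for entry in entries if word.lower() in entry.lower()]


# None here means this process cannot find tesseract on PATH,
# which would be consistent with a TesseractNotFoundError.
print("resolved executable:", tesseract_on_path())
print("matching PATH entries:", path_entries_mentioning())
```

If the first line prints None while Tesseract is installed, the PATH seen by Python probably does not contain the install directory, or the session predates the change.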
#3
I added the following line to the script:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
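A slightly more defensive variant of that line, as a minimal sketch: it checks that the file exists before assigning it, so a wrong install path fails with a clear message instead of a later TesseractNotFoundError. The functions resolve_tesseract_cmd and configure_pytesseract are hypothetical helpers, not pytesseract API; only the tesseract_cmd attribute comes from the post above, and the path is simply the one quoted there.

```python
import os


def resolve_tesseract_cmd(path):
    """Return `path` if it points at an existing file, else raise FileNotFoundError."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no tesseract executable at {path!r}")
    return path


def configure_pytesseract(path):
    """Point pytesseract at an explicit executable after checking that it exists."""
    import pytesseract  # imported lazily so the path check works on its own

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(path)


# Usage (path taken from the post; adjust to the actual install location):
# configure_pytesseract(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")
```

Call configure_pytesseract once, before the first pytesseract.image_to_string call.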
#4
Hello my friend, I have run into some trouble. Here is my code:
import pytesseract
from os.path import abspath
from PIL import Image
tessdata_dir_config = '--tessdata-dir“ d:\\Tesseract-OCR\\tessdata” '

text = pytesseract.image_to_string(Image.open(abspath('decode/1.jpg')), config=tessdata_dir_config)
print(text)
but it raises an error like this:
File "D:\python\lib\site-packages\pytesseract\pytesseract.py", line 253, in run_and_get_output
    run_tesseract(**kwargs)
  File "D:\python\lib\site-packages\pytesseract\pytesseract.py", line 229, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
  File "D:\python\lib\site-packages\pytesseract\pytesseract.py", line 127, in get_errors
    line for line in error_string.decode('utf-8').splitlines()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 52: invalid start byte
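A possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: the config string uses curly quotes (“ ”) instead of plain double quotes, so tesseract rejects the argument and prints an error that echoes it back. If the Windows code page is GBK (cp936), which is an assumption since the post does not state it, the curly quote is written as bytes starting with 0xa1, and pytesseract then fails to decode that error text as UTF-8. The sketch below reproduces only the encoding step and shows the config string with ASCII quotes.

```python
# Assumption: tesseract's error text was written in the GBK (cp936) code page.
curly_open = "\u201c"  # the opening curly quote used in the config string
raw = curly_open.encode("gbk")
print(raw)  # b'\xa1\xb0'

try:
    raw.decode("utf-8")  # roughly what pytesseract does with the error text
except UnicodeDecodeError as exc:
    print("decode failed:", exc)

# Same option with plain ASCII double quotes and no stray spaces.
tessdata_dir_config = r'--tessdata-dir "d:\Tesseract-OCR\tessdata"'
print(tessdata_dir_config)
```

With the quotes corrected, any remaining tesseract error should at least be reported as a readable TesseractError instead of a decoding failure.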

