Python Forum
Convert text from an image to a text file - Printable Version



Convert text from an image to a text file - Evil_Patrick - Jul-24-2019

I want to convert this scanned image into a text file, but I have no idea how.

[Image: gcq2673.jpg]



RE: Convert text from an image to a text file - DeaD_EyE - Jul-24-2019

You can do this with pytesseract.

First you need to install the Tesseract OCR engine itself.
Then install the Python wrapper and its helper libraries with pip:
pip install pytesseract Pillow requests
After that it should be ready to use:

import io
import requests
import pytesseract
from PIL import Image


URL = 'https://i.imgur.com/gcq2673.jpg'
raw_data = io.BytesIO(requests.get(URL).content)
img = Image.open(raw_data)
text = pytesseract.image_to_string(img)

print(text)
The benefit is that this library works offline: no internet connection is needed to convert local images into text.
Tesseract is open source, and Google sponsored its development for many years.
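
Since the original question asks for a text file rather than printed output, here is a minimal sketch of that last step for a local image. The function names and the file names in the usage note are made up for illustration, and it assumes Tesseract and the English language data are already installed.

```python
from pathlib import Path


def save_text(text, out_path, encoding="utf-8"):
    """Write the recognized text to out_path and return it as a Path."""
    out_path = Path(out_path)
    out_path.write_text(text, encoding=encoding)
    return out_path


def image_to_text_file(image_path, out_path, lang="eng"):
    """OCR a local image file and store the result in a text file."""
    # Imported here so save_text() stays usable even without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, lang=lang)
    return save_text(text, out_path)
```

Calling image_to_text_file("scan.jpg", "scan.txt") would then read a hypothetical scan.jpg from the current directory and write whatever Tesseract recognizes to scan.txt, with no network access involved.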


RE: Convert text from an image to a text file - Evil_Patrick - Jul-24-2019

(Jul-24-2019, 10:32 AM)DeaD_EyE Wrote: You can do this with Pytesseract.

What about the language? Isn't it too hard to identify?

Error:
Traceback (most recent call last):
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 184, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Python 3\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\Python 3\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Python.py", line 10, in <module>
    text = pytesseract.image_to_string(img)
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 186, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
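
On the language question: Tesseract does not guess the language. It uses English unless told otherwise, and image_to_string accepts a lang argument with Tesseract language codes, several of which can be joined with "+". A minimal sketch follows; the helper names are made up for illustration, and each language's data files are assumed to be installed alongside Tesseract.

```python
def build_lang(codes):
    """Join Tesseract language codes, e.g. ["eng", "deu"] becomes "eng+deu"."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_with_languages(image_path, codes=("eng",)):
    """Run OCR on a local image using the given language codes."""
    # Imported here so build_lang() can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))
```

If the scan mixes languages, passing something like codes=("eng", "deu") lets Tesseract consider both.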



RE: Convert text from an image to a text file - DeaD_EyE - Jul-24-2019

The error message is already as clear as it can be:
Error:
tesseract is not installed or it's not in your path
The Tesseract executable is not on your PATH.
Maybe you installed only the wrapper, pytesseract, and not Tesseract itself,
or something went wrong during installation.
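
A quick way to tell which of these cases applies is to try launching the executable directly, which is roughly what the wrapper does according to the traceback (it calls subprocess.Popen and gets FileNotFoundError). This is only a sketch; the function name is made up for illustration.

```python
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if it cannot be started."""
    try:
        proc = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # Same root cause as the WinError 2 in the traceback above.
        return None
    # Depending on the Tesseract version, the text goes to stdout or stderr.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print("found:", version)
```

If this reports that nothing was found, pytesseract will fail in the same way no matter how it is called.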


RE: Convert text from an image to a text file - Evil_Patrick - Jul-25-2019

(Jul-24-2019, 01:06 PM)DeaD_EyE Wrote: A better error description is not possible.

Can we chat about it in private messages?


RE: Convert text from an image to a text file - DeaD_EyE - Jul-30-2019

On Ubuntu 18.x:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
On Windows:
The installer from UB Mannheim may work: https://github.com/UB-Mannheim/tesseract/wiki

pytesseract is just a convenient Python wrapper around Tesseract. Without Tesseract installed, pytesseract cannot do anything.
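
If Tesseract is installed but still not found, for example because the Windows installer did not add it to PATH, the wrapper can be pointed at the executable through pytesseract.pytesseract.tesseract_cmd. A minimal sketch follows; the function names are made up for illustration, and the Windows path below is only an assumed install location that you should adjust to wherever Tesseract actually ended up.

```python
import shutil
from pathlib import Path

# Assumption: a typical install location on Windows. Change it to match your setup.
CANDIDATE_PATHS = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return the path of the tesseract executable, or None if it is not found."""
    found = shutil.which("tesseract")  # search PATH first
    if found:
        return found
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def configure_pytesseract(candidates=CANDIDATE_PATHS):
    """Tell pytesseract which executable to launch."""
    import pytesseract

    exe = find_tesseract(candidates)
    if exe is None:
        raise FileNotFoundError("Tesseract executable not found; install it first")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once at the start of the script, before image_to_string, should be enough.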