Python Forum
Convert text from an image to a text file
#1
I want to convert this scanned image into a text file, but I have no idea how.

[Image: gcq2673.jpg]
#2
You can do this with Pytesseract.

First you need to install the Tesseract OCR engine itself (from Google).
Then install the Python wrapper and the other dependencies with pip:
pip install pytesseract Pillow requests
After this it should be ready for use:

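import io

import pytesseract
import requests
from PIL import Image


URL = 'https://i.imgur.com/gcq2673.jpg'

# Download the image and wrap the bytes in a file-like object for Pillow
response = requests.get(URL, timeout=30)
response.raise_for_status()
img = Image.open(io.BytesIO(response.content))

# Run OCR; this needs the Tesseract engine, not only the pytesseract wrapper
text = pytesseract.image_to_string(img)

print(text)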
Result:
The benefit is that this library works offline: no internet connection is needed to convert local images into text (the example only uses requests to download the image).
Tesseract is still developed by Google.
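Since the goal is a text file rather than printed output, here is a minimal sketch that OCRs a local image and saves the result. The function name `image_to_text_file` and the `"eng"` default are assumptions for illustration; `lang` is a real parameter of `pytesseract.image_to_string` and takes Tesseract language codes, joined with `+` for several languages (e.g. `"eng+deu"`).

```python
from pathlib import Path


def image_to_text_file(image_path, output_path, lang="eng"):
    """OCR a local image with Tesseract and save the recognized text.

    Hypothetical helper. `lang` uses Tesseract language codes such as
    "eng" or "deu"; combine several with "+". The matching language
    data must be installed alongside Tesseract.
    """
    # Imported inside the function so the module loads even where the
    # OCR dependencies are not installed yet.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, lang=lang)

    output = Path(output_path)
    output.write_text(text, encoding="utf-8")
    return output
```

Usage, with hypothetical file names: `image_to_text_file("gcq2673.jpg", "gcq2673.txt")`. Tesseract does not guess the language here; it uses English unless you request another language with `lang`.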
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
(Jul-24-2019, 10:32 AM)DeaD_EyE Wrote: You can do this with Pytesseract.

What about the language? Isn't it too hard to identify?

Error:
Traceback (most recent call last):
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 184, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Python 3\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\Python 3\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Python.py", line 10, in <module>
    text = pytesseract.image_to_string(img)
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Python 3\lib\site-packages\pytesseract\pytesseract.py", line 186, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
#4
A better error description is not possible.
Error:
tesseract is not installed or it's not in your path
The Tesseract executable from Google is not on your PATH.
Maybe you installed only the PyTesseract wrapper and not Tesseract itself,
or there was an error during installation.
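A quick way to check what pytesseract will find is a small standard-library sketch like the one below. `find_tesseract` is a hypothetical helper, and the Windows path is an assumption based on the usual install location of the UB Mannheim installer; adjust it to wherever `tesseract.exe` actually lives on your machine.

```python
import os
import shutil

# Assumed default install location on Windows; may differ on your system.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(WINDOWS_DEFAULT,)):
    """Return the path to the tesseract executable, or None if not found."""
    # First look on PATH, which is where pytesseract looks by default.
    found = shutil.which("tesseract")
    if found:
        return found
    # Fall back to explicit locations that may not be on PATH.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If it returns a path outside PATH, assign it to `pytesseract.pytesseract.tesseract_cmd` before calling `image_to_string`. If it returns `None`, Tesseract itself still needs to be installed.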
#5
(Jul-24-2019, 01:06 PM)DeaD_EyE Wrote: A better error description is not possible.

Can we chat about it in private messages?
#6
On Ubuntu 18.x:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
On Windows:
This may work: https://github.com/UB-Mannheim/tesseract/wiki

PyTesseract is just a nice Python wrapper around Tesseract. Without Tesseract, there is no PyTesseract.
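To make the wrapper relationship concrete, here is a rough sketch of calling the Tesseract command-line program directly, which is essentially what PyTesseract does through `subprocess` (as the traceback shows). The helper names are hypothetical; the CLI form `tesseract <image> <outputbase> -l <lang>` writes the recognized text to `<outputbase>.txt`.

```python
import subprocess


def tesseract_command(image_path, output_base, lang="eng", tesseract_cmd="tesseract"):
    """Build the argument list for a Tesseract CLI call (hypothetical helper)."""
    return [tesseract_cmd, str(image_path), str(output_base), "-l", lang]


def run_tesseract_cli(image_path, output_base, lang="eng"):
    """Run Tesseract directly; it writes the text to <output_base>.txt."""
    cmd = tesseract_command(image_path, output_base, lang)
    # A missing executable raises FileNotFoundError here; pytesseract turns
    # that same error into TesseractNotFoundError.
    subprocess.run(cmd, check=True)
    return f"{output_base}.txt"
```

If this direct call fails with `FileNotFoundError`, PyTesseract will fail too, because both depend on the same `tesseract` executable.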

