Python Forum
How to read from an image with several languages using image_to_string()?
#1
I have been experimenting with the image_to_string function and have run into a problem: I can't read text from an image that contains several languages. As you can see, it is supposed to understand both Russian and English, but it recognizes only the Russian properly. A typical piece of English text comes out like "зггіп9_ігош_іі1е" when I use this program:
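from os import environ
from pdf2image import convert_from_path
from pytesseract import image_to_string, pytesseract

# Location of the Tesseract executable
pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Note: Tesseract reads TESSDATA_PREFIX (a directory), not TESSDATA_RUS, and model files end in .traineddata
environ["TESSDATA_RUS"] = r"C:\Program Files\Tesseract-OCR\tessdata\tessdata\rus.trainedata"
# Render pages 260-300 of c.pdf at 500 DPI using Poppler
images = convert_from_path("c.pdf", 500,
                           poppler_path=r"C:\Users\aleks\Downloads\poppler-0.68.0_x86\poppler-0.68.0\bin",
                           first_page=260, last_page=300)
final_text = ""
for pg, img in enumerate(images):
    # OCR each page with the Russian and English models loaded together
    print(image_to_string(img, lang="rus+eng"))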
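A minimal diagnostic sketch, assuming Python 3 and only the standard library (no Tesseract needed to run it). The names `missing_language_files` and `script_counts` are hypothetical helpers, not part of pytesseract. The first checks that every code in a `lang` string such as `"rus+eng"` has a matching `<code>.traineddata` file in a given tessdata directory; the second counts letters per Unicode script, which makes it easy to see when text that should be English comes back mostly as Cyrillic.

```python
from __future__ import annotations

import os
import unicodedata


def missing_language_files(tessdata_dir: str, lang: str) -> list[str]:
    """Hypothetical helper: list the <code>.traineddata files that a
    Tesseract lang string like "rus+eng" needs but tessdata_dir lacks."""
    missing = []
    for code in lang.split("+"):
        filename = f"{code}.traineddata"
        if not os.path.isfile(os.path.join(tessdata_dir, filename)):
            missing.append(filename)
    return missing


def script_counts(text: str) -> dict[str, int]:
    """Hypothetical helper: count letters by script, taken as the first
    word of each character's Unicode name (e.g. 'LATIN', 'CYRILLIC').
    Digits, punctuation and whitespace are ignored."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return counts


# Possible usage with the program above (requires pytesseract and Tesseract):
#   print(missing_language_files(r"C:\Program Files\Tesseract-OCR\tessdata", "rus+eng"))
#   print(script_counts(image_to_string(img, lang="rus+eng")))
```

If files are reported missing, two options are setting TESSDATA_PREFIX to the tessdata directory or passing `config='--tessdata-dir "<dir>"'` to `image_to_string`. If both models are present but an English page still reports mostly CYRILLIC letters, comparing against a run with `lang="eng"` on the same page may show whether the combined setting is the cause.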