How to read text from an image containing several languages using image_to_string()?

EvilSnail · (This post was last modified: Nov-13-2021, 03:11 PM by EvilSnail.)

I have been experimenting with the image_to_string() function and have run into a problem: I can't read text from an image that contains several languages. As you can see from lang="rus+eng", it is supposed to recognise both Russian and English, but it only reads the Russian text properly. A typical piece of English text comes out like "зггіп9_ігош_іі1е" when I run this program:

from pdf2image import convert_from_path
from pytesseract import image_to_string, pytesseract

pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Tesseract ignores TESSDATA_RUS; it looks for rus.traineddata and eng.traineddata
# in its tessdata folder, which can be overridden with the TESSDATA_PREFIX variable
# Render pages 260-300 of c.pdf at 500 DPI
images = convert_from_path("c.pdf", dpi=500, poppler_path=r"C:\Users\aleks\Downloads\poppler-0.68.0_x86\poppler-0.68.0\bin", first_page=260, last_page=300)
for img in images:
    # Recognise Russian and English in a single pass
    print(image_to_string(img, lang="rus+eng"))
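One possible workaround, sketched here under assumptions and not tested against this PDF: keep the rus+eng pass, but ask pytesseract's image_to_data() for word boxes, flag tokens that look like English misread as Cyrillic, and re-OCR just those boxes with lang="eng" and --psm 8 (Tesseract's "single word" mode). The helper names needs_english_retry() and ocr_mixed(), the pad parameter, and the detection rule itself (Cyrillic mixed with ASCII letters, digits or underscores) are all hypothetical. The rule would catch "зггіп9_ігош_іі1е" but not an English word garbled into pure Cyrillic, and it will also re-check Russian tokens such as "2021г.".

```python
import re

# Cyrillic block U+0400-U+04FF (includes "і", which Tesseract often uses for Latin "i")
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
# ASCII letters, digits and underscores, which are rare inside genuine Russian words
LATIN_OR_CODE = re.compile(r"[A-Za-z0-9_]")


def needs_english_retry(word: str) -> bool:
    """Hypothetical heuristic: a token mixing Cyrillic letters with ASCII
    letters, digits or underscores was probably English misread as Russian."""
    return bool(CYRILLIC.search(word)) and bool(LATIN_OR_CODE.search(word))


def ocr_mixed(img, pad: int = 4) -> str:
    """OCR a PIL image with rus+eng, then re-read suspicious words with eng only."""
    # Imported here so needs_english_retry() works even without Tesseract installed
    from pytesseract import Output, image_to_data, image_to_string

    data = image_to_data(img, lang="rus+eng", output_type=Output.DICT)
    lines = {}  # (block, paragraph, line) -> words, kept in reading order
    for i, word in enumerate(data["text"]):
        word = word.strip()
        if not word:
            continue  # skip the empty entries for blocks, paragraphs and lines
        if needs_english_retry(word):
            left, top = data["left"][i], data["top"][i]
            right = left + data["width"][i]
            bottom = top + data["height"][i]
            # Pad the word box a little, clamped to the image borders
            crop = img.crop((max(left - pad, 0), max(top - pad, 0),
                             min(right + pad, img.width), min(bottom + pad, img.height)))
            retry = image_to_string(crop, lang="eng", config="--psm 8").strip()
            word = retry or word  # keep the original if the retry found nothing
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(word)
    return "\n".join(" ".join(words) for words in lines.values())
```

In the loop above, print(ocr_mixed(img)) would take the place of the image_to_string() call. Re-reading single words is slower than one pass, so it only pays off when the English fragments are short, such as identifiers embedded in Russian text.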
