Python Forum

How to read from image with several languages using image_to_string()?
I have been experimenting with the image_to_string function and have run into a problem: I can't read an image that contains text in several languages. As you can see below, it is supposed to recognize both Russian and English, but it only recognizes Russian properly. A typical piece of English text comes out looking like "зггіп9_ігош_іі1е" when I run this program:
from os import environ

from pdf2image import convert_from_path
from pytesseract import image_to_string
from pytesseract import pytesseract

# Point pytesseract at the Tesseract executable.
pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Meant to tell Tesseract where the Russian language data lives.
environ["TESSDATA_RUS"] = r"C:\Program Files\Tesseract-OCR\tessdata\tessdata\rus.trainedata"

# Render pages 260-300 of the PDF as images at 500 DPI.
images = convert_from_path(
    "c.pdf",
    dpi=500,
    poppler_path=r"C:\Users\aleks\Downloads\poppler-0.68.0_x86\poppler-0.68.0\bin",
    first_page=260,
    last_page=300,
)
final_text = ""
# OCR every page with the Russian and English models combined.
for pg, img in enumerate(images):
    print(image_to_string(img, lang="rus+eng"))
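
One thing that might be worth ruling out first is whether both language models are actually where Tesseract looks for them. For every code in the lang string, Tesseract expects a file named <code>.traineddata in its tessdata directory, and the environment variable it consults for that directory is TESSDATA_PREFIX rather than TESSDATA_RUS. The path in the post also ends in rus.trainedata (one "d"), which may just be a typo in the post. Below is a minimal sketch of such a check using only the standard library. The helper name missing_traineddata is made up for illustration, and the tessdata folder in the usage lines is an assumption about where the files live on this machine, not something confirmed by the post.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, lang):
    """Return the codes in a 'rus+eng'-style string that have no model file.

    Hypothetical helper: it only checks that <code>.traineddata exists in
    tessdata_dir; it does not call Tesseract or validate the file contents.
    """
    folder = Path(tessdata_dir)
    codes = [code for code in lang.split("+") if code]
    return [code for code in codes
            if not (folder / f"{code}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumed location of the tessdata folder; adjust to the real install.
    tessdata = r"C:\Program Files\Tesseract-OCR\tessdata"
    print(missing_traineddata(tessdata, "rus+eng"))
```

If the list comes back empty, both files are present in that folder and the cause is probably elsewhere; if it contains "eng" or "rus", that model is not in the folder that was checked.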