I have been experimenting with the image_to_string function and have run into a problem: I can't read text from an image that contains several languages. As you can see below, it is supposed to recognize both Russian and English, but only the Russian comes out correctly. A typical piece of English text comes out looking like "зггіп9_ігош_іі1е" when I run this program:
from os import environ
from pdf2image import *
from pytesseract import image_to_string
from pytesseract import pytesseract

pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
environ["TESSDATA_RUS"] =r"C:\Program Files\Tesseract-OCR\tessdata\tessdata\rus.trainedata"
images = convert_from_path("c.pdf",500,poppler_path=r"C:\Users\aleks\Downloads\poppler-0.68.0_x86\poppler-0.68.0\bin",first_page=260,last_page=300)
final_text = ""
for pg,img in enumerate(images):
    print(image_to_string(img,lang="rus+eng"))
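To make the symptom easier to measure, here is a minimal stdlib-only sketch that counts how many letters in a piece of OCR output are Cyrillic and how many are Latin. It is not part of the program above, and the `script_counts` name is hypothetical. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script.

```python
import unicodedata


def script_counts(text):
    """Count the letters in `text`, grouped by script.

    The script is approximated (an assumption of this sketch) by the first
    word of the Unicode character name, e.g. "CYRILLIC SMALL LETTER ZE"
    is counted under "CYRILLIC". Digits, underscores, spaces and other
    non-letters are ignored.
    """
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        counts[script] = counts.get(script, 0) + 1
    return counts
```

Calling `script_counts` on the string returned by `image_to_string` for a page that is known to contain English should show whether the Latin text is coming back almost entirely as Cyrillic letters, as in the sample above.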