Jul-05-2021, 08:56 PM
Dear Python community,
I have several PDF files in a folder and I would like to convert all of them into text files. This link explains how to write the code for a single PDF file: https://www.geeksforgeeks.org/python-rea...cognition/.
Before coding, I had to install Tesseract (https://pypi.org/project/pytesseract/) and Poppler (https://poppler.freedesktop.org/).
I am trying to adapt the code to handle several PDF files:
from PIL import Image
import pytesseract
import sys
from pdf2image import convert_from_path
import os
import string

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def main():
    # path for the folder for getting the pdfs
    path="C:/Users/mydirectory"
    # path for the folder for getting the output
    tempPath ="C:/Users/mydirectory (2)"
    for imageName in os.listdir(path):
        pages = convert_from_path(path, poppler_path=r'C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin')
        image_counter = 1
        for page in pages:
            filename = "page_"+str(os.image_counter)+".png"
            page.save(filename, 'PNG')
            image_counter = image_counter + 1
        filelimit = image_counter-1
        for i in range(1, filelimit+1):
            filename="page_"+str(i)+".png"
            inputPath=os.path.join(path, imageName)
            text = pt.image_to_string(Image.open(filename), lang ="fra")
            text = text.replace("\n", " ")
            fullTempPath = os.path.join(tempPath, 'time_'+imageName+".txt")
            file1 = open(fullTempPath, "w")
            file1.write(text)
            file1.close()

if __name__ == '__main__':
    main()

However, I am getting the following message:
"PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'C:\Users\mydirectory': No error."
Thank you!
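A likely cause, judging from the error text: convert_from_path is being given the folder (path) rather than the path of each PDF inside it, so Poppler tries to open 'C:\Users\mydirectory' as if it were a file. Once that is fixed, two other lines in the posted code would probably fail as well: os.image_counter is not an attribute of the os module, and pt is never defined (the module is imported as pytesseract). Below is a minimal sketch of one way to restructure the loop, not a tested drop-in replacement. The function names find_pdfs, ocr_pdf and convert_folder are made up for this example, and it assumes the French language data ("fra") is installed for Tesseract.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def ocr_pdf(pdf_path, lang="fra", poppler_path=None):
    """OCR one PDF and return its text, with one line of output per page."""
    # Imported here so find_pdfs() can be used without the OCR stack installed.
    from pdf2image import convert_from_path
    import pytesseract

    # Pass the path of the PDF file itself, not the folder that contains it.
    pages = convert_from_path(str(pdf_path), poppler_path=poppler_path)
    # Each page is a PIL image, so it can be sent to Tesseract directly
    # without saving an intermediate PNG file.
    texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]
    return "\n".join(t.replace("\n", " ") for t in texts)


def convert_folder(in_dir, out_dir, lang="fra", poppler_path=None):
    """Write one 'time_<name>.txt' file per PDF found in `in_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf_path in find_pdfs(in_dir):
        text = ocr_pdf(pdf_path, lang=lang, poppler_path=poppler_path)
        out_file = out_dir / f"time_{pdf_path.stem}.txt"
        out_file.write_text(text, encoding="utf-8")
```

With the paths from the post, the call would look like convert_folder("C:/Users/mydirectory", "C:/Users/mydirectory (2)", poppler_path=r'C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin'). If tesseract.exe is not on the PATH, pytesseract.pytesseract.tesseract_cmd still has to be set as in the original code before calling it. Unlike the original, this sketch writes all pages of a PDF into a single text file instead of reopening the output file in "w" mode for every page, which would keep only the last page, and it names the output after the file stem (time_name.txt rather than time_name.pdf.txt).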