Python Forum
Several pdf files to text
#1
Dear Python community,
I have several PDF files in a folder and would like to convert all of them to text files. This link explains how to write the code for a single PDF file: https://www.geeksforgeeks.org/python-rea...cognition/.
Before coding, I had to install Tesseract (https://pypi.org/project/pytesseract/) and Poppler (https://poppler.freedesktop.org/).
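As a quick sanity check on those two installs, here is a minimal sketch that looks for the executables before any OCR is attempted. The helper name find_tool is hypothetical. The sketch assumes the programs are either on PATH or at a known fallback location; "tesseract" and "pdfinfo" are the executable names shipped with Tesseract and Poppler.

```python
import shutil


def find_tool(name, fallback=None):
    """Return the full path of an executable found on PATH, else the fallback.

    find_tool is a hypothetical helper for illustration only.
    """
    return shutil.which(name) or fallback


if __name__ == "__main__":
    # The fallback below is the install location used later in this post.
    print("tesseract:", find_tool("tesseract", r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
    # pdfinfo is the Poppler tool that pdf2image calls to count pages.
    print("pdfinfo:", find_tool("pdfinfo"))
```

If pdfinfo prints None, Poppler is not on PATH, and the poppler_path argument of convert_from_path has to point at its bin folder.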
I am trying to adapt the code to handle several PDF files:
from PIL import Image
import pytesseract
import sys
from pdf2image import convert_from_path
import os
import string

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def main():
    # path for the folder for getting the pdfs
    path="C:/Users/mydirectory"
    # path for the folder for getting the output
    tempPath ="C:/Users/mydirectory (2)"
    
    for imageName in os.listdir(path):
        pages = convert_from_path(path, poppler_path=r'C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin')
        image_counter = 1
        for page in pages:
            filename = "page_"+str(os.image_counter)+".png"
            page.save(filename, 'PNG')
            image_counter = image_counter + 1
        filelimit = image_counter-1
        for i in range(1, filelimit+1):
            filename="page_"+str(i)+".png"
        inputPath=os.path.join(path, imageName)
        text = pt.image_to_string(Image.open(filename), lang ="fra")        
        text = text.replace("\n", " ")
        fullTempPath = os.path.join(tempPath, 'time_'+imageName+".txt")
        file1 = open(fullTempPath, "w")
        file1.write(text)
        file1.close() 
  
if __name__ == '__main__':
    main() 
However, I get the following error: "PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'C:\Users\mydirectory': No error."
Thank you!
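The error text names the folder itself ('C:\Users\mydirectory'), which suggests that convert_from_path(path, ...) is being handed the directory rather than one PDF inside it. The listing also refers to os.image_counter and pt, neither of which is defined, and only the last page's PNG reaches image_to_string because that call sits outside the page loop. Below is a minimal sketch of one way to restructure the loop, not a definitive fix. The function names (list_pdfs, text_path_for, pdf_to_text, convert_folder) are hypothetical, and naming each output file after its PDF with a .txt extension is an assumption. It passes each page image directly to pytesseract instead of saving temporary PNG files.

```python
import os


def list_pdfs(folder):
    """Return the full paths of the PDF files in folder, sorted by name."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
    )


def text_path_for(pdf_path, out_folder):
    """Build the output path, e.g. report.pdf -> <out_folder>/report.txt."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(out_folder, stem + ".txt")


def pdf_to_text(pdf_path, poppler_path=None, lang="fra"):
    """OCR every page of one PDF and return the pages joined by newlines."""
    # Imported here so the path helpers above work without the OCR stack.
    import pytesseract
    from pdf2image import convert_from_path

    # Pass the path of a single PDF file, not the folder that contains it.
    pages = convert_from_path(pdf_path, poppler_path=poppler_path)
    # image_to_string accepts PIL images, so no temporary PNGs are needed.
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def convert_folder(in_folder, out_folder, poppler_path=None, lang="fra"):
    """Write one text file per PDF found in in_folder."""
    os.makedirs(out_folder, exist_ok=True)
    for pdf_path in list_pdfs(in_folder):
        text = pdf_to_text(pdf_path, poppler_path=poppler_path, lang=lang)
        with open(text_path_for(pdf_path, out_folder), "w", encoding="utf-8") as f:
            f.write(text)
```

With the paths from the post, this would be called as convert_folder("C:/Users/mydirectory", "C:/Users/mydirectory (2)", poppler_path=r'C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin'), after setting pytesseract.pytesseract.tesseract_cmd as in the original listing. The "fra" language data must be installed for Tesseract.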


Messages In This Thread
Several pdf files to text - by mfernandes - Jul-05-2021, 08:56 PM
RE: Several pdf files to text - by mfernandes - Jul-05-2021, 09:02 PM
RE: Several pdf files to text - by Pedroski55 - Jul-06-2021, 02:54 AM
RE: Several pdf files to text - by mfernandes - Jul-06-2021, 11:10 AM
RE: Several pdf files to text - by deanhystad - Jul-06-2021, 05:07 PM
RE: Several pdf files to text - by mfernandes - Jul-06-2021, 06:38 PM
RE: Several pdf files to text - by deanhystad - Jul-06-2021, 07:25 PM
RE: Several pdf files to text - by Pedroski55 - Jul-06-2021, 11:42 PM
RE: Several pdf files to text - by mfernandes - Jul-07-2021, 08:14 PM
RE: Several pdf files to text - by deanhystad - Jul-07-2021, 09:25 PM
RE: Several pdf files to text - by Pedroski55 - Jul-07-2021, 11:39 PM
