Several pdf files to text

**deanhystad** · (This post was last modified: Jul-06-2021, 05:33 PM by deanhystad.)

Do you have code that works for one file? I would turn that into a function and call the function for each PDF file. Once that function exists, looping over the files is trivial.

I have not tested this code at all, but provide it as an example of how I would split up this task.

import os
from pytesseract import pytesseract
from pdf2image import convert_from_path

POPPLER = r'C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin'
pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def pdf_to_text(pdf_file, text_file, lang='fra'):
    """Use OCR to convert PDF file to text."""
    images = convert_from_path(pdf_file, poppler_path=POPPLER)
    # Write UTF-8 explicitly so accented characters survive on any platform.
    with open(text_file, 'w', encoding='utf-8') as out_file:
        for image in images:
            text = pytesseract.image_to_string(image, lang=lang)
            out_file.write(text.replace("\n", " "))

def pdf_files_to_text(folder, out_folder=None):
    """Using OCR convert all PDF files in folder to text files.  Text
    files are saved in out_folder (defaults to folder) using same name
    with extension changed to .txt
    """
    if out_folder is None:
        out_folder = folder
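    for file in os.listdir(folder):
        # Only process PDF files; match the extension case-insensitively.
        if file.lower().endswith('.pdf'):
            # os.path.join supplies the path separator between folder and file.
            text_file = os.path.splitext(file)[0] + '.txt'
            pdf_to_text(os.path.join(folder, file), os.path.join(out_folder, text_file))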

if __name__ == '__main__':
    pdf_files_to_text(os.getcwd())

First I solve the real problem: converting a PDF file to text. This is a useful task, so I write a function that does this task.
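The text clean-up step inside that function can also be pulled out and checked without Tesseract installed. This is only a sketch: join_pages is a hypothetical helper, not part of my code or of pytesseract, and it assumes you want each page flattened to one line and the pages separated by a single space, which is slightly stricter than the plain replace("\n", " ") I used.

```python
def join_pages(page_texts):
    """Flatten each page to a single line and join the pages with one space.

    join_pages is a hypothetical helper for illustration; page_texts is
    assumed to be the list of strings returned by OCR, one per page.
    """
    # str.split() with no arguments splits on any run of whitespace, so
    # newlines, form feeds and repeated spaces all collapse to one space.
    flattened = (" ".join(text.split()) for text in page_texts)
    # Skip pages where OCR found no text at all.
    return " ".join(page for page in flattened if page)
```

Inside pdf_to_text you would collect the image_to_string results in a list and write join_pages(results) to the file once, instead of writing page by page.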

Next I write a function that will convert all the pdf files in a folder into text files. This too is a useful task, so I write it as a function that can be called by other code.
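The file-selection part of that second function can be tried on its own before any OCR is involved. A minimal sketch, assuming a hypothetical helper named pdf_text_pairs (not in the code I posted) that uses pathlib to pair each PDF in a folder with the .txt name it should be written to:

```python
from pathlib import Path


def pdf_text_pairs(folder, out_folder=None):
    """Return (pdf_path, txt_path) pairs for every PDF directly inside folder.

    pdf_text_pairs is a hypothetical helper used only for illustration.
    """
    folder = Path(folder)
    # Default to writing the text files next to the PDFs.
    out_folder = folder if out_folder is None else Path(out_folder)
    pairs = []
    for pdf in sorted(folder.iterdir()):
        # Match the extension case-insensitively so "SCAN.PDF" is included.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            pairs.append((pdf, out_folder / (pdf.stem + ".txt")))
    return pairs
```

Printing the pairs first lets you confirm the right files are picked up; each pair could then be passed on as pdf_to_text(str(pdf), str(txt)).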
