Python Forum

Full Version: PDF to CSV, can't get the module "miner text generator"
Hey guys!

I hope you can help me with a small piece of code I need. I want to export a PDF as a CSV file. I found this code, but it can't find a module that is supposedly installed with Python.

Here's the code:

import csv
import os

from miner_text_generator import extract_text_by_page


def export_as_csv(pdf_path, csv_path):
    filename = os.path.splitext(os.path.basename(pdf_path))[0]
    counter = 1
    # newline='' stops the csv module from writing blank rows on Windows
    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for page in extract_text_by_page(pdf_path):
            # keep only the first 100 characters of each page
            text = page[0:100]
            words = text.split()
            # one CSV row per page, one word per cell
            writer.writerow(words)


if __name__ == '__main__':
    pdf_path = 'w9.pdf'
    csv_path = 'w9.csv'
    export_as_csv(pdf_path, csv_path)
w9.pdf is just an example document.

When I build it, I get this message:

Traceback (most recent call last):
  File "C:\Users\BMQT\Desktop\PDFtxt\pdf to", line 4, in <module>
    from miner_text_generator import extract_text_by_page
ModuleNotFoundError: No module named 'miner_text_generator'
So I decided to install it with pip, but got this message:

C:\WINDOWS\system32>pip install miner_text_generator
Collecting miner_text_generator
  Could not find a version that satisfies the requirement miner_text_generator (from versions: )
No matching distribution found for miner_text_generator
I'm using Python 3.7.

Google is my friend, I know, but I can't find anything about that module.

Can you guys help me please?
You can use PDFMiner, see:

Or better, but with a longer learning curve:

You can use the NLTK FreqDist package.
You can install nltk with pip, but you need to install the corpora as well, see:
Installation instructions here:

You can use (the last command is run from the shell):
pip install nltk
pip install numpy
python -m nltk.downloader all
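As a rough illustration of the FreqDist suggestion, here is a minimal sketch that counts word frequencies in a piece of extracted text. The helper name top_words and the sample sentence are made up for the example. FreqDist is a subclass of collections.Counter, so the fallback import below is only there to let the snippet run when nltk is not installed.

```python
try:
    from nltk import FreqDist
except ImportError:
    # FreqDist subclasses collections.Counter, so Counter is a
    # reasonable stand-in for this simple counting example.
    from collections import Counter as FreqDist


def top_words(text, n=3):
    """Return the n most frequent lowercase words in text as (word, count) pairs."""
    words = text.lower().split()
    return FreqDist(words).most_common(n)


print(top_words("the cat and the dog and the bird", n=2))
```

Text pulled from a PDF page could be passed to top_words in the same way as the sample sentence.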
The site you gave for PDFMiner is the one I use, and the code for exporting to csv is from this website, but it doesn't work.
The install has to be (Python 2.7):
python -m pip install pdfminer
For Python 3:
python -m pip install pdfminer.six
pip install miner_text_generator
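Regarding that last command: pip already reported earlier in the thread that no distribution named miner_text_generator exists, so it is probably not something pip can install. A likely explanation (an assumption, not confirmed anywhere in this thread) is that the tutorial expects you to save its own helper as miner_text_generator.py next to your script. Below is a sketch of what such a file could look like with pdfminer.six. The function name extract_text_by_page comes from the original code; page_to_row is a hypothetical helper added for illustration.

```python
# miner_text_generator.py
# Hypothetical local helper file; save it in the same folder as your script.
import io


def extract_text_by_page(pdf_path):
    """Yield the extracted text of each page of the PDF, one string per page."""
    # Imported inside the function so that a missing dependency shows up as
    # an error about pdfminer (python -m pip install pdfminer.six).
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    with open(pdf_path, 'rb') as fh:
        for page in PDFPage.get_pages(fh):
            resource_manager = PDFResourceManager()
            buffer = io.StringIO()
            converter = TextConverter(resource_manager, buffer, laparams=LAParams())
            interpreter = PDFPageInterpreter(resource_manager, converter)
            interpreter.process_page(page)
            text = buffer.getvalue()
            converter.close()
            buffer.close()
            yield text


def page_to_row(text, max_chars=100):
    """Turn one page of text into a CSV row: the words in its first max_chars characters."""
    return text[:max_chars].split()
```

With a file like this saved as miner_text_generator.py beside the script, the line "from miner_text_generator import extract_text_by_page" should resolve without installing anything beyond pdfminer.six.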