Python Forum
PDF to CSV, can't get the module "miner text generator"
#1
Hey guys!

I hope you can help me with this little piece of code. I want to export a PDF as a CSV file. I found this code, but Python can't find a module that I assumed was installed with it.

Here's the code:

import csv
import os
 
from miner_text_generator import extract_text_by_page
 
 
def export_as_csv(pdf_path, csv_path):
    filename = os.path.splitext(os.path.basename(pdf_path))[0]
 
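    # newline='' keeps the csv module from writing blank rows on Windows
    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for page in extract_text_by_page(pdf_path):
            # First 100 characters of each page, one word per column
            text = page[0:100]
            words = text.split()
            writer.writerow(words)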
 
 
if __name__ == '__main__':
    pdf_path = 'w9.pdf'
    csv_path = 'w9.csv'
    export_as_csv(pdf_path, csv_path)
w9.pdf is just an example document.

When I run it, I get this message:

Error:
Traceback (most recent call last):
  File "C:\Users\BMQT\Desktop\PDFtxt\pdf to csv.py", line 4, in <module>
    from miner_text_generator import extract_text_by_page
ModuleNotFoundError: No module named 'miner_text_generator'
So I decided to install it with pip, but got this message:

Error:
C:\WINDOWS\system32>pip install miner_text_generator
Collecting miner_text_generator
  Could not find a version that satisfies the requirement miner_text_generator (from versions: )
No matching distribution found for miner_text_generator
I'm using Python 3.7.

I know Google is my friend, but I can't find anything about that module.

Can you guys help me please?
#2
You can use PDFMiner, see: https://www.blog.pythonlibrary.org/2018/...th-python/

Or, better but with a longer learning curve:

You can use the FreqDist class from NLTK.
You can install NLTK with pip, but you need to install the corpora as well.
Installation instructions are here: https://www.nltk.org/install.html

Run the following from the shell:
pip install nltk
pip install numpy
python -m nltk.downloader all
#3
The site you linked for PDFMiner is the one I'm using. The CSV export code comes from that page, but it doesn't work.
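
A likely cause, as far as I can tell: miner_text_generator is not a pip package but a helper file (miner_text_generator.py) that the tutorial expects you to create yourself, which would explain why pip finds no distribution for it. Below is a minimal sketch of such a file, not the tutorial's own code. It assumes pdfminer.six is installed (python -m pip install pdfminer.six) and uses its extract_pages and LTTextContainer; only the function name extract_text_by_page is taken from the code above.

```python
# miner_text_generator.py (a sketch; assumes pdfminer.six is installed)


def extract_text_by_page(pdf_path):
    """Yield the text of each page of the PDF at pdf_path, one string per page."""
    # Imported inside the function so a missing pdfminer.six only raises
    # an error when the function is actually used.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_layout in extract_pages(pdf_path):
        # Join the text of every text-bearing layout element on this page.
        yield "".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        )
```

Saved next to the script as miner_text_generator.py, this should let the line from miner_text_generator import extract_text_by_page resolve without installing anything else.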
#4
The install command has to be, for Python 2.7:
python -m pip install pdfminer
and for Python 3:
python -m pip install pdfminer.six
not:
pip install miner_text_generator
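
After installing, it can help to confirm that the interpreter running the script actually sees the package. Here is a minimal sketch using only the standard library. The helper name is_importable is made up for illustration, and pdfminer is the top-level package name that pdfminer.six installs.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python is running, since pip may have installed into another one.
print("Interpreter:", sys.executable)
for name in ("pdfminer", "miner_text_generator"):
    print(name, "->", "found" if is_importable(name) else "not found")
```

If pdfminer shows as found but miner_text_generator does not, the missing piece is probably a local miner_text_generator.py file rather than anything pip can install.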