Reading PDFs in Windows 10 - Python 3.6

cobra · (This post was last modified: May-10-2018, 08:17 PM by buran.)

I am having a problem with Windows 10 and Python 3.6.

I am following the instructions from this link: https://medium.com/@rqaiserr/how-to-conv...aab86c544f

All the libraries seem to be installed properly.

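When I run:

    import PyPDF2

    with open('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        text = ""
        # Append the extracted text layer of each page in turn
        for count in range(pdfReader.numPages):
            text += pdfReader.getPage(count).extractText()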

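I get the following error (the traceback points to the textract.process call on line 39 of PDF-Python.py rather than to the loop itself):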

runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
Traceback (most recent call last):

  File "<ipython-input-30-dd5ee387c87a>", line 1, in <module>
    runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py", line 39, in <module>
    text = textract.process('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', method='tesseract', language='eng')

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 33, in extract
    return self.extract_tesseract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 57, in extract_tesseract
    stdout, _ = self.run(['pdftoppm', filename, base])

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 91, in run
    ' '.join(args), 127, '', '',

ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
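The exit code 127 above does not appear to come from pdftoppm itself: the textract frame at line 91 of utils.py builds the ShellError with a literal 127, which (assuming textract's usual behaviour) is what it reports when the external program cannot be started at all, typically because it is not on PATH. A minimal sketch for checking this with only the standard library follows. The helper name find_missing_tools is hypothetical, and including "tesseract" is an assumption based on method='tesseract' in the failing call.

```python
import shutil


def find_missing_tools(tools=("pdftoppm", "tesseract")):
    """Return the names in `tools` that shutil.which cannot locate on PATH.

    "pdftoppm" is the command shown in the traceback; "tesseract" is an
    assumed extra dependency of textract's method='tesseract' OCR step.
    On Windows, shutil.which also tries PATHEXT suffixes such as ".exe".
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("pdftoppm and tesseract were both found on PATH.")
```

If pdftoppm is reported as missing, the usual next step is to add the folder containing pdftoppm.exe (it ships with Poppler builds for Windows) to PATH and then restart Spyder, since a running process does not pick up PATH changes made after it started.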
