Reading PDFs in Windows 10 - Python 3.6

**cobra** · (This post was last modified: May-10-2018, 08:17 PM by buran.)

I am having a problem on Windows 10 with Python 3.6.

I am following the instructions at this link: https://medium.com/@rqaiserr/how-to-conv...aab86c544f

All the libraries seem to be installed properly.

When I run:

while count < num_pages:
    pageObj = pdfReader.getPage(count)
    count += 1
    text += pageObj.extractText()

I get:

runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
Traceback (most recent call last):

  File "<ipython-input-30-dd5ee387c87a>", line 1, in <module>
    runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py", line 39, in <module>
    text = textract.process('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', method='tesseract', language='eng')

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 33, in extract
    return self.extract_tesseract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 57, in extract_tesseract
    stdout, _ = self.run(['pdftoppm', filename, base])

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 91, in run
    ' '.join(args), 127, '', '',

ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
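For reference, the page loop above can be written with range() instead of a hand-maintained counter. This is a minimal sketch assuming the PyPDF2 1.x API that the tutorial uses (PdfFileReader, numPages, getPage(), extractText()); later PyPDF2/pypdf releases renamed these. The helper names extract_all_text and needs_ocr are hypothetical. An empty or whitespace-only result usually means the PDF has no text layer (for example, a scanned document), which is the situation where an OCR route such as the textract.process(..., method='tesseract') call in the traceback becomes necessary.

```python
def extract_all_text(reader):
    """Concatenate the extracted text of every page.

    `reader` is assumed to be a PyPDF2 1.x PdfFileReader (numPages,
    getPage(), page.extractText()), as in the loop from the post.
    """
    parts = []
    for page_number in range(reader.numPages):  # no manual counter to forget
        page = reader.getPage(page_number)
        parts.append(page.extractText())
    return "".join(parts)  # join once instead of repeated +=


def needs_ocr(text):
    """True when extraction produced nothing but whitespace (e.g. a scanned PDF)."""
    return not text.strip()


# Example usage (requires PyPDF2 1.x; the path is the one from the traceback):
# import PyPDF2
# with open('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', 'rb') as f:
#     text = extract_all_text(PyPDF2.PdfFileReader(f))
# if needs_ocr(text):
#     ...  # only now would an OCR route (e.g. textract with tesseract) be needed
```

Checking needs_ocr first means the external OCR tools are only involved for PDFs that really have no text layer.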

**nilamo** · May-10-2018, 09:40 PM

cobra wrote (May-10-2018, 07:01 PM): "ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127"

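Looks like your problem is with the external pdftoppm program, not with Python. Exit code 127 conventionally means "command not found", and the empty stdout/stderr sections suggest pdftoppm never started at all, so it is probably not installed or not on your PATH. pdftoppm is part of Poppler, which is not installed along with the Python libraries.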
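A quick way to confirm this from Python is a sketch like the one below. It assumes the cause is a missing executable: the 127 passed in at line 48 of the traceback, together with the empty stdout/stderr, matches what happens when subprocess cannot launch a program at all (it raises FileNotFoundError before anything runs). The names check_tools and run_or_127 are hypothetical helpers, not part of textract, and the assumption that tesseract is also needed for the OCR step is mine; only pdftoppm appears in the traceback.

```python
import shutil
import subprocess

# External programs assumed to be needed by textract's method='tesseract' PDF path.
# pdftoppm appears in the traceback; tesseract is assumed for the OCR step itself.
OCR_TOOLS = ("pdftoppm", "tesseract")


def check_tools(tools=OCR_TOOLS):
    """Return {tool name: full path found on PATH, or None if it cannot be found}."""
    return {tool: shutil.which(tool) for tool in tools}


def run_or_127(args):
    """Run a command and return its exit code.

    If the executable cannot be found, subprocess raises FileNotFoundError
    before anything runs, so there is no stdout/stderr; report that as 127,
    the shell's conventional "command not found" code.
    """
    try:
        completed = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except FileNotFoundError:
        return 127
    return completed.returncode


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print("{}: {}".format(tool, path or "NOT FOUND on PATH"))
```

If pdftoppm is reported as not found, installing Poppler for Windows, adding its bin folder to PATH, and then restarting Spyder (a running process does not see later PATH changes) is the usual remedy; the same applies to tesseract.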
