Python Forum

Full Version: reading pdfs in windows10 - Python 3.6
I am having a problem on Windows 10 with Python 3.6.

I am following the instructions from this link: https://medium.com/@rqaiserr/how-to-conv...aab86c544f

All the libraries seem to be installed properly.

When I run:

while count < num_pages:
    pageObj = pdfReader.getPage(count)
    count += 1
    text += pageObj.extractText()
I get:

Error:
runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
Traceback (most recent call last):

  File "<ipython-input-30-dd5ee387c87a>", line 1, in <module>
    runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py", line 39, in <module>
    text = textract.process('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', method='tesseract', language='eng')

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 33, in extract
    return self.extract_tesseract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 57, in extract_tesseract
    stdout, _ = self.run(['pdftoppm', filename, base])

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 91, in run
    ' '.join(args), 127, '', '',

ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
(May-10-2018, 07:01 PM) cobra wrote: ShellError: The command pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv failed with exit code 127

It looks like your problem is with the pdftoppm program, whatever that is, not with Python.
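
Exit code 127 is the conventional shell status for "command not found", so one plausible reading of the error is that pdftoppm (a command-line tool that ships with Poppler) is not installed or is not on PATH. Here is a minimal sketch for checking that from Python. It assumes that the tesseract method in textract also needs the tesseract executable itself, and the helper name find_missing_tools is made up for illustration:

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns the full path to an executable, or None if
    # no matching executable is found on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumption: these are the external programs needed for the
    # textract call shown in the traceback (method='tesseract').
    missing = find_missing_tools(["pdftoppm", "tesseract"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

If pdftoppm is reported as missing, installing a Poppler build for Windows and adding its bin folder to PATH would probably be the next thing to try before running the script again.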