Python Forum
reading pdfs in windows10 - Python 3.6
#1
I am having a problem with Windows 10 and Python 3.6.

I am following the instructions from this link: https://medium.com/@rqaiserr/how-to-conv...aab86c544f

All the libraries seem to be installed properly.

When I run:

while (count < num_pages):
    pageObj = pdfReader.getPage(count)
    count +=1
    text += pageObj.extractText()
I get:

Error:
runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
Traceback (most recent call last):
  File "<ipython-input-30-dd5ee387c87a>", line 1, in <module>
    runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py", line 39, in <module>
    text = textract.process('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', method='tesseract', language='eng')
  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 33, in extract
    return self.extract_tesseract(filename, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 57, in extract_tesseract
    stdout, _ = self.run(['pdftoppm', filename, base])
  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 91, in run
    ' '.join(args), 127, '', '',
ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
#2
(May-10-2018, 07:01 PM)cobra Wrote: ShellError: The command pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv failed with exit code 127

It looks like the problem is with the pdftoppm program, whatever that is, rather than with Python.
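
To check that from Python, here is a minimal sketch. It assumes exit code 127 has its usual shell meaning of "command not found", and that pdftoppm (part of the Poppler utilities) and tesseract (needed because the traceback shows method='tesseract') are the tools textract shells out to. The helper name find_missing_tools is made up for this example and is not part of textract.

```python
import shutil


def find_missing_tools(tools=("pdftoppm", "tesseract")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the default tool names are an
    assumption based on the traceback in the first post.
    """
    # shutil.which returns None when no matching executable is on PATH
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

If pdftoppm shows up as missing, installing Poppler and adding its bin folder to the Windows PATH, then restarting Spyder so it picks up the new PATH, would be the first thing to try.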