Python Forum
reading pdfs in windows10 - Python 3.6
#1
I am having a problem on Windows 10 with Python 3.6.

I am following the instructions from this link: https://medium.com/@rqaiserr/how-to-conv...aab86c544f

All the libraries seem to be installed properly.

When I run:

while count < num_pages:
    pageObj = pdfReader.getPage(count)
    count += 1
    text += pageObj.extractText()
I get:

Error:
runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')
Traceback (most recent call last):

  File "<ipython-input-30-dd5ee387c87a>", line 1, in <module>
    runfile('C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py', wdir='C:/Users/User/Desktop/Pdf Extraction')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/User/Desktop/Pdf Extraction/PDF-Python.py", line 39, in <module>
    text = textract.process('C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf', method='tesseract', language='eng')

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 33, in extract
    return self.extract_tesseract(filename, **kwargs)

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\pdf_parser.py", line 57, in extract_tesseract
    stdout, _ = self.run(['pdftoppm', filename, base])

  File "C:\Users\User\Anaconda3\lib\site-packages\textract\parsers\utils.py", line 91, in run
    ' '.join(args), 127, '', '',

ShellError: The command `pdftoppm C:/Users/User/Desktop/Pdf Extraction/CF000048186.pdf C:\Users\User\AppData\Local\Temp\tmp3ed3_ems\conv` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
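
A note on reading this traceback: the failure is raised from the textract.process call on line 39 of the script, not from the page loop, and exit code 127 conventionally means "command not found". One plausible reading, which is an assumption and not something the thread confirms, is that the pdftoppm executable (shipped with the Poppler utilities) cannot be found on PATH. The sketch below only checks whether the external tools are discoverable. The helper name find_missing_tools and the tool list are illustrative and not part of textract.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot locate on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed tool list: pdftoppm appears in the traceback, and tesseract
    # is implied by method='tesseract' in the textract.process call.
    missing = find_missing_tools(["pdftoppm", "tesseract"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If a tool is reported as missing, installing it and adding its folder to the PATH environment variable, then restarting the IDE so it picks up the new PATH, would be the usual next step to try.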
Messages In This Thread
reading pdfs in windows10 - Python 3.6 - by cobra - May-10-2018, 07:01 PM
