Jun-11-2022, 02:21 AM
(This post was last modified: Jun-11-2022, 02:37 AM by Larz60+.
Edit Reason: fixed error tags
)
I need to run some tests on converting PDFs to images. The article at https://medium.com/towards-data-science/...670ee38052 has been quite helpful.
I have pdf2image, poppler, poppler-utils, etc installed. I have used the code at https://gist.github.com/akash-ch2812/1e2...o_Image.py
from pdf2image import convert_from_path

pdfs = r"provide path to pdf file"
pages = convert_from_path(pdfs, 350)

i = 1
for page in pages:
    image_name = "Page_" + str(i) + ".jpg"
    page.save(image_name, "JPEG")
    i = i+1
Error:
$ python3 PDF_to_Image.py Tests_20220530.pdf
Traceback (most recent call last):
  File "/home/********/.local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 479, in pdfinfo_from_path
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/********/Downloads/OCR/PDF_to_Image.py", line 4, in <module>
    pages = convert_from_path(pdfs, 350)
  File "/home/********/.local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/home/********/.local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 488, in pdfinfo_from_path
    raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'provide path to pdf file': No such file or directory.
So, I have tried various ways of specifying the path and/or filename at line 3, all to no avail. I am not sure how to determine whether 'poppler' is in the PATH; however, 'locate' shows it is installed. I have looked through the "pdf2image" issues and did not find a solution there.
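One way to check whether poppler is reachable from Python is sketched below. It assumes that pdf2image relies on poppler's `pdfinfo` and `pdftoppm` command-line tools (the traceback above passes through `pdfinfo_from_path`); the helper name `find_poppler_tools` is made up for illustration.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each tool name to its full path on PATH, or None if it is not found.

    The default tool names are an assumption about what pdf2image calls.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_poppler_tools().items():
        print(f"{tool}: {location or 'not found on PATH'}")
```

If both tools are reported with a full path, poppler is on the PATH and pdf2image should not need an explicit `poppler_path`.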
Does this script simply need the "poppler" path, and if so, how do I find it? Also, surely the code can be modified so that a parameter is passed to specify the path/directories. The PDF is in the same directory as the script, so I assume the script is failing because it doesn't know where "poppler" is found.
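A parameterised version might look roughly like the sketch below. Note that the traceback shows the literal placeholder string 'provide path to pdf file' being opened, so the file name given on the command line never reaches `convert_from_path` in the script as posted. The option names (`--dpi`, `--out-dir`, `--poppler-path`) and the helpers `parse_args`, `page_filename` and `main` are hypothetical; `poppler_path` itself is a keyword that appears in the traceback. This has not been run against a real PDF.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; the option names are illustrative only."""
    parser = argparse.ArgumentParser(description="Convert each page of a PDF to a JPEG.")
    parser.add_argument("pdf", type=Path, help="PDF file to convert")
    parser.add_argument("--dpi", type=int, default=350, help="render resolution")
    parser.add_argument("--out-dir", type=Path, default=Path("."),
                        help="directory to write the JPEGs into")
    parser.add_argument("--poppler-path", default=None,
                        help="directory holding the poppler tools, if not on PATH")
    return parser.parse_args(argv)


def page_filename(index):
    """Same naming scheme as the original script: Page_1.jpg, Page_2.jpg, ..."""
    return f"Page_{index}.jpg"


def main(argv=None):
    args = parse_args(argv)
    if not args.pdf.is_file():
        raise SystemExit(f"No such PDF: {args.pdf}")

    # Imported here so argument errors are reported even if pdf2image is missing.
    from pdf2image import convert_from_path

    args.out_dir.mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(str(args.pdf), dpi=args.dpi,
                              poppler_path=args.poppler_path)
    for index, page in enumerate(pages, start=1):
        page.save(args.out_dir / page_filename(index), "JPEG")
```

With `main()` called at the bottom under the usual `if __name__ == "__main__":` guard, the invocation from the error output, `python3 PDF_to_Image.py Tests_20220530.pdf`, would pass the file name through. `--poppler-path` would only be needed if the poppler tools are not on the PATH.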