Jan-30-2021, 12:17 PM
Most people use pdfminer.six now.
I'm not familiar with pdfplumber, but it looks interesting. Let us know your experience with it.
Please keep in mind that a PDF file is a very complicated object and can take many forms. For example, its contents can be any combination of:
- images
- pure text
- tables
- text as images (which can only be extracted using some form of OCR)
The pdfminer.six documentation shows some fairly simple methods: https://pdfminersix.readthedocs.io/en/la...level.html