Python Forum
pdfminer vs pdfplumber
Most people use pdfminer.six now (the community-maintained fork of the original pdfminer).

I'm not familiar with pdfplumber, but it looks interesting. Let us know your experience with it.

Please keep in mind that a PDF file is a very complicated object and can take many forms.
For example, its contents can be any combination of:
  • images
  • pure text
  • tables
  • text as images (which can only be extracted using some form of OCR)
And I probably missed some.

The documentation for pdfminer.six shows some rather simple methods: https://pdfminersix.readthedocs.io/en/la...level.html
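
For the pure-text case, a minimal sketch using the pdfminer.six high-level API might look like the following. Only extract_text comes from pdfminer.six; the helper names extract_pdf_text and looks_scanned, and the 20-character threshold, are my own assumptions for illustration and would need tuning on real files.

```python
def extract_pdf_text(pdf_path):
    """Return the text layer of a PDF as a single string.

    Requires: pip install pdfminer.six
    The import is done lazily so the module loads even if
    pdfminer.six is not installed yet.
    """
    from pdfminer.high_level import extract_text
    return extract_text(str(pdf_path))


def looks_scanned(text, min_chars=20):
    """Rough heuristic (an assumption, not a pdfminer feature):
    if almost no text came back, the pages are probably images
    of text and would need OCR instead.
    """
    return len(text.strip()) < min_chars
```

Typical use would be something like text = extract_pdf_text("report.pdf") (a hypothetical file name), followed by a looks_scanned(text) check before trying to parse anything. Tables and embedded images are not handled by this sketch.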


Messages In This Thread
pdfminer vs pdfplumber - by pprod - Jan-30-2021, 09:39 AM
RE: pdfminer vs pdfplumber - by Larz60+ - Jan-30-2021, 12:17 PM
RE: pdfminer vs pdfplumber - by pprod - Jan-30-2021, 01:35 PM
