Python Forum
pdf lookalikes
#1
Hi,
Some service-minded villages have made thousands of their prayer cards available to the public as PDFs.
Or so it seems, because the files have a *.pdf extension.
But when you open one, Acrobat says: "This document CLAIMS TO BE a pdf/a file...". Wow... they must have used some
alien tool to generate it.
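A quick way to see what such a file actually claims to be, without any PDF library, is to look at the raw bytes. The sketch below assumes two things: a real PDF has a `%PDF-x.y` header near the start, and a PDF/A file declares itself in its XMP metadata through a `pdfaid:part` entry. The function names are hypothetical. It is only a heuristic: if the metadata stream is compressed, the PDF/A marker will not be found even when it is there.

```python
import re
from pathlib import Path

# "%PDF-x.y" header; readers tolerate some junk before it, so only the
# first 1024 bytes are searched.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# PDF/A conformance is declared in XMP metadata, either as an element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
PDFA_RE = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report what the bytes claim to be (heuristic, see lead-in)."""
    header = HEADER_RE.search(data[:1024])
    pdfa = PDFA_RE.search(data)
    return {
        "is_pdf": header is not None,
        "version": header.group(1).decode("ascii") if header else None,
        "pdfa_part": int(pdfa.group(1)) if pdfa else None,
    }


def inspect_pdf_file(path) -> dict:
    """Convenience wrapper that reads a file from disk."""
    return inspect_pdf_bytes(Path(path).read_bytes())
```

Looping `inspect_pdf_file` over `Path("cards").glob("*.pdf")` (folder name hypothetical) gives a quick overview of a whole download before trying heavier tools.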

Let's try Python: pytesseract, pdfplumber, pdfminer, PyPDF2... all open the document but find no text.

I cannot select text with the cursor. Instead, the Acrobat cursor is a crosshair, so I can select an area and save that
manually as an image. But not 10,000 times.

So, any ideas for yet another module that can open alien PDFs?
thx,
Paul

Edit: don't worry, Python is there for you. I found a three-tier solution using PyMuPDF that eventually yields the text. A little cumbersome, but it works.
It is more important to do the right thing than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.