Hi,
Some service-minded villages have made their thousands of prayer cards available to the public as PDFs.
Or so it seems, because the files have a .pdf extension.
But when you open one, Acrobat says: "This document CLAIMS TO BE a PDF/A file...". Wow, they must have used some
alien tool to generate it.
Let's try Python: pytesseract, pdfplumber, pdfminer, PyPDF2... they all open the document but see no text.
I cannot select text with the cursor. The Acrobat cursor is a crosshair, so I can select an area and save it
manually as an image. But not 10,000 times.
So, any ideas for yet another module that can open alien PDFs?
thx,
Paul
Edit: don't worry, Python is there for you. I found a 3-tier solution that eventually yields the text using PyMuPDF. A little cumbersome, but it works.
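
For anyone landing here with the same problem: the edit above does not spell out what the three tiers are, so the following is only a guess at what such a fallback chain could look like, not necessarily what was actually used. It assumes the cards are scanned images with no text layer, that PyMuPDF is installed (imported as fitz), and that Tesseract is available for the OCR tiers. The function names and the tier order are made up for illustration.

```python
import io
from typing import Callable, Iterable, List


def first_text(page, extractors: Iterable[Callable[[object], str]]) -> str:
    """Try each extractor in order and return the first non-blank result."""
    for extract in extractors:
        try:
            text = extract(page)
        except Exception:
            # Assumption: a failing tier (missing OCR engine, odd page, ...)
            # should simply hand over to the next one.
            continue
        if text and text.strip():
            return text
    return ""


def embedded_text(page) -> str:
    # Tier 1: the ordinary text layer, if the PDF has one.
    return page.get_text()


def builtin_ocr(page) -> str:
    # Tier 2: PyMuPDF's own OCR hook (needs Tesseract language data).
    textpage = page.get_textpage_ocr(language="eng", dpi=300, full=True)
    return page.get_text(textpage=textpage)


def external_ocr(page) -> str:
    # Tier 3: render the page to an image and hand it to pytesseract.
    import pytesseract
    from PIL import Image

    pix = page.get_pixmap(dpi=300)
    image = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(image)


TIERS = (embedded_text, builtin_ocr, external_ocr)


def pdf_to_text(path: str) -> List[str]:
    """Return one string per page, using the first tier that yields text."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return [first_text(page, TIERS) for page in doc]
```

Looping pdf_to_text over a folder of files would replace the 10,000 manual crosshair selections. The language code "eng" is a placeholder; the cards are probably in another language, so the matching Tesseract language pack would be needed.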
It is more important to do the right thing than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.