Python Forum
PDFminer outputs unreadable text during conversion from PDF to TXT
#4
I searched a lot, but I can't find an answer.

The PDF has 2 embedded fonts. Both are variants of Times Roman, which is a common font. The PDF was made on macOS.

Maybe if you run the Python script on an Apple computer, you will get the correct output.

I tried writing the extracted text as binary, then opening the file as UTF-8, but that did not work:

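import pathlib
import pymupdf

path2text = path2pdf + ".txt"  # path2pdf: path to the PDF, set earlier

with pymupdf.open(path2pdf) as doc:  # open document
    # join the pages, using a form feed (chr(12)) as the page separator
    text = chr(12).join([page.get_text() for page in doc])
    # write as a binary file to support non-ASCII characters
    pathlib.Path(path2text).write_bytes(text.encode())

with open(path2text, encoding="utf-8") as f:
    text = f.read()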
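For what it's worth, I don't think the write/read step itself is the culprit. UTF-8 can encode any normal Python string, so whatever goes into the file should come back unchanged. Here is a minimal sketch of that check, using a made-up sample string and a temporary file instead of the real PDF (the names `sample`, `restored` and `round_trip_ok` are mine, just for illustration):

```python
import pathlib
import tempfile

# Made-up sample: readable text mixed with the kind of junk a bad extraction can produce.
sample = "Times Roman caf\u00e9 \ufffd\ue000\x0c next page"

with tempfile.TemporaryDirectory() as tmp:
    out = pathlib.Path(tmp) / "sample.txt"
    out.write_bytes(sample.encode("utf-8"))      # same kind of write as in the snippet
    restored = out.read_bytes().decode("utf-8")  # read the bytes back and decode them

# True if the file round trip preserved every character
round_trip_ok = restored == sample
print(round_trip_ok)
```

If this holds for the real output too, the text is presumably already unreadable when `get_text()` returns it, before anything is written to disk.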
Like I said, the PDF displays correctly, so the information must be in there! How can it be extracted?
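One possible explanation, which I have not confirmed for this file: a PDF viewer only needs the glyph shapes to draw the page, while a text extractor also needs a mapping from glyphs back to Unicode characters. If an embedded font lacks a usable mapping, the page can look fine and still extract as junk. A rough way to flag such output is to count characters that are unlikely in real text. This is only a heuristic sketch; `suspicious_ratio` is a hypothetical helper of mine, not part of PDFminer or PyMuPDF, and it will miss the case where the extractor returns ordinary but wrong letters.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like extraction junk."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = 0
    for c in chars:
        category = unicodedata.category(c)
        # U+FFFD is the replacement character; Co = private use,
        # Cn = unassigned, Cc = control characters
        if c == "\ufffd" or category in ("Co", "Cn", "Cc"):
            bad += 1
    return bad / len(chars)
```

A high ratio on a page would suggest the font mapping is the problem, in which case rendering the page to an image and running OCR on it may be the only way to recover the text.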
Messages In This Thread
RE: PDFminer outputs unreadable text during conversion from PDF to TXT - by Pedroski55 - Aug-05-2024, 08:47 AM
