Aug-05-2024, 08:47 AM
I searched a lot, but I can't find an answer.
The PDF has 2 embedded fonts. Both are variants of Times Roman, which is a common font. The PDF was made on macOS.
Maybe if you run the Python script on an Apple computer, you will get the correct output.
I tried saving as binary, then opening but that did not work:

import pathlib
import pymupdf

with pymupdf.open(path2pdf) as doc:  # open document
    text = chr(12).join([page.get_text() for page in doc])
# write as a binary file to support non-ASCII characters
pathlib.Path(path2pdf + ".txt").write_bytes(text.encode())
with open(path2text, encoding="utf-8") as f:
    text = f.read()

Like I said, the PDF displays correctly, so the information must be in there! How can I extract it?
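
For what it's worth, the save-and-reopen step on its own should not change the text: encoding a string to UTF-8 bytes and decoding it again gives back the same string. Here is a minimal sketch of that check. The helper name roundtrip, the sample string, and the temporary file are made up for illustration and are not taken from my PDF:

```python
import pathlib
import tempfile


def roundtrip(text: str) -> str:
    """Write text as UTF-8 bytes to a temporary file, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "sample.txt"  # hypothetical file name
        path.write_bytes(text.encode())  # str.encode() defaults to UTF-8
        return path.read_text(encoding="utf-8")


# Made-up sample: non-ASCII characters and a form feed (chr(12)) between "pages".
sample = "page one: caf\u00e9" + chr(12) + "page two: \u03b1\u03b2\u03b3"
print(roundtrip(sample) == sample)
```

If that comparison holds for the real text too, the wrong characters are probably already in what page.get_text() returns, not something introduced by writing or reading the file.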


