Nov-30-2022, 06:20 PM
Hi, can anyone suggest code that will return all the raw data in a PDF (including any special tags/markup applied to the text)?
Appreciate you all.
-Jim
Reading All The RAW Data Inside a PDF
Nov-30-2022, 06:58 PM
I've been looking at some of the PDF libraries myself, and from what I know so far, I'd suggest you take a look at PyPDF2.
Sig:
>>> import this
The UNIX philosophy: "Do one thing, and do it well."
"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse
"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein

(Nov-30-2022, 06:58 PM)rob101 Wrote: I've been looking at some of the PDF libraries myself, and from what I know so far, I'd suggest you take a look at PyPDF2

Yes, I tried this one already, and when I used:

    import PyPDF2

    # Assign file
    file_name = "STRIVE December Schedule -A.pdf"
    doc = PyPDF2.PdfFileReader(file_name)

    # Number of pages
    pages = doc.getNumPages()

    # Extract and print the readable text of each page
    for i in range(pages):
        current_page = doc.getPage(i)
        text = current_page.extractText()
        print(text)

The text returned was the "readable" text from the PDF. What I want is a level BELOW that, where I can see the raw markup/tags applied to all the text.

Larz60+ wrote Nov-30-2022, 10:55 PM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button. Fixed for you this time. Please use BBCode tags on future posts.
Nov-30-2022, 08:07 PM
Ah, okay. Well, the only other one I've used is pdfrw 0.4.
I've not used it for what you're trying to do, but you may find something there that will work for you.
Nov-30-2022, 10:54 PM
If you really want to get down to the nitty-gritty, see: https://opensource.adobe.com/dc-acrobat-...arted.html
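For anyone landing here later, here is a rough sketch of what the "level below" extractText() can look like. A PDF keeps its page markup (the text and graphics operators) in stream objects, which are usually Flate (zlib) compressed, so one crude way to see them is to scan the file bytes for stream ... endstream pairs and inflate each one. This uses only the standard library. The names iter_raw_streams and dump_raw_streams and the regex are my own illustration, not part of PyPDF2 or pdfrw. It assumes an unencrypted file, a line break after "stream" and before "endstream", and no filter other than FlateDecode (anything else is returned undecoded). It also ignores /Length and cross-reference data, so treat it as a debugging aid, not a parser.

```python
import re
import zlib

# Assumption: each stream body sits between "stream<EOL>" and "<EOL>endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def iter_raw_streams(pdf_bytes):
    """Yield the body of every stream object found in the raw PDF bytes.

    Bodies that inflate as zlib data are yielded decompressed;
    everything else is yielded exactly as stored in the file.
    """
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates stray bytes after the zlib data
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate-compressed (or another filter is in use)
            yield raw


def dump_raw_streams(path):
    """Print every stream in the file at `path`, one after another."""
    with open(path, "rb") as handle:
        data = handle.read()
    for number, body in enumerate(iter_raw_streams(data), start=1):
        print(f"--- stream {number} ({len(body)} bytes) ---")
        # latin-1 maps every byte to a character, so this never fails
        print(body.decode("latin-1"))
```

Calling dump_raw_streams("STRIVE December Schedule -A.pdf") should print the page content streams, where text shows up between BT and ET operators, along with anything else stored as a stream (fonts, images, metadata), so expect some binary noise in the output.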