Hi,
In the realm of genealogy, people have often turned their life's work into a PDF.
It can usually be read using pdfplumber and sometimes pdfminer.
Now I have come across a huge legacy PDF (2014) with 10,000+ pages.
It is well presented, with 11 columns of information on 19th-century marriages.
I can read it all right, but the columns and rows are not delimited (e.g. with a grid).
All goes well until a piece of information is missing; that creates a huge blank
and messes up the rest of the line. Big time.
Hence 2 questions:
1) Does anybody know a piece of software that will tell me what software created the PDF in the first place?
(An old version of Excel, Quattro Pro, Access, Word...?) EDIT: PyPDF2 found it: it is GPL Ghostscript 8.15. Wow!
2) There are other PDF-reading Python modules. From anybody's experience,
which one handles blank spaces in a non-gridded row best? It should be something
that has the notion of CRLF at the end of a line.
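To make question 2 concrete, here is a rough sketch of the behaviour I am after, whatever module ends up providing it. It assumes the words come with coordinates, as dicts with 'text', 'x0' and 'top' keys (the shape that pdfplumber's page.extract_words() returns). The function name words_to_rows, the column_starts values and the row_tolerance default are made up for illustration; the real column positions would have to be measured from one complete row of the actual PDF.

```python
from bisect import bisect_right


def words_to_rows(words, column_starts, row_tolerance=3.0):
    """Rebuild table rows from positioned words, keeping empty cells empty.

    words: dicts with 'text', 'x0' and 'top' keys.
    column_starts: left x-coordinate of each column, sorted ascending
                   (assumed values, not taken from any real file).
    row_tolerance: maximum vertical distance for two words to count as
                   being on the same line.
    """
    # Step 1: group words into visual lines by vertical position.
    lines = []  # each entry: [top of first word in the line, [words]]
    for word in sorted(words, key=lambda w: w["top"]):
        if lines and word["top"] - lines[-1][0] <= row_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word["top"], [word]])

    # Step 2: within each line, drop every word into the column whose
    # left edge is the last one at or before the word's x0.
    table = []
    for _, line_words in lines:
        cells = [[] for _ in column_starts]
        for word in sorted(line_words, key=lambda w: w["x0"]):
            col = max(bisect_right(column_starts, word["x0"]) - 1, 0)
            cells[col].append(word["text"])
        table.append([" ".join(cell) for cell in cells])
    return table


# Invented sample data: the second row has no value in the middle column.
sample = [
    {"text": "Jan", "x0": 5, "top": 10},
    {"text": "Smit", "x0": 30, "top": 10},
    {"text": "1845", "x0": 105, "top": 10},
    {"text": "Gent", "x0": 205, "top": 10},
    {"text": "Piet", "x0": 5, "top": 22},
    {"text": "Brugge", "x0": 205, "top": 22},
]
print(words_to_rows(sample, column_starts=[0, 100, 200]))
# [['Jan Smit', '1845', 'Gent'], ['Piet', '', 'Brugge']]
```

The point is that a missing value leaves an empty cell instead of shifting the rest of the line to the left, and each visual line ends where the vertical position jumps, which is the "CRLF" notion mentioned above.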
thx,
Paul
It is more important to do the right thing than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.