A collection of PDF files (copy of a book) in disorder: solving with pikepdf
Python Forum (https://python-forum.io), Forum: Forum & Off Topic, Bar. Thread: /thread-33860.html
A collection of PDF files (copy of a book) in disorder: solving with pikepdf - apollo - Jun-02-2021

Hello dear Python friends,

(I added an update below.)

First of all, I hope you are well and that all is okay in your hometown.

I have a collection of 330 pages (a copy of a book). I separated the pages with mupdf. Unfortunately the pages are not in linear order, so I need to reorder them to get the right layout for printing.

The question: how can I achieve this? Should I take a PDF program and cut the PDF pages, or should I stick to a Pythonic way?

I heard about pikepdf: it provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF.

Python + QPDF ... Extract content from a PDF such as text or images.

PDFDocEncoding
Quote: The PDF specification defines PDFDocEncoding, a character encoding used only in PDFs. This encoding matches ASCII for code points 32 through 126 (0x20 to 0x7e). At all other code points, it is not ASCII and cannot be treated as equivalent. If you look at a PDF in a binary file viewer (hex editor), a string surrounded by parentheses such as (Hello World) is usually using PDFDocEncoding.

https://github.com/pikepdf/pikepdf
pikepdf.readthedocs.io/
https://pypi.org/project/pikepdf/
Released: May 21, 2021, version: pikepdf 2.12.1

Well, this sounds very good. Do you think I can solve my issue with it?

Update, the background. To explain it all a bit more: I ran into these issues while using Mutool and MuP on MX-Linux. I tried to work with the latest release of the MuPDF library.
My findings: if I cut the document into pieces (A5), I get odd results. The page order (the pagination) gets completely lost: 1, 4, 3, 2, 5, and so forth. This is awful.

By the way, these are the commands I run:

mutool poster -x 2 input.pdf output.pdf

This states that the document should be divided into two parts along the X axis. The cutting line accordingly runs down the middle from top to bottom, so that two equal halves are created on the left and right.

You can split a document into individual pages with pdftk:

pdftk input.pdf burst

The output files appear in the same directory as pg_0001.pdf, pg_0002.pdf, etc.

What goes wrong here? See the dataset: https://www.file-upload.net/download-14207207/__0_100__20200413204027.pdf.html

What is wanted: I want to cut this into A5. Note: the A5 format is 148 mm wide and 210 mm high.

I use the commands from these resources:
https://www.mankier.com/1/mupdf
https://mupdf.com/docs/

Any ideas?

RE: A collection of PDF files (copy of a book) in disorder: solving with pikepdf - nilamo - Jun-02-2021

Without knowing how the pages are ordered improperly, there's no way I could say whether or not a particular utility will solve your particular issue. That said, I've used PyPDF2 before, and it seems pretty competent.
https://pythonhosted.org/PyPDF2/

RE: A collection of PDF files (copy of a book) in disorder: solving with pikepdf - Gribouillis - Jun-02-2021

In Linux here is how I would do
RE: A collection of PDF files (copy of a book) in disorder: solving with pikepdf - apollo - Jun-03-2021

Hello you both, good day dear nilamo and Gribouillis,

Many, many thanks for your replies. I am very happy to hear from you. Your ideas seem to be very helpful, and I will try out your approaches.

Note: I added an update to my first post and described how I got into these issues. I ran into them with Mutool and MuP on MX-Linux, while trying to work with the latest release of the MuPDF library. If I cut the document into pieces (A5), I get odd results: the page order (the pagination) gets completely lost: 1, 4, 3, 2, 5, and so forth. By the way, the command I run is: mutool poster -x 2 input.pdf output.pdf (see more above).

Again, I am very happy to see your ideas. I will come back and report all my findings.

Dear nilamo and Gribouillis, many thanks. Have a great day.

apollo
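
For readers landing on this thread: the reordering asked about in the first post could be approached with pikepdf roughly as sketched here. This is only a minimal sketch under assumptions, not a tested solution for the uploaded file. The helper names restore_order and reorder_pdf are hypothetical, the file names in the usage note are placeholders, and the sequence 1, 4, 3, 2, 5 is simply the example from the first post. As nilamo points out, the real pattern for all 330 pages has to be known first, so the sketch takes the observed page numbers as explicit input.

```python
from pathlib import Path
from typing import List, Sequence


def restore_order(observed: Sequence[int]) -> List[int]:
    """Return 0-based positions in the scrambled file, listed in reading order.

    `observed` holds the printed page number found at each position of the
    scrambled file, for example [1, 4, 3, 2, 5].
    """
    if sorted(observed) != list(range(1, len(observed) + 1)):
        raise ValueError("observed must contain each page number 1..n exactly once")
    # Sort the positions by the page number found there (inverse permutation).
    return sorted(range(len(observed)), key=lambda i: observed[i])


def reorder_pdf(src: Path, dst: Path, observed: Sequence[int]) -> None:
    """Write a copy of `src` to `dst` with the pages put back in reading order."""
    # Third-party dependency (pip install pikepdf); imported here so that
    # restore_order can be used and checked without it.
    import pikepdf

    indices = restore_order(observed)
    with pikepdf.Pdf.open(src) as pdf:
        if len(pdf.pages) != len(indices):
            raise ValueError("observed must list one page number per page of src")
        out = pikepdf.Pdf.new()
        for i in indices:
            # Appending a page from another Pdf copies it into the new document.
            out.pages.append(pdf.pages[i])
        # Save while the source is still open, since the pages come from it.
        out.save(dst)
```

A call could look like reorder_pdf(Path("scrambled.pdf"), Path("ordered.pdf"), [1, 4, 3, 2, 5]) for a five-page file. Keeping the order calculation in its own function makes it possible to check the page mapping on paper before touching the PDF.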