Need help to open PDF file and Export to text file

ratna_ain · Oct-10-2017, 01:44 AM

(Oct-09-2017, 05:49 PM)buran Wrote: Python has plenty of packages that can convert PDF to text (extract text from a PDF) natively, without sending keys to an external application.
Just to name a few (in no particular order, i.e. not as a recommendation):
textract
PDFMiner - Python 2, with its pdf2txt tool; also pdfminer.six, a fork with Python 2/3 support
slate - a wrapper around PDFMiner

Yeah, I have tried all of them, including Apache Tika.
The problem is that when I use those packages, some files (around 10%) are not extracted correctly. For example, the title is extracted in the middle of the content, although in the PDF it is at the top.
When I extract the same files manually with Adobe Reader, the title is extracted correctly. So my idea is to use Adobe Reader only for the files that hit this exception. We cannot do them one by one because the volume of files is high.
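To make the "only for the exceptions" idea concrete, here is a rough sketch of how the bad files could be picked out automatically, so that only those go to the Adobe Reader route. It assumes the expected title of each file is already known from somewhere else (the file name, a spreadsheet, PDF metadata), which may not be true for every file. The names `title_is_near_top`, `triage` and `max_lines` are made up for this illustration and do not come from any of the packages mentioned above.

```python
def title_is_near_top(text, title, max_lines=5):
    """Return True if `title` occurs within the first `max_lines` non-empty lines.

    Whitespace and letter case are normalized, so a title that the extractor
    wrapped over two lines still matches.
    """
    def norm(s):
        return " ".join(s.split()).casefold()

    wanted = norm(title)
    if not wanted:
        # No known title (assumption: treat this as "cannot confirm").
        return False
    head_lines = [line for line in text.splitlines() if line.strip()][:max_lines]
    return wanted in norm(" ".join(head_lines))


def triage(extracted, titles, max_lines=5):
    """Split file names into (ok, needs_fallback).

    `extracted` maps file name -> text produced by any extractor.
    `titles` maps file name -> expected title (hypothetical input).
    """
    ok, needs_fallback = [], []
    for name, text in extracted.items():
        if title_is_near_top(text, titles.get(name, ""), max_lines):
            ok.append(name)
        else:
            needs_fallback.append(name)
    return ok, needs_fallback
```

The files returned in `needs_fallback` would be the roughly 10% to re-extract another way; everything else keeps the output of the normal package.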
