Need help to open PDF file and Export to text file

ratna_ain · Oct-09-2017, 01:07 PM

Hi All,

I need to open a PDF file in Adobe Reader and save it as a text file using SendKeys:
- File: ALT+F
- Save As Other: H
- Text: X

This is my code to open the file and send the keys:

import win32com.client
import os
from sys import argv

shell = win32com.client.Dispatch("WScript.Shell")

# raw string, so that "\t" in the path is not read as a tab character
filename = r"C:\RATNA\temp\TU1-2.pdf"

os.chdir('C:\\RATNA\\temp')

os.system('"C:\\Program Files (x86)\\Adobe\\Reader 11.0\\Reader\\AcroRd32.exe" TU1-2.pdf' )

shell.AppActivate('Acrobat.exe')

shell.SendKeys("%{f}",0)
shell.SendKeys("H", 0)
shell.SendKeys("X", 0)

The problem with this code is that the SendKeys calls are triggered only after I close the PDF file.

Thank You

**nilamo** · Oct-09-2017, 04:24 PM

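os.system blocks until the program it launched exits, which is why the keys are sent only after you close Reader. If you launch it with something that doesn't block, such as subprocess.Popen from the subprocess module, it might work.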
https://docs.python.org/3/library/subpro...rocess.run
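
To make that concrete, here is a minimal sketch, not a tested solution. subprocess.Popen returns as soon as the program has started, whereas os.system and subprocess.run wait for it to exit. The names launch, wait_until and export_with_reader, the window_title argument and the 30-second timeout are made up for illustration; the key sequence is the one from the first post, and pywin32 is assumed to be installed.

```python
import subprocess
import time


def launch(command):
    """Start command (a list of arguments) and return at once, without waiting for it to exit."""
    return subprocess.Popen(command)


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() repeatedly until it returns a true value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def export_with_reader(reader_exe, pdf_path, window_title, timeout=30.0):
    """Open pdf_path in Reader, wait for its window, then send the menu keys."""
    import win32com.client  # pywin32, Windows only

    shell = win32com.client.Dispatch("WScript.Shell")
    proc = launch([reader_exe, pdf_path])
    # AppActivate returns True once a window with a matching title has focus.
    # window_title is an assumption: check what Reader shows in its title bar.
    if not wait_until(lambda: shell.AppActivate(window_title), timeout):
        raise TimeoutError("Reader window did not appear")
    for keys in ("%{f}", "H", "X"):  # key sequence from the original post
        shell.SendKeys(keys, 0)
    return proc
```

A call might look like export_with_reader(r"C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe", r"C:\RATNA\temp\TU1-2.pdf", "TU1-2.pdf"), where the last argument is only a guess at the window title. Polling for the window instead of sleeping for a fixed time means the keys are not sent before Reader is ready to receive them.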

**buran** · (This post was last modified: Oct-09-2017, 05:49 PM by buran.)

Python has plenty of packages that can convert PDF to text (extract text from a PDF) natively, without sending keys to an external application.
Just to name a few (in no particular order, i.e. not as a recommendation):
- textract
- PDFMiner (Python 2) and its pdf2txt tool; also pdfminer.six, a fork with Python 2/3 support
- slate, a wrapper around PDFMiner
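
For example, with pdfminer.six (assuming it is installed with pip install pdfminer.six), the extraction might look like the sketch below. extract_text comes from pdfminer.high_level; pdf_to_text_file and its extractor parameter are hypothetical names, added only so that another library's function can be swapped in.

```python
from pathlib import Path


def pdf_to_text_file(pdf_path, txt_path, extractor=None):
    """Extract the text of pdf_path and write it to txt_path as UTF-8."""
    if extractor is None:
        # Imported lazily, so the function also works with a different extractor.
        from pdfminer.high_level import extract_text as extractor
    text = extractor(str(pdf_path))
    Path(txt_path).write_text(text, encoding="utf-8")
    return text
```

Calling pdf_to_text_file("TU1-2.pdf", "TU1-2.txt") would then use pdfminer.six by default.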

ratna_ain · Oct-10-2017, 01:44 AM

(Oct-09-2017, 05:49 PM)buran Wrote: Python has plenty of packages that can convert PDF to text (extract text from a PDF) natively, without sending keys to an external application.
Just to name a few (in no particular order, i.e. not as a recommendation):
- textract
- PDFMiner (Python 2) and its pdf2txt tool; also pdfminer.six, a fork with Python 2/3 support
- slate, a wrapper around PDFMiner

Yeah, I have tried all of them, including Apache Tika.
The problem is that with those packages some files (around 10%) are not extracted correctly. For example, the title is extracted in the middle of the content, although it is at the top of the PDF.
When I extract those files manually with Adobe Reader, the title is extracted correctly. So my idea is to use Adobe Reader only for those exceptions. We cannot do it one by one because the volume of files is high.
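
One way to organise that, sketched under assumptions: run the library extractor over every file and collect only the ones that fail a check, so that just those go through Adobe Reader. split_by_check, extract and looks_ok are hypothetical names. How to detect a misplaced title is not settled in this thread, so looks_ok is left to the caller.

```python
from pathlib import Path


def split_by_check(pdf_dir, extract, looks_ok):
    """Return (good, fallback): texts keyed by path, and paths that need Reader."""
    good, fallback = {}, []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            text = extract(str(pdf_path))
        except Exception:
            # Extraction failed outright: treat it like a bad result.
            fallback.append(pdf_path)
            continue
        if looks_ok(text):
            good[pdf_path] = text
        else:
            fallback.append(pdf_path)
    return good, fallback
```

Only the paths in fallback would then need the slower SendKeys route.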
