Python Forum
Need help to open PDF file and Export to text file
#1
Hi All,

I need to open a PDF file with Adobe Reader and save it as a text file using SendKeys:
- File : ALT+F
- Save to Others : H
- Text : X

This is my code to open the file and send the keys:

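import win32com.client
import os

# WScript.Shell provides AppActivate and SendKeys
shell = win32com.client.Dispatch("WScript.Shell")

# Raw string, so that \t in the path is not read as a tab character
filename = r"C:\RATNA\temp\TU1-2.pdf"

os.chdir('C:\\RATNA\\temp')

# Open the PDF in Adobe Reader
os.system('"C:\\Program Files (x86)\\Adobe\\Reader 11.0\\Reader\\AcroRd32.exe" TU1-2.pdf' )

# Bring the Reader window to the front
shell.AppActivate('Acrobat.exe')

# ALT+F (File), then H (Save to Others), then X (Text)
shell.SendKeys("%{f}",0)
shell.SendKeys("H", 0)
shell.SendKeys("X", 0)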

The problem with this code is that the keys are sent only after I close the PDF file.

Thank You
#2
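os.system blocks until the command it runs has finished, which is why the keys are only sent after you close Reader. If you use something that doesn't block, such as subprocess.Popen from the subprocess module, it might work. Note that subprocess.run, linked below, also waits for the command to finish.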
https://docs.python.org/3/library/subpro...rocess.run
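
A minimal sketch of the difference, not tested against Adobe Reader itself. To keep it runnable anywhere, a short-lived Python child process stands in for the AcroRd32.exe command line (that stand-in and the helper name launch_without_blocking are only for illustration). The point is that subprocess.Popen returns as soon as the program has started, so the lines after it run while the program is still open:

```python
import subprocess
import sys

def launch_without_blocking(command):
    """Start `command` and return at once with its Popen handle."""
    return subprocess.Popen(command)

# Stand-in for the Reader command line (an assumption for this demo).
# In the original script this would be a list such as
# [r"C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe", "TU1-2.pdf"]
child_command = [sys.executable, "-c", "import time; time.sleep(5)"]

proc = launch_without_blocking(child_command)

# poll() returns None while the child has not exited yet, so this line
# is reached while the launched program is still running.
still_running = proc.poll() is None
print("child still running:", still_running)

# This is the point where AppActivate and SendKeys could be called.
# A real GUI program will probably also need a short pause here so that
# its window has time to appear before any keys are sent.

# Clean up the demo child; with Reader you would leave it open instead.
proc.terminate()
proc.wait()
```

With os.system in place of Popen, the script would stop at the launch line until the child exited, which matches the behaviour described in the first post.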
#3
Python has plenty of packages that can convert PDF to text (extract text from a PDF) natively, without sending keys to an external application.
Just to name a few (in no particular order, i.e. not as a recommendation):
textract
PDFMiner - Python 2 only, with its pdf2txt tool. Also pdfminer.six - a fork with Python 2/3 support
slate - wrapper around PDFminer
#4
(Oct-09-2017, 05:49 PM)buran Wrote: python has plenty of packages that allow to convert pdf to text (extract text from pdf) in a native way, not by sending keys to external application.
Just to name a few (in no particular order, i.e. not as recommendation):
textract
PDFminer - python2 and its pdf2txt tool. also pdfminer.six - a fork with python2/3 support
slate - wrapper around PDFminer

Yeah, I have tried all of them, including Apache Tika.
The problem is that when I use those packages, some files (around 10%) are not extracted correctly. For example, the title ends up in the middle of the extracted content, although in the PDF it is at the top.
When I extract the same files manually with Adobe Reader, the title is extracted correctly. So my idea is to use Adobe Reader only for the files that fail this way. We cannot do them one by one by hand because the volume of files is high.