Python Forum
How to remove footer from PDF when extracting to text
#3
(Dec-12-2022, 05:59 PM) Gribouillis wrote: Try with the re.DOTALL flag
footer_pattern = re.search("(?s)^JOHN.*Confidential$", text)

I added both the (?s) prefix and the re.DOTALL flag and got the same result: the footer is still in the text file. Is this the correct way to use that flag?

with pdfplumber.open(pdfFilePath) as pdf:
    k = len(pdf.pages)
    for i in range(1, k):
        page = pdf.pages[i]
        text = (page.extract_text())
        footer_pattern = re.search("(?s)^JOHN.*Confidential$", text, re.DOTALL)

        if footer_pattern:
            text = text.replace(footer_pattern, '')

        with open(txtFilePath, 'a') as txtFile:
            txtFile.write(text)
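
One likely reason the footer survives, offered as a sketch rather than a confirmed fix: without re.MULTILINE, ^ and $ anchor only at the very start and end of the whole page text, so the pattern will not match unless the page text itself begins with JOHN. The (?s) prefix and re.DOTALL mean the same thing, so either one alone is enough. Separately, re.search returns a match object while str.replace expects a string, so text.replace(footer_pattern, '') would raise a TypeError as soon as a match is found. The example below assumes the footer starts on a line beginning with JOHN and ends on a line ending with Confidential; the sample text and the strip_footer helper are made up for illustration.

```python
import re

# Assumed footer layout: begins on a line starting with "JOHN" and ends on a
# line ending with "Confidential". Adjust the pattern to the real footer.
# MULTILINE lets ^ and $ match at line boundaries; DOTALL lets . span newlines.
FOOTER_RE = re.compile(r"^JOHN.*?Confidential$\n?", re.MULTILINE | re.DOTALL)


def strip_footer(text):
    """Return text with every block matched by FOOTER_RE removed."""
    return FOOTER_RE.sub("", text)


# Made-up page text standing in for page.extract_text() output.
sample_page = (
    "First line of body text\n"
    "Second line of body text\n"
    "JOHN DOE COMPANY\n"
    "Page 2 of 10 Confidential"
)

# The pattern from the post finds nothing, because ^ only matches at the
# start of the whole string when MULTILINE is not set.
original_match = re.search("(?s)^JOHN.*Confidential$", sample_page, re.DOTALL)

cleaned = strip_footer(sample_page)
```

In the loop above, the search and replace pair could then be swapped for a single call, text = strip_footer(text), before writing to the file.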


Messages In This Thread
RE: How to remove footer from PDF when extracting to text - by jh67 - Dec-12-2022, 06:26 PM
