Python Forum
Regex Pattern
#6
I think it would be necessary to see a few more emails with similar lines before deciding how to filter them out. But assuming that "image003" is actually not a filename, then:

You could get rid of it in various ways, but each has its drawbacks: each carries a risk (very small when done right) that some valid text gets cut from the message because it resembles that "jpg_image_big_line".

My suggestion would be to filter it based on:
- the beginning of the line (it starts with "image003", so the pattern makes sure the line starts with "image" followed by 3 digits)
- the length of the line and whether it contains spaces (this line is very long and has no spaces, so checking for that as well further reduces the risk of valid text being cut out by the regex)
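A quick sketch of why the second check matters, using made-up sample lines (hypothetical, not taken from the actual email): filtering on the prefix alone also catches a normal sentence that happens to start with "image001", while requiring a long run of non-whitespace characters right after the digits leaves that sentence alone.

```python
import re

# Filter on the prefix only vs. prefix plus a long run without whitespace.
prefix_only = re.compile(r'^image\d{3}')
prefix_and_long_run = re.compile(r'^image\d{3}[^\s]{10,}')

# Hypothetical sample lines, made up for illustration.
lines = [
    'image003' + 'A1b2C3d4E5f6G7h8' * 4,  # long junk-like line, no spaces
    'image001 shows the new floor plan',    # valid text starting with "image"
    'Regards, the sales team',
]

def matches(pattern, sample):
    """Return True/False for each line depending on whether the pattern matches it."""
    return [bool(pattern.match(line)) for line in sample]

print(matches(prefix_only, lines))          # [True, True, False]
print(matches(prefix_and_long_run, lines))  # [True, False, False]
```

Only the stricter pattern keeps the "image001 shows..." sentence, which is the whole point of adding the length/no-spaces condition.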

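import re

with open('email.txt', 'r') as f:
    text = f.read()

patterns = [
    # HTML comments; the non-greedy .*? stops at the first "-->", so valid
    # text sitting between two separate comments is not removed with them
    re.compile(r'<!--.*?-->', re.DOTALL),
    # long "image003..." lines that contain no whitespace
    re.compile(r'^image\d{3}[^\s]{10,}', re.MULTILINE),
    # blank or whitespace-only lines (including their newline); applied last
    # so that lines emptied by the patterns above are dropped as well
    re.compile(r'^[ \t]*\n', re.MULTILINE),
    ]

for p in patterns:
    text = p.sub('', text)

print(text)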


r'''
Details/description of this pattern: '^image\d{3}[^\s]{10,}'

^          - beginning of a line (every line, thanks to re.MULTILINE)
image      - the literal text "image"
\d{3}      - exactly 3 digits
[^\s]{10,} - at least 10 non-whitespace characters following "image003"
'''
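One detail worth checking: the pattern relies on re.MULTILINE, because without it ^ only matches at the very start of the whole string, not at the start of every line. A small check on a hypothetical email body (made-up text, not the real email):

```python
import re

# Hypothetical email body: the image line is in the middle, not at the start.
text = 'Hello,\nimage003' + 'x' * 40 + '\nRegards\n'

without_flag = re.compile(r'^image\d{3}[^\s]{10,}')
with_flag = re.compile(r'^image\d{3}[^\s]{10,}', re.MULTILINE)

# Without re.MULTILINE, ^ only matches at position 0, so nothing is found.
print(without_flag.findall(text))    # []
# With re.MULTILINE, ^ also matches right after each newline.
print(len(with_flag.findall(text)))  # 1
print(repr(with_flag.sub('', text)))  # 'Hello,\n\nRegards\n'
```

Note that sub() leaves an empty line where the image line used to be, so any blank-line cleanup has to run after this pattern.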
Edit: I'm a moron, image003.jpg must be a filename... It could be filtered based on other things, but it would be much better to see more example emails first (just to avoid writing patterns that end up being inefficient).


Messages In This Thread
Regex Pattern - by NewBeie - May-07-2019, 07:44 AM
RE: Regex Pattern - by Gribouillis - May-07-2019, 09:38 AM
RE: Regex Pattern - by snippsat - May-07-2019, 02:06 PM
RE: Regex Pattern - by michalmonday - May-08-2019, 11:50 AM
RE: Regex Pattern - by NewBeie - May-13-2019, 05:48 AM
RE: Regex Pattern - by michalmonday - May-13-2019, 01:27 PM
