Python Forum
Regex, creating a pattern
#1
Hi,

I have some text that I'm trying to create a pattern for. This is how far I've got:

import re

text = '''
image003.jpgimage/[email protected]:39:50truefalseimage001.jpgimage/[email protected]:39:50truefalseNJS Basson Snr Basson Familie Trust  Gothoma Diggings NJS.xlsxapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet41BB382125CA54438BDE8428DBDF8B6C@mns.internal.co.za868522019-04-01T12:39:50falsefalseNJS BASSON SNR.PDFapplication/pdfEBA76A0D4619594DB6951BE861F9FF9E@mns.internal.co.za3638912019-04-01T12:39:50falsefalse4798442019-04-01T12:39:31Z2019-04-01T12:39:[email protected] EscalationAgriCCC.Escalation@santam.co.zaSMTPMailboxtruetrueEternity [email protected] 000000000
ZZZZ
'''

pattern = re.compile(r'image[\d]+.+')
matches = pattern.finditer(text)

for match in matches:
    print(match)
Output:
"<re.Match object; span=(681, 1413), match='image003.jpgimage/[email protected]>"
I'm looking for an expression that can pick up anything similar to, close to, or exactly like that. Please help.
#2
Try this pattern: re.findall(r'image\d+\.jpg', text)
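A minimal sketch of how that pattern behaves, using a shortened, made-up sample string rather than the full text from post #1:

```python
import re

# Shortened, hypothetical sample; the real text in post #1 is much longer.
sample = "image003.jpgimage/jpegimage001.jpgtruefalseNJS.xlsx"

# \d+ matches one or more digits; \. matches a literal dot
# (an unescaped . would match any character).
filenames = re.findall(r'image\d+\.jpg', sample)
print(filenames)  # ['image003.jpg', 'image001.jpg']
```

Note that "image/jpeg" is skipped because no digits follow "image" there.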
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
Hi,
I tried findall:

for line in textonly2:
    results = re.findall(r'image\d+\.jpg', line)
    print(results)
Output:
[] [] [] [] [] []
Empty list
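One possible explanation, assuming the lines in textonly2 have already had their punctuation stripped (as the sample in post #4 suggests): the pattern requires a literal dot before jpg, so image003jpg can never match. A small sketch with a made-up line, where \.? makes the dot optional:

```python
import re

# Hypothetical line with punctuation already removed.
line = "image003jpgimagejpegimage003jpg01D4truefalse"

strict = re.findall(r'image\d+\.jpg', line)    # requires a literal dot
relaxed = re.findall(r'image\d+\.?jpg', line)  # dot is optional

print(strict)   # []
print(relaxed)  # ['image003jpg', 'image003jpg']
```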
#4
I have removed the punctuation, but I still need code that will pull out any sentence containing image:

image003jpgimagejpegimage003jpg01D4E56EA5A4A0E0813720190401T123950truefalseimage001jpgimagejpegimage001jpg01D4E898B7AA95B0813720190401T123950truefalseNJS Basson Snr Basson Familie Trust Gothoma Diggings NJSxlsxapplicationvndopenxmlformatsofficedocumentspreadsheetmlsheet41BB382125CA54438BDE8428DBDF8B6Cmnsinternalcoza8685220190401T123950falsefalseNJS BASSON SNRPDFapplicationpdfEBA76A0D4619594DB6951BE861F9FF9Emnsinternalcoza36389120190401T123950falsefalse47984420190401T123931Z20190401T123950ZtruefalseLandbouLandbousantamcozaSMTPMailboxAgriculture EscalationAgriCCCEscalationsantamcozaSMTPMailboxtruetrueEternity ClaimseternityclaimskoshcomcozaSMTPOneOfffalse 000000000 ZZZZ
There is whitespace within my text. Whenever the code finds an image sentence, it needs to remove it. This is the text that should be removed from the above:
image003jpgimagejpegimage003jpg01D4E56EA5A4A0E0813720190401T123950truefalseimage001jpgimagejpegimage001jpg01D4E898B7AA95B0813720190401T123950truefalseNJS
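One way this could be sketched with a regex, assuming an "image sentence" means a whitespace-delimited run of characters that contains image, some digits, and jpg. That definition is a guess based on the example above, and the sample string is shortened:

```python
import re

# Shortened, hypothetical version of the text above.
text = ("image003jpgimagejpegimage003jpg01D4truefalseNJS "
        "Basson Snr Basson Familie Trust 000000000 ZZZZ")

# \S* on both sides widens the match to the whole whitespace-delimited chunk;
# \.? tolerates text with or without the dot;
# \s* also removes the gap that follows the chunk.
image_chunk = re.compile(r'\S*image\d+\.?jpg\S*\s*')

cleaned = image_chunk.sub('', text)
print(cleaned)  # Basson Snr Basson Familie Trust 000000000 ZZZZ
```

Chunks that merely contain "jpg" without the image-plus-digits prefix are left alone, which may or may not be what is wanted here.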
#5
It is somewhat unclear what must be accomplished here. Isn't regex too complicated a solution for achieving the desired result? To get rid of chunks containing 'jpg' in a row / sentence, one can just:

In [1]: sentence = "image003jpgimagejpegimage003jpg01D4E56EA5A4A0E0813720190401T123950truefalseimage001jpgim
   ...: agejpegimage001jpg01D4E898B7AA95B0813720190401T123950truefalseNJS Basson Snr Basson Familie Trust Go
   ...: thoma Diggings NJSxlsxapplicationvndopenxmlformatsofficedocumentspreadsheetmlsheet41BB382125CA54438B
   ...: DE8428DBDF8B6Cmnsinternalcoza8685220190401T123950falsefalseNJS BASSON SNRPDFapplicationpdfEBA76A0D46
   ...: 19594DB6951BE861F9FF9Emnsinternalcoza36389120190401T123950falsefalse47984420190401T123931Z20190401T1
   ...: 23950ZtruefalseLandbouLandbousantamcozaSMTPMailboxAgriculture EscalationAgriCCCEscalationsantamcozaS
   ...: MTPMailboxtruetrueEternity ClaimseternityclaimskoshcomcozaSMTPOneOfffalse 000000000 ZZZZ"           

In [2]: ' '.join(chunk for chunk in sentence.split() if 'jpg' not in chunk)                                 
Out[2]: 'Basson Snr Basson Familie Trust Gothoma Diggings NJSxlsxapplicationvndopenxmlformatsofficedocumentspreadsheetmlsheet41BB382125CA54438BDE8428DBDF8B6Cmnsinternalcoza8685220190401T123950falsefalseNJS BASSON SNRPDFapplicationpdfEBA76A0D4619594DB6951BE861F9FF9Emnsinternalcoza36389120190401T123950falsefalse47984420190401T123931Z20190401T123950ZtruefalseLandbouLandbousantamcozaSMTPMailboxAgriculture EscalationAgriCCCEscalationsantamcozaSMTPMailboxtruetrueEternity ClaimseternityclaimskoshcomcozaSMTPOneOfffalse 000000000 ZZZZ'
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#6
I modified your code and got this:

Output:
andre@andre-GP70-2PE:~/Schreibtisch$ python matches.py
Match Object: <re.Match object; span=(1, 13), match='image003.jpg'>
Filename: image003.jpg
Start-IDX: 1 Stop-IDX: 13
Match Object: <re.Match object; span=(23, 35), match='image003.jpg'>
Filename: image003.jpg
Start-IDX: 23 Stop-IDX: 35
Match Object: <re.Match object; span=(85, 97), match='image001.jpg'>
Filename: image001.jpg
Start-IDX: 85 Stop-IDX: 97
Match Object: <re.Match object; span=(107, 119), match='image001.jpg'>
Filename: image001.jpg
Start-IDX: 107 Stop-IDX: 119
import re
 
text = '''
image003.jpgimage/[email protected]:39:50truefalseimage001.jpgimage/[email protected]:39:50truefalseNJS Basson Snr Basson Familie Trust  Gothoma Diggings NJS.xlsxapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet41BB382125CA54438BDE8428DBDF8B6C@mns.internal.co.za868522019-04-01T12:39:50falsefalseNJS BASSON SNR.PDFapplication/pdfEBA76A0D4619594DB6951BE861F9FF9E@mns.internal.co.za3638912019-04-01T12:39:50falsefalse4798442019-04-01T12:39:31Z2019-04-01T12:39:[email protected] EscalationAgriCCC.Escalation@santam.co.zaSMTPMailboxtruetrueEternity [email protected] 000000000
ZZZZ
'''
 
pattern = re.compile(r'image\d+\.jpg')
matches = pattern.finditer(text)
 
for match in matches:
    print('Match Object:', match) # the raw match object
    print('Filename:', match.group()) # match.group() -> result
    start, stop = match.span() # index of current match in text
    print('Start-IDX:', start, 'Stop-IDX:', stop)
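
The start and stop indices can also be used to cut the matches out of the text. A small sketch, assuming a shortened sample string; the helper name remove_spans is made up for illustration:

```python
import re

def remove_spans(text, pattern):
    """Return text with every match of pattern cut out, using match.span()."""
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        start, stop = match.span()
        pieces.append(text[last:start])  # keep what lies before the match
        last = stop                      # skip over the match itself
    pieces.append(text[last:])           # keep the tail after the last match
    return ''.join(pieces)

# Shortened, hypothetical sample string.
sample = "image003.jpgimage/jpegimage001.jpgtruefalse"
result = remove_spans(sample, re.compile(r'image\d+\.jpg'))
print(result)  # image/jpegtruefalse
```

pattern.sub('', text) gives the same result more briefly; the loop only makes the role of the span explicit.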
    