Python Forum
Matching multiple parts in string
#1
What I'm trying to do is match a string against a substring that has wildcards in it. I tried the following without luck; it only matches when there are no wildcards in Bstring and Bstring is exactly the same as Astring:
import re
Astring = ['123.456.789.10.11.12.abc']
Bstring = ['*.456.789.*.11.12.*']
if re.findall(Astring, Bstring):
    pass  # (do something)
#2
1. Astring and Bstring aren't strings, they're lists. Is there a reason you have the brackets around them? Does the findall() call generate an error about passing in an unhashable type?
2. The re module uses regular expressions, not globs. The initial * in Bstring isn't legal and the dots will match anything.
3. re.findall takes a pattern and a string to match. You're passing them in backward.

Changing those things:

>>> a = '123.456.789.10.11.12.abc'
>>> b = '.*.456.789.*.11.12.*'
>>> re.findall(b,a)
['123.456.789.10.11.12.abc']
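A small sketch of points 2 and 3 (my own illustration, not from the original post; the `x`/`y`/`z` sample string is made up): a bare leading `*` fails to compile because there is nothing for it to repeat, unescaped dots match any character, and escaping them as `\.` makes the match strict.

```python
import re

# A leading * has nothing to repeat, so compiling the glob-style string fails.
try:
    re.compile('*.456.789.*.11.12.*')
except re.error as exc:
    print('invalid pattern:', exc)

# Unescaped dots match any character, not just a literal dot.
loose = '.*.456.789.*.11.12.*'
print(bool(re.search(loose, '123x456y789z10.11.12.abc')))   # True

# Escaped dots are literal, so the same made-up string no longer matches.
strict = r'.*\.456\.789\..*\.11\.12\..*'
print(bool(re.search(strict, '123x456y789z10.11.12.abc')))  # False
print(bool(re.search(strict, '123.456.789.10.11.12.abc')))  # True
```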
#3
(Apr-26-2022, 03:35 PM) bowlofred wrote:
1. Astring and Bstring aren't strings, they're lists. Is there a reason you have the brackets around them? Does the findall() call generate an error about passing in an unhashable type?
2. The re module uses regular expressions, not globs. The initial * in Bstring isn't legal and the dots will match anything.
3. re.findall takes a pattern and a string to match. You're passing them in backward.

Changing those things:

>>> a = '123.456.789.10.11.12.abc'
>>> b = '.*.456.789.*.11.12.*'
>>> re.findall(b,a)
['123.456.789.10.11.12.abc']
Thank you for the help.
Some parts of the ident are static and some parts are dynamic. When someone logs into the server with a full ident (a), the script checks a .txt file that holds a list of idents with wildcards added (b). The goal is to match the static parts of the full ident (a) against the entries in the .txt file (b), so I can take further action on the closest match.
The original line of code I had was:
if any(a in u for u in b):
But that only matches when the ident (a) is an exact match for an entry in the list (b) without wildcards, so I'm looking for a solution, but after days of searching that doesn't seem easy.
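A minimal sketch of why that one-liner only works for exact entries, assuming `a` is the full host string and `b` is a list of wildcard entries like the ones in the .txt file (the values below are hypothetical): `in` is a substring test that treats `*` as an ordinary character, while `fnmatch.fnmatchcase` from the standard library does glob-style wildcard matching and keeps the same `any()` shape.

```python
from fnmatch import fnmatchcase

# Hypothetical data: 'a' is the full ident, 'b' the wildcard entries from the .txt file.
a = '123.456.789.10.11.12.abc'
b = ['*.87.255.*', '*.456.789.*.11.12.*', '*.332.*']

# Substring test: '*' is just a literal character here, so nothing matches.
print(any(a in u for u in b))                # False

# Glob test: '*' matches any run of characters, including an empty one.
print(any(fnmatchcase(a, u) for u in b))     # True

# next() returns the first matching entry instead of only True/False.
print(next((u for u in b if fnmatchcase(a, u)), None))  # *.456.789.*.11.12.*
```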
#4
You can do mostly the same thing with a regex, but if you use any() in that manner, you don't get any information back about which pattern matched. That may or may not be fine. Alternatively, you can run through each pattern in order. In that case, make sure the most specific patterns come first, since the loop stops at the first match. This is an explicit loop so that the matching pattern is available to you.

import re

patterns = [
        r'.*\.87\.255.*',
        r'.*\.456\.789.*\.11\.12\..*',
        r'.*\.332\..*'
        ]

targets = [
        "1.2.3.4.5",
        '123.456.789.10.11.12.abc',
        ]


for target in targets:
    for pattern in patterns:
        if re.search(pattern, target):
            print(f"target {target} matched the pattern {pattern}")
            break
    else:
        print(f"target {target} did not match any patterns")
Output:
target 1.2.3.4.5 did not match any patterns
target 123.456.789.10.11.12.abc matched the pattern .*\.456\.789.*\.11\.12\..*
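One refinement worth considering (a sketch, not part of the original reply): compile each pattern once with `re.compile` (re also caches recently used patterns, so this is mostly for clarity), and use `fullmatch()` when a pattern should describe the whole host. `search()` accepts a match anywhere in the target, so a pattern without the leading and trailing `.*` can still hit the middle of a longer string. The `r'456\.789'` pattern below is made up to show the difference.

```python
import re

# Compile each pattern once up front instead of passing strings to re.search().
patterns = [
    re.compile(r'456\.789'),                      # made-up, unanchored pattern
    re.compile(r'.*\.456\.789.*\.11\.12\..*'),    # pattern from the post above
]

target = '123.456.789.10.11.12.abc'

for pattern in patterns:
    # search() succeeds if the pattern matches anywhere in target;
    # fullmatch() succeeds only if it matches the whole of target.
    print(pattern.pattern, bool(pattern.search(target)), bool(pattern.fullmatch(target)))
```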
#5
If I understand correctly, the problem can be described as: match groups of characters against each other or against a wildcard. If so, it can be solved using zip:

values = '123.456.789.10.11.12.abc'
wildcarded = '*.456.789.*.11.12.*'

for value, wildcard in zip(values.split('.'), wildcarded.split('.')):
    if wildcard in (value, '*'):
        continue
    else:
        # do something, e.g. report the mismatching group
        print(f'{value} does not match {wildcard}')

# or using all()

all(wildcard in (value, '*') for value, wildcard in zip(values.split('.'), wildcarded.split('.')))   # True if all matching
If the wildcards represent single characters, the split() calls should be omitted and it will work the same way (iterating over the strings character by character).
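One caveat with the `zip()` approach: `zip()` stops at the shorter input, so a pattern with fewer groups than the value can still report a match. A minimal sketch with a length check (`segments_match` is a hypothetical helper name); note that here `*` stands for exactly one dot-separated group, unlike a shell-style wildcard that can span several.

```python
def segments_match(value, wildcarded, sep='.'):
    """Compare sep-separated groups; '*' in wildcarded matches exactly one group."""
    values = value.split(sep)
    wildcards = wildcarded.split(sep)
    # zip() silently stops at the shorter list, so compare the lengths first;
    # otherwise '1.2.3' would "match" the shorter pattern '1.*'.
    if len(values) != len(wildcards):
        return False
    return all(w in (v, '*') for v, w in zip(values, wildcards))


print(segments_match('123.456.789.10.11.12.abc', '*.456.789.*.11.12.*'))  # True
print(segments_match('1.2.3', '1.*'))                                   # False
```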
#6
Sorry, it doesn't match in the script. I know it will return True or False, but I guess this is more complicated, so let me explain some more. It is IRC related: the code is a 'blacklist' (or whitelist if you want).
The point is: if a spambot joins my channel on IRC, the script should look in my blacklist.txt file, where multiple hosts are listed. Most hosts are added to blacklist.txt with some (*) wildcards, since not every IP of the spammer is static, so some entries look like: 123.456.*.10.*.abc etc.
So when a spambot joins the channel, the script catches the spambot's full hostname, e.g. 123.456.789.10.abc, and in my blacklist there is: 123.456.*.10.*.abc (or other parts wildcarded; I hope I am clear on this). The goal is: a host joins the channel, the script checks the blacklist for a (wildcarded) match, and takes action.
Until now I have had no luck with this. I'm an old Tcl'er and new to Python; in Tcl there was a one-liner for this: if {[string match ........., maybe this can help. Thank you for the help, fozz
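For what it's worth, the closest standard-library counterpart to Tcl's `string match` is probably `fnmatch.fnmatchcase(name, pattern)`: like `string match` without `-nocase`, it is case-sensitive and treats `*` as any sequence, `?` as any single character and `[...]` as a character set. Note that the argument order is reversed compared to Tcl (string first, pattern second). A sketch, assuming one pattern per line in blacklist.txt; `find_blacklist_match` is a hypothetical name, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io
from fnmatch import fnmatchcase


def find_blacklist_match(host, lines):
    """Return the first blacklist entry whose wildcard pattern matches host, else None.

    fnmatchcase() is case-sensitive; '*' matches any sequence, '?' any single
    character, and '[...]' is a character set (so a literal '[' in an entry
    is not treated as plain text).
    """
    for line in lines:
        entry = line.strip()          # drop the trailing newline and stray spaces
        if entry and fnmatchcase(host, entry):
            return entry
    return None


# Stand-in for open('blacklist.txt'); any iterable of lines works.
blacklist = io.StringIO(
    "123.456.*.10.*.abc\n"
    "\n"
    "*.456.789.*.11.12.*\n"
)

match = find_blacklist_match('123.456.789.10.11.12.abc', blacklist)
if match:
    print(f"blacklisted by {match}")  # blacklisted by 123.456.*.10.*.abc
```

With a real file this would be `with open('blacklist.txt') as f: match = find_blacklist_match(host, f)`. Also note that the literal pair from the post, `123.456.789.10.abc` against `123.456.*.10.*.abc`, would not match: the pattern needs `.10.`, then anything, then `.abc`, so at least two dots after `10`, and that host has only one.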
#7
Could this maybe be part of the solution?
import re
s = "tim email is tim@somehost"
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group()) ## tim@somehost (the whole match)
    print(match.group(1)) ## tim (the username, group 1)
    print(match.group(2)) ## somehost (the host, group 2)
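The same capture-group idea can split an IRC hostmask of the form nick!user@host (like the `*!*@...` lines in the .txt file) into its three parts. A sketch, assuming that format; `HOSTMASK` is a made-up name, and parsing alone does not do any wildcard matching.

```python
import re

# Hypothetical pattern for nick!user@host: nick has no '!' or '@', user has no '@'.
HOSTMASK = re.compile(r'([^!@]+)!([^@]+)@(.+)')

m = HOSTMASK.fullmatch('*!*@123.456.789.10.11.12.abc')
if m:
    nick, user, host = m.groups()
    print(nick, user, host)  # * * 123.456.789.10.11.12.abc
```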
#8
@bowlofred: Do I need to change every line (with the wildcarded hosts) in the .txt file to look like your example?

r'.*\.456\.789.*\.11\.12\..*'

Is that the only solution? I'm pulling my hair out on this :)
#9
That seems the easiest way to do it. Change all the dots to \. and change all the stars to .*. Then you can use it as a regex pattern.
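Rather than editing every line by hand, the conversion can be done in code. A minimal sketch (the helper name `glob_to_regex` is hypothetical): `re.escape()` escapes the dots and every other regex metacharacter, including `*`, and then each escaped star is turned into `.*`. The standard library's `fnmatch.translate()` does a similar conversion and also handles `?` and `[...]`, though its output looks different.

```python
import re


def glob_to_regex(line):
    """Escape every regex metacharacter, then turn each escaped star back into '.*'."""
    return re.escape(line).replace(r'\*', '.*')


pattern = glob_to_regex('*.456.789.*.11.12.*')
print(pattern)                                                  # .*\.456\.789\..*\.11\.12\..*
print(bool(re.fullmatch(pattern, '123.456.789.10.11.12.abc')))  # True
```

Using `re.fullmatch` here means the converted pattern has to cover the whole host, which is how the wildcarded lines read.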
#10
Thank you so much. The thing I really don't understand is:

In the original situation I had added the hosts to the .txt file like this:

*!*@123.456.789.10.11.12.abc

And that was matching with:
if any(a in u for u in b):
So there are wildcards and an exclamation mark in the line, and the script matches this host. Why doesn't it match a line added to the .txt file like this:
*!*@123.456.789.10.11.12.*
I don't understand this.
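A sketch of what is probably going on, assuming `a` held just the host part (`123.456.789.10.11.12.abc`) and `u` was each line of the file: `a in u` is a plain substring test, not a wildcard match. In the first line the `*!*@` prefix is just extra text around an exact copy of the host, so the test succeeds; in the second line the host's `abc` is replaced by a literal `*`, so the host is no longer a substring. Matching the full nick!user@host string with `fnmatchcase` treats both lines as wildcards (`full_mask` below is a hypothetical joining user).

```python
from fnmatch import fnmatchcase

host = '123.456.789.10.11.12.abc'
full_mask = 'nick!user@123.456.789.10.11.12.abc'  # hypothetical joining user

old_line = '*!*@123.456.789.10.11.12.abc'
new_line = '*!*@123.456.789.10.11.12.*'

# 'in' is a plain substring test: the host happens to appear inside old_line,
# so it is True even though '*' and '!' are treated as ordinary characters.
print(host in old_line)   # True
# new_line ends in a literal '*', not 'abc', so the host is not a substring.
print(host in new_line)   # False

# A wildcard match compares the whole mask against each line as a pattern.
print(fnmatchcase(full_mask, old_line), fnmatchcase(full_mask, new_line))  # True True
```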