Python Forum
Matching multiple parts in string - Printable Version



Matching multiple parts in string - fozz - Apr-26-2022

I'm trying to match a string against a pattern that contains wildcards. I tried the following without luck: it only matches when Bstring contains no wildcards and is exactly the same as Astring:
import re
Astring = ['123.456.789.10.11.12.abc']
Bstring = ['*.456.789.*.11.12.*']
if re.findall(Astring, Bstring):
    (do something)



RE: Matching multiple parts in string - bowlofred - Apr-26-2022

1 Astring and Bstring aren't strings, they're lists. Is there a reason you have the brackets around them? Does the findall() call generate an error about passing in an unhashable type?
2 The re module uses regular expressions, not globs. The initial * in Bstring isn't legal and the dots will match anything.
3 re.findall takes a pattern and a string to match. You're passing them in backward.

Changing those things:

>>> a = '123.456.789.10.11.12.abc'
>>> b = '.*.456.789.*.11.12.*'
>>> re.findall(b,a)
['123.456.789.10.11.12.abc']
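
One caveat about the pattern above: its dots are unescaped, so each one matches any character, not just a literal dot. The following is a minimal sketch of the difference, assuming a made-up string (`lookalike`) that should not count as a match; the names `loose` and `strict` are only for illustration.

```python
import re

a = '123.456.789.10.11.12.abc'

# Unescaped dots: every '.' matches any single character
loose = '.*.456.789.*.11.12.*'
# Escaped dots: '\.' matches only a literal dot, '.*' stands in for the wildcard
strict = r'.*\.456\.789\..*\.11\.12\..*'

# Hypothetical string with no dots at all
lookalike = '123x456y789z10w11v12uabc'

loose_hits_lookalike = re.fullmatch(loose, lookalike) is not None
strict_hits_lookalike = re.fullmatch(strict, lookalike) is not None
strict_hits_a = re.fullmatch(strict, a) is not None

print(loose_hits_lookalike)   # True: the loose pattern accepts the lookalike
print(strict_hits_lookalike)  # False
print(strict_hits_a)          # True
```

re.fullmatch is used here so the whole string has to fit the pattern; re.search or re.findall would also accept a match somewhere in the middle.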



RE: Matching multiple parts in string - fozz - Apr-26-2022

Thank you for the help.
Some parts of the ident are static and some are dynamic. When someone logs into the server they have a full ident (a), and there is a .txt file with a list of idents that contain wildcards (b). The goal is to match the full ident (a) against the static parts of the entries in the list (b), so I can take further action on the closest match.
The original line of code I had was:
if any(a in u for u in b):
But that only matches when the list (b) contains the ident (a) exactly, without wildcards. I'm looking for a solution, but after days of searching it doesn't seem that easy.


RE: Matching multiple parts in string - bowlofred - Apr-26-2022

You can do mostly the same thing with a regex, but if you use any() in that manner, you don't get any information back about which pattern matched. That may be fine, or it may not. You could also just run through each pattern in order. In that case you should make sure the most specific patterns come first, since it stops checking after the first match. The example below uses an explicit loop so that the matching pattern is available to you.

import re

patterns = [
        r'.*\.87\.255.*',
        r'.*\.456\.789.*\.11\.12\..*',
        r'.*\.332\..*'
        ]

targets = [
        "1.2.3.4.5",
        '123.456.789.10.11.12.abc',
        ]


for target in targets:
    for pattern in patterns:
        if re.search(pattern, target):
            print(f"target {target} matched the pattern {pattern}")
            break
    else:
        print(f"target {target} did not match any patterns")
Output:
target 1.2.3.4.5 did not match any patterns
target 123.456.789.10.11.12.abc matched the pattern .*\.456\.789.*\.11\.12\..*



RE: Matching multiple parts in string - perfringo - Apr-28-2022

If I understand correctly, the problem can be described as: match each dot-separated group of characters either with the corresponding group or with a wildcard. If so, it can be solved using zip:

values = '123.456.789.10.11.12.abc'
wildcarded = '*.456.789.*.11.12.*'

for value, wildcard in zip(values.split('.'), wildcarded.split('.')):
    if wildcard in (value, '*'):
        continue
    else:
        # do something
        pass

# or using all()

all(wildcard in (value, '*') for value, wildcard in zip(values.split('.'), wildcarded.split('.')))   # True if all matching
If each wildcard represents a single character, the split can be omitted and it will work the same way (iterating over the strings character by character).
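
One thing to keep in mind with the zip approach: zip stops at the shorter sequence, so a value with extra trailing groups would still pass. A minimal sketch that adds a length check, assuming each * stands for exactly one dot-separated group (the function name fields_match is hypothetical):

```python
def fields_match(value: str, wildcarded: str) -> bool:
    """Compare dot-separated groups; '*' matches exactly one group."""
    value_parts = value.split('.')
    wildcard_parts = wildcarded.split('.')
    # zip() would silently ignore extra groups, so compare lengths first
    if len(value_parts) != len(wildcard_parts):
        return False
    return all(w in (v, '*') for v, w in zip(value_parts, wildcard_parts))


print(fields_match('123.456.789.10.11.12.abc', '*.456.789.*.11.12.*'))        # True
print(fields_match('123.456.789.10.11.12.abc.extra', '*.456.789.*.11.12.*'))  # False
```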


RE: Matching multiple parts in string - fozz - Apr-29-2022

Sorry, it doesn't match in the script. I know it will give me a True or False, but I guess this is more complicated, so I'll explain it some more.

It is IRC related. The code is a 'blacklist' (or whitelist if you want). When a spambot joins my channel on IRC, the script should look in my blacklist.txt file, where multiple hosts are listed. Most hosts are added to blacklist.txt with some (*) wildcards, since not every part of a spammer's IP is static, so some entries look like: 123.456.*.10.*.abc etc.

So, when a spambot joins the channel, the script catches the full hostname of the spambot, e.g. 123.456.789.10.abc, and my blacklist contains: 123.456.*.10.*.abc (or with other parts wildcarded, I hope I am clear on this). The goal is: a host joins the channel, the script looks in the blacklist for a (wildcarded) match and takes action.

Until now I have had no luck with this. I'm an old Tcl'er and new to Python; in Tcl there was a one-liner for this: if {[string match ......... Maybe this can help. Thank you for the help, fozz
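
For readers coming from Tcl's string match: the closest thing in Python's standard library is probably fnmatch.fnmatchcase, which does glob-style matching where * matches any run of characters. This is a different technique from the regex approach discussed in the thread, shown here only as a rough sketch. The blacklist entries and the function name find_blacklist_entry are made up for illustration, and the list is kept in memory instead of being read from blacklist.txt.

```python
from fnmatch import fnmatchcase

# Hypothetical entries, as they might appear one per line in blacklist.txt
blacklist = [
    '*.87.255.*',
    '123.456.*.10.*',
    '*.332.*',
]


def find_blacklist_entry(host, entries):
    """Return the first glob-style entry that matches host, or None."""
    for entry in entries:
        # fnmatchcase(name, pattern): case-sensitive glob match on the whole string
        if fnmatchcase(host, entry):
            return entry
    return None


hit = find_blacklist_entry('123.456.789.10.abc', blacklist)
miss = find_blacklist_entry('1.2.3.4.5', blacklist)
print(hit)   # 123.456.*.10.*
print(miss)  # None
```

Note that ? and [ ] also have special meaning in fnmatch patterns, so entries containing those characters would need extra care.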


RE: Matching multiple parts in string - fozz - May-01-2022

Could this maybe be part of the solution?
import re
s = "tim email is tim@somehost"
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group()) ## tim@somehost (the whole match)
    print(match.group(1)) ## tim (the username, group 1)
    print(match.group(2)) ## somehost (the host, group 2)



RE: Matching multiple parts in string - fozz - May-05-2022

@bowlofred: Do I need to change every line (with the wildcarded hosts) in the .txt file to look like the one in your example?

r'.*\.456\.789.*\.11\.12\..*'

Is that the only solution? I'm pulling my hair out on this :)


RE: Matching multiple parts in string - bowlofred - May-05-2022

That seems the easiest way to do it. Change all the dots to \. and change all the stars to .*. Then you can use it as a regex pattern.
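
The conversion described above does not have to be done by hand in the .txt file; it could be done when the entries are loaded. A minimal sketch, assuming * is the only wildcard used in the file (the function name wildcard_to_regex is hypothetical):

```python
import re


def wildcard_to_regex(entry: str) -> str:
    """Turn a wildcarded entry into a regex: dots become '\\.', stars become '.*'."""
    # re.escape() escapes both '.' and '*'; the escaped star is then swapped for '.*'
    return re.escape(entry).replace(r'\*', '.*')


pattern = wildcard_to_regex('*.456.789.*.11.12.*')
print(pattern)  # .*\.456\.789\..*\.11\.12\..*

matched = re.fullmatch(pattern, '123.456.789.10.11.12.abc') is not None
print(matched)  # True
```

re.fullmatch is used so that the whole host has to fit the entry; with re.search a pattern could also match in the middle of a longer string.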


RE: Matching multiple parts in string - fozz - May-06-2022

Thank you so much. The thing I really don't understand is:

In the original situation I had added the hosts to the .txt file like this:

*!*@123.456.789.10.11.12.abc

And that was matching with:
if any(a in u for u in b):
So there are wildcards and an exclamation mark in the line, and the script matches this host. Why doesn't it match a line added to the .txt file like this:
*!*@123.456.789.10.11.12.*
I don't understand this.
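
A likely explanation, sketched under the assumption that a holds just the host part of the joining ident: a in u is a plain substring test, so the *, ! and @ characters in the entry are never treated as wildcards. The first entry matches only because the complete host happens to appear inside it. The nick and user in full_ident are made up, and fnmatchcase is shown as one possible glob-style alternative, not as the method used elsewhere in the thread.

```python
from fnmatch import fnmatchcase

a = '123.456.789.10.11.12.abc'   # assumed: the host part of the ident

exact_entry = '*!*@123.456.789.10.11.12.abc'
wildcard_entry = '*!*@123.456.789.10.11.12.*'

# 'in' checks whether a appears literally inside the entry
in_exact = a in exact_entry         # True: the full host is a substring
in_wildcard = a in wildcard_entry   # False: the entry ends in '.*', not '.abc'

# Glob-style matching against a full ident (hypothetical nick and user)
full_ident = 'nick!user@123.456.789.10.11.12.abc'
glob_hit = fnmatchcase(full_ident, wildcard_entry)  # True

print(in_exact, in_wildcard, glob_hit)
```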