Matching multiple parts in string

fozz · Apr-26-2022, 02:48 PM

I'm trying to match a string against a pattern that contains wildcards. I tried the following without luck: it only matches when Bstring has no wildcards and is exactly the same as Astring:

import re
Astring = ['123.456.789.10.11.12.abc']
Bstring = ['*.456.789.*.11.12.*']
if re.findall(Astring, Bstring):
    ...  # do something

bowlofred · Apr-26-2022, 03:35 PM

1. Astring and Bstring aren't strings, they're lists. Is there a reason you have the brackets around them? Does the findall() call generate an error about passing in an unhashable type?
2. The re module uses regular expressions, not globs. The initial * in Bstring isn't legal, and each unescaped dot will match any character.
3. re.findall takes a pattern and a string to match. You're passing them in backward.

Changing those things:

>>> a = '123.456.789.10.11.12.abc'
>>> b = '.*.456.789.*.11.12.*'
>>> re.findall(b,a)
['123.456.789.10.11.12.abc']
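
Point 2 above says an unescaped dot matches any character. A minimal sketch of what that means in practice, assuming you want dots in the pattern to match only literal dots (the `lookalike` string and the `strict` pattern are made up for illustration and are not from the original post):

```python
import re

loose = r'.*.456.789.*.11.12.*'            # unescaped dots: each matches any character
strict = r'.*\.456\.789\..*\.11\.12\..*'   # escaped dots: each matches only a literal '.'

real = '123.456.789.10.11.12.abc'
lookalike = '123x456x789x10x11x12xabc'     # hypothetical input with no dots at all

# The loose pattern accepts the lookalike because '.' also matches 'x'.
loose_hit = re.fullmatch(loose, lookalike) is not None
# The strict pattern rejects the lookalike but still accepts the real string.
strict_hit = re.fullmatch(strict, lookalike) is not None
real_hit = re.fullmatch(strict, real) is not None

print(loose_hit, strict_hit, real_hit)
```

Whether the looser behaviour matters depends on the data being matched; for host-like strings the escaped form is usually the safer choice.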

fozz · Apr-26-2022, 09:59 PM

Thank you for the help,
Some parts of the ident are static and some are dynamic. When someone logs into the server, I have their full ident (a), and there is a .txt file with a list of idents that contain wildcards (b). The goal is to match the full ident (a) against the entries in the list (b) on the parts that are still static, so I can take further action with the closest match.
The original line of code I had was:

if any(a in u for u in b):

But that only matches when the ident (a) exactly matches an entry in the list (b) that has no wildcards. I'm looking for a solution, but after days of searching it doesn't seem that easy.

bowlofred · Apr-26-2022, 10:37 PM

You can do mostly the same thing with a regex, but if you use any() in that manner, you don't get any information back about which pattern matched. That may be fine, or it may not. You could also just run through each pattern in order. In that case you should make sure the most specific patterns come first, since it won't check any further after the first match. The example below uses an explicit loop so that the matching pattern is available to you.

import re

patterns = [
        r'.*\.87\.255.*',
        r'.*\.456\.789.*\.11\.12\..*',
        r'.*\.332\..*'
        ]

targets = [
        "1.2.3.4.5",
        '123.456.789.10.11.12.abc',
        ]


for target in targets:
    for pattern in patterns:
        if re.search(pattern, target):
            print(f"target {target} matched the pattern {pattern}")
            break
    else:
        print(f"target {target} did not match any patterns")

Output:
target 1.2.3.4.5 did not match any patterns
target 123.456.789.10.11.12.abc matched the pattern .*\.456\.789.*\.11\.12\..*

**perfringo** · Apr-28-2022, 06:39 AM

If I understand correctly, the problem can be described as: match groups of characters with each other or with a wildcard. If so, it can be solved using zip:

values = '123.456.789.10.11.12.abc'
wildcarded = '*.456.789.*.11.12.*'

for value, wildcard in zip(values.split('.'), wildcarded.split('.')):
    if wildcard in (value, '*'):
        continue
    else:
        pass  # do something (this group did not match)

# or using all()

all(wildcard in (value, '*') for value, wildcard in zip(values.split('.'), wildcarded.split('.')))   # True if all matching

If wildcards represent a single character, then split should be omitted and it will work the same way (iterating over the strings character by character).
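
One caveat with the zip approach: zip stops at the shorter input, so a pattern with fewer groups than the value can appear to match. A minimal sketch that adds a length check, assuming each * stands for exactly one dot-separated group (the function name `segments_match` is made up for illustration):

```python
def segments_match(value: str, wildcarded: str) -> bool:
    """Compare dot-separated groups one by one; '*' matches exactly one group."""
    value_parts = value.split('.')
    pattern_parts = wildcarded.split('.')
    # zip() silently truncates to the shorter list, so compare lengths first.
    if len(value_parts) != len(pattern_parts):
        return False
    return all(p in (v, '*') for v, p in zip(value_parts, pattern_parts))


same_length = segments_match('123.456.789.10.11.12.abc', '*.456.789.*.11.12.*')
too_short = segments_match('123.456.789.10.11.12.abc', '*.456')
print(same_length, too_short)
```

If a single * is supposed to cover several groups, this group-by-group comparison is not enough and a glob or regex based match is the better fit.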

fozz · Apr-29-2022, 10:26 PM

Sorry, it doesn't match in the script. I know it will evaluate to True or False, but I guess this is more complicated, so let me explain some more. It is IRC related: the code is a 'blacklist' (or whitelist if you want). When a spambot joins my channel on IRC, the script should look in my blacklist.txt file, where multiple hosts are listed. Most hosts are added to blacklist.txt with some (*) wildcards, since not every part of a spammer's IP is static, so some entries look like: 123.456.*.10.*.abc etc.

So, when a spambot joins the channel, the script catches its full hostname, e.g. 123.456.789.10.abc, and my blacklist contains 123.456.*.10.*.abc (or an entry with other parts wildcarded, I hope I am clear on this). The goal is: a host joins the channel, the script checks the blacklist for a (wildcarded) match and takes action. Until now I have had no luck with this. I'm an old Tcl'er and new to Python; in Tcl there was a one-liner for this: if {[string match ........., maybe this helps. Thank you for the help, fozz
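
A note on the Tcl one-liner: Python's standard library has a roughly analogous glob-style matcher in the fnmatch module, which nobody in the thread suggested, so treat this as an alternative sketch rather than the approach discussed here. The `blacklist` list stands in for the lines of blacklist.txt, and the hosts are made-up values in the same style as the post:

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the lines read from blacklist.txt.
blacklist = ['123.456.*.10.*.abc', '*.87.255.*']

joining_host = '123.456.789.10.11.abc'

# fnmatchcase(name, pattern) does a case-sensitive glob match of the whole name.
hits = [entry for entry in blacklist if fnmatchcase(joining_host, entry)]
print(hits)

# The host from the post has nothing between '10' and 'abc', so the entry
# '123.456.*.10.*.abc' (which requires '.10.<something>.abc') does not match it.
post_example = fnmatchcase('123.456.789.10.abc', '123.456.*.10.*.abc')
print(post_example)
```

Besides *, fnmatch also treats ? and [...] as special characters, which may matter if entries can contain square brackets (as IRC nicks can).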

fozz · (This post was last modified: May-01-2022, 03:56 AM by fozz.)

Could this maybe be part of the solution?

import re
s = "tim email is tim@somehost"
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group()) ## tim@somehost (the whole match)
    print(match.group(1)) ## tim (the username, group 1)
    print(match.group(2)) ## somehost (the host, group 2)

fozz · May-05-2022, 09:52 AM

@bowlofred: Do I need to change every line (with the wildcarded hosts) in the .txt file to look like the one in your example?

r'.*\.456\.789.*\.11\.12\..*'

Is that the only solution? I'm pulling my hair out on this :)

bowlofred · May-05-2022, 07:55 PM

That seems the easiest way to do it. Change all the dots to \. and change all the stars to .*. Then you can use it as a regex pattern.
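
The conversion described above does not have to be done by hand in the .txt file; it can be applied to each entry when it is loaded. A minimal sketch, assuming * is the only wildcard used in the entries (the helper name `wildcard_to_regex` and the sample data are made up for illustration):

```python
import re


def wildcard_to_regex(entry: str) -> re.Pattern:
    """Turn an entry like '123.456.*.abc' into a compiled regex."""
    # re.escape turns '.' into '\.' and '*' into '\*' ...
    escaped = re.escape(entry)
    # ... then every escaped star becomes '.*' (any run of characters).
    return re.compile(escaped.replace(r'\*', '.*'))


# Hypothetical stand-in for the lines read from the .txt file.
blacklist = ['123.456.*.10.*.abc', '*.87.255.*']
host = '123.456.789.10.11.abc'

# fullmatch requires the whole host to fit the pattern, not just a part of it.
matched = [entry for entry in blacklist if wildcard_to_regex(entry).fullmatch(host)]
print(matched)
```

Using re.escape rather than replacing only the dots also neutralises any other regex metacharacters that might appear in an entry.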

fozz · May-06-2022, 03:25 PM

Thank you so much. The thing I really don't understand is this:

In the original situation I had added the hosts to the .txt file like this:

*!*@123.456.789.10.11.12.abc

And that was matching with:

if any(a in u for u in b):

So there are wildcards and an exclamation mark in the line, and the script matches this host. Why doesn't it match a line added to the .txt file like this:
*!*@123.456.789.10.11.12.*
I don't understand this.
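
A likely explanation, sketched under the assumption that a holds only the bare host and each u is one line of the .txt file: `a in u` is a plain substring test, so the * and ! characters are never interpreted as wildcards. The first entry matches only because the complete host text happens to appear inside it.

```python
a = '123.456.789.10.11.12.abc'                 # host of the joining user (assumed)

exact_entry = '*!*@123.456.789.10.11.12.abc'   # contains the full host text
wild_entry = '*!*@123.456.789.10.11.12.*'      # ends in '*' instead of 'abc'

# 'in' checks whether a occurs verbatim somewhere inside the entry.
in_exact = a in exact_entry
in_wild = a in wild_entry

print(in_exact)  # True: the whole host is a substring of the entry
print(in_wild)   # False: '...12.abc' does not occur in '...12.*'
```

For the second entry to match, the * has to be interpreted by something that understands wildcards, such as a regex built from the entry.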
