Posts: 23
Threads: 2
Joined: Apr 2022
What I'm trying to do is match a string against a pattern that contains wildcards. I tried the following without luck. It only matches when no wildcards are added in Bstring, i.e. when Bstring is exactly the same as Astring:
import re
Astring = ['123.456.789.10.11.12.abc']
Bstring = ['*.456.789.*.11.12.*']
if re.findall(Astring, Bstring):
    pass  # (do something)
Posts: 1,583
Threads: 3
Joined: Mar 2020
1. Astring and Bstring aren't strings, they're lists. Is there a reason you have the brackets around them? Does the findall() call raise an error about an unhashable type?
2. The re module uses regular expressions, not globs. The initial * in Bstring isn't legal in a regex, and an unescaped dot matches any character.
3. re.findall takes a pattern and then a string to search. You're passing them in backward.
Changing those things:
>>> import re
>>> a = '123.456.789.10.11.12.abc'
>>> b = '.*.456.789.*.11.12.*'
>>> re.findall(b,a)
['123.456.789.10.11.12.abc']
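Because the patterns in the question are shell-style globs rather than regexes, the standard-library fnmatch module can test them as written. Below is a minimal sketch using the strings from the question. It also shows why escaping matters: in the hand-written regex above, each dot matches any character, while fnmatch treats the dots literally.

```python
import fnmatch
import re

host = '123.456.789.10.11.12.abc'
mask = '*.456.789.*.11.12.*'

# fnmatchcase() treats '*' as "any run of characters" and '?' as "any single
# character", like a shell glob; every '.' is matched literally.
print(fnmatch.fnmatchcase(host, mask))  # True

# fnmatch builds a regular expression from the glob; printing it shows the
# escaped dots and the '.*' that each '*' became.
print(fnmatch.translate(mask))

# With unescaped dots, '.' matches any character, so a host with 'x'
# instead of '.' still matches the hand-written regex...
other = '123x456.789.10.11.12.abc'
print(re.search('.*.456.789.*.11.12.*', other) is not None)  # True
# ...but not the glob, where the dots are literal.
print(fnmatch.fnmatchcase(other, mask))  # False
```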
Posts: 23
Threads: 2
Joined: Apr 2022
(Apr-26-2022, 03:35 PM)bowlofred Wrote: Astring and Bstring aren't strings, they're lists. [...]
Thank you for the help.
Some parts of the ident are static and some parts are dynamic. When someone logs into the server with a full ident (a), the script checks a .txt file containing a list of idents with wildcards added (b). The goal is to match the still-static parts of the full ident (a) against the entries in the .txt file (b), so I can take further action with the closest match.
The original code line I had was:
if any(a in u for u in b):
But that only matches when the ident (a) is an exact match for an entry in the list (b) with no wildcards added, so I'm looking for a solution, but after days of searching it doesn't seem that easy.
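The any() line can keep its shape if the substring test `a in u` is replaced by a wildcard test. A minimal sketch using `fnmatch.fnmatchcase(name, pattern)` from the standard library follows. The values of a and b are stand-ins taken from this thread, not the real contents of the .txt file.

```python
from fnmatch import fnmatchcase

# Stand-ins for the full ident (a) and the wildcarded list (b).
a = '123.456.789.10.11.12.abc'
b = [
    '*.87.255.*',
    '*.456.789.*.11.12.*',
]

# Original test: substring containment, so '*' is only a literal character.
print(any(a in u for u in b))  # False

# Glob test: each entry of b is used as a wildcard pattern.
print(any(fnmatchcase(a, u) for u in b))  # True
```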
Posts: 1,583
Threads: 3
Joined: Mar 2020
You can do mostly the same thing with a regex, but if you use any() in that manner, you don't get any information back about which pattern matched. That may be fine, or it may not. You could also run through each pattern in order. In that case you should put the most specific patterns first, since the loop stops at the first match. This is an explicit loop so that the matching pattern is available to you.
import re
patterns = [
    r'.*\.87\.255.*',
    r'.*\.456\.789.*\.11\.12\..*',
    r'.*\.332\..*'
]
targets = [
    "1.2.3.4.5",
    '123.456.789.10.11.12.abc',
]
for target in targets:
    for pattern in patterns:
        if re.search(pattern, target):
            print(f"target {target} matched the pattern {pattern}")
            break
    else:
        print(f"target {target} did not match any patterns")
Output:
target 1.2.3.4.5 did not match any patterns
target 123.456.789.10.11.12.abc matched the pattern .*\.456\.789.*\.11\.12\..*
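If you prefer a compact any()-style expression but still want to know which pattern matched, next() over a generator expression returns the first match, or a default if nothing matches. The sketch below reuses the patterns above with a hypothetical helper named `first_match`. As with the loop, pattern order decides which pattern wins.

```python
import re

patterns = [
    r'.*\.87\.255.*',
    r'.*\.456\.789.*\.11\.12\..*',
    r'.*\.332\..*',
]

# Compile once; each compiled Pattern keeps its source in .pattern.
compiled = [re.compile(p) for p in patterns]


def first_match(target, compiled_patterns):
    """Return the source of the first pattern that matches target, or None."""
    return next(
        (p.pattern for p in compiled_patterns if p.search(target)),
        None,
    )


for target in ["1.2.3.4.5", '123.456.789.10.11.12.abc']:
    print(target, '->', first_match(target, compiled))
```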
Posts: 1,950
Threads: 8
Joined: Jun 2018
If I understand correctly, the problem can be described as: match groups of characters against each other or against a wildcard. If so, it can be solved using zip:
values = '123.456.789.10.11.12.abc'
wildcarded = '*.456.789.*.11.12.*'
for value, wildcard in zip(values.split('.'), wildcarded.split('.')):
    if wildcard in (value, '*'):
        continue
    else:
        pass  # do something
# or using all()
all(wildcard in (value, '*') for value, wildcard in zip(values.split('.'), wildcarded.split('.')))  # True if all matching
If the wildcards represent single characters, the split should be omitted and it will work the same way (iterating over the strings character by character).
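The zip() approach works as long as the value and the pattern have the same number of dot-separated parts. zip() silently stops at the shorter input, so an extra part on either side is ignored. The sketch below adds an explicit length check and uses a hypothetical helper named `segments_match`. In this scheme `*` stands for exactly one whole segment, which differs from a glob `*` that can also span dots.

```python
def segments_match(value, wildcarded, sep='.'):
    """Compare value and wildcarded part by part; '*' matches any one part."""
    value_parts = value.split(sep)
    mask_parts = wildcarded.split(sep)
    # zip() stops at the shorter list, so compare the lengths first;
    # otherwise 'a.b' would wrongly "match" '*.b.c'.
    if len(value_parts) != len(mask_parts):
        return False
    return all(m in (v, '*') for v, m in zip(value_parts, mask_parts))


print(segments_match('123.456.789.10.11.12.abc', '*.456.789.*.11.12.*'))  # True
print(segments_match('a.b', '*.b.c'))  # False
```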
Posts: 23
Threads: 2
Joined: Apr 2022
Sorry, it doesn't match in the script. I know it will return True or False, but I guess this is more complicated, so let me explain some more. It is IRC related: the code is a 'blacklist' (or whitelist if you want). The point is: if a spambot joins my channel on IRC, the script should look in my blacklist.txt file, where multiple hosts are listed. Most hosts are added to blacklist.txt with some (*) wildcards, since not every IP of the spammer is static, so some entries look like: 123.456.*.10.*.abc etc. When a spambot joins the channel, the script catches the full hostname of the spambot, e.g. 123.456.789.10.abc, and my blacklist contains 123.456.*.10.*.abc (or other parts wildcarded; I hope I am clear on this). So the goal is: a host joins the channel, the script looks in the blacklist for a (wildcarded) match and takes action. Until now I have had no luck with this. I'm an old Tcl'er and new to Python; in Tcl there was a one-liner for this: if {[string match ........., maybe this can help. Thank you for the help, fozz
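The closest standard-library counterpart to Tcl's `string match` is probably `fnmatch.fnmatchcase()`, which supports `*`, `?` and `[...]` sets. Below is a sketch of a blacklist check built on it. The helpers `load_masks` and `find_blacklisted` are hypothetical names, the one-mask-per-line file format is an assumption, and both sides are lowercased on the assumption that hostnames should compare case-insensitively. The example masks are made up.

```python
from fnmatch import fnmatchcase


def load_masks(path):
    """Read wildcard masks from a text file, one per line.

    Assumes a plain-text file such as blacklist.txt with one mask per line;
    blank lines are skipped.
    """
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def find_blacklisted(host, masks):
    """Return the first mask that matches host, or None if none match.

    Both sides are lowercased so the comparison ignores case.
    """
    host = host.lower()
    for mask in masks:
        if fnmatchcase(host, mask.lower()):
            return mask
    return None


# Example data, made up for illustration.
masks = ['123.456.*.10.*', '*.87.255.*']
print(find_blacklisted('123.456.789.10.ABC', masks))  # 123.456.*.10.*
print(find_blacklisted('1.2.3.4', masks))             # None
```

In a bot, the join handler would call something like `find_blacklisted(joined_host, load_masks('blacklist.txt'))` and act on a non-None result. Here `joined_host` is a placeholder for whatever your IRC library provides. With glob semantics, every `*` still needs the literal characters around it. So 123.456.*.10.*.abc requires another dot-separated part between 10 and abc, and it does not match 123.456.789.10.abc, while 123.456.*.10.*abc does. fnmatch also treats `[...]` as a character set, which matters if masks include IRC nicknames containing `[` or `]`.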
Posts: 23
Threads: 2
Joined: Apr 2022
May-01-2022, 03:56 AM
(This post was last modified: May-01-2022, 03:56 AM by fozz.)
Could this maybe be part of the solution?
import re
s = "tim email is tim@somehost"
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group())   # tim@somehost (the whole match)
    print(match.group(1))  # tim (the username, group 1)
    print(match.group(2))  # somehost (the host, group 2)
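The same group technique can split a full IRC prefix of the form nick!user@host into its parts. This is the form that the *!*@... masks later in the thread are written against, and splitting it lets you check the host separately. This is a sketch only: the sample prefix and the name PREFIX_RE are made up, and the character classes loosely approximate the IRC grammar rather than reproducing it exactly.

```python
import re

# nick runs up to '!', user runs up to '@', and host is everything after '@'.
PREFIX_RE = re.compile(r'(?P<nick>[^!@\s]+)!(?P<user>[^@\s]+)@(?P<host>\S+)')

prefix = 'spambot!~ident@123.456.789.10.11.12.abc'  # made-up sample
m = PREFIX_RE.fullmatch(prefix)
if m:
    print(m.group('nick'))  # spambot
    print(m.group('user'))  # ~ident
    print(m.group('host'))  # 123.456.789.10.11.12.abc
```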
Posts: 23
Threads: 2
Joined: Apr 2022
@bowlofred: Do I need to change every line (with the wildcarded hosts) in the .txt file to look like your example?
r'.*\.456\.789.*\.11\.12\..*'
Is that the only solution? I'm pulling my hair out on this :)
Posts: 1,583
Threads: 3
Joined: Mar 2020
That seems the easiest way to do it. Change all the dots to \. and change all the stars to .* . Then you can use it as a regex pattern.
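You can do this conversion in code when the masks are loaded, so the .txt file can keep its plain wildcard form. The sketch below assumes the masks use only `*` (and optionally `?`) as wildcards. `wildcard_to_regex` is a hypothetical helper. re.escape() escapes every regex metacharacter, including the dots, and the escaped wildcards are then turned back into `.*` and `.`. The code uses fullmatch() rather than search() so the mask has to cover the whole host, as a glob does. re.IGNORECASE is an assumption, based on hostnames being case-insensitive.

```python
import re


def wildcard_to_regex(mask):
    """Turn a '*'/'?' wildcard mask into a compiled, case-insensitive regex.

    '*' matches any run of characters and '?' any single character; every
    other character (including '.', '!' and '@') is matched literally.
    """
    # re.escape() turns '*' into '\*', '?' into '\?' and '.' into '\.';
    # the escaped wildcards are then replaced by their regex equivalents.
    pattern = re.escape(mask).replace(r'\*', '.*').replace(r'\?', '.')
    return re.compile(pattern, re.IGNORECASE)


rx = wildcard_to_regex('*.456.789.*.11.12.*')
print(rx.pattern)
print(bool(rx.fullmatch('123.456.789.10.11.12.abc')))  # True
print(bool(rx.fullmatch('123x456.789.10.11.12.abc')))  # False
```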
Posts: 23
Threads: 2
Joined: Apr 2022
Thank you so much. The thing I really don't understand is this:
In the original setup I had added the hosts to the .txt file like this:
*!*@ 123.456.789.10.11.12.abc
And that was matching with:
if any(a in u for u in b):
So there are wildcards and an exclamation mark in the line, and the script still matches this host. Why doesn't it match a line added to the .txt file like:
*!*@ 123.456.789.10.11.12.*
I don't understand this.
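The difference is the kind of test. `a in u` asks whether the string a occurs anywhere inside the line u. The `*`, `!` and `@` in the line are never interpreted; they are just other characters around the host. With `*!*@ 123.456.789.10.11.12.abc`, the full host appears verbatim at the end of the line, so the substring test succeeds. With `...12.*`, the line ends in a literal `*` instead of `abc`, so the host is no longer a substring. The sketch below demonstrates this with the strings from the post. The full nick!user@host prefix is made up. The space after `@` is dropped in the glob example, because a glob would require that space to appear in the tested string as well.

```python
from fnmatch import fnmatchcase

a = '123.456.789.10.11.12.abc'
line_exact = '*!*@ 123.456.789.10.11.12.abc'
line_wild = '*!*@ 123.456.789.10.11.12.*'

# Substring test: True only if a appears character for character in the line.
print(a in line_exact)  # True
print(a in line_wild)   # False

# A glob test treats '*' as a wildcard, but the mask must then describe the
# whole tested string, here a full nick!user@host prefix.
full = 'somenick!someuser@123.456.789.10.11.12.abc'  # made-up prefix
mask = '*!*@123.456.789.10.11.12.*'
print(fnmatchcase(full, mask))  # True
print(fnmatchcase(a, mask))     # False: the bare host has no '!' or '@'
```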