Python Forum

Full Version: trying to recall a regex for re.split()
(May-18-2022, 06:17 PM)paul18fr Wrote: Provides the same result, doesn't it? But if you prefer re.split ...

on what basis should i make the choice between re.split() vs. re.search()?

reading the library reference it seems like if i use re.search(), i need to express just a pattern for what follows the number, which could be just about anything. in past help with module re the regex with re.split() was an expression of the whole thing, making it all hard to understand what was doing what. so if i needed something different i modified what i had or gave up. i don't have a clue how to start a regex from scratch to do these kinds of things, unless i am looking for specific things.
I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.

import re

measurements = ['2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5']

# any of these letters marks where the units start
my_pattern = re.compile("h|H|m|G|T")
for m in measurements:
    # only report entries that carry a Hz unit; a bare number is skipped
    if 'H' in m or 'h' in m:
        print('measured frequency is', m)
        start_pos = my_pattern.search(m).span()[0]
        print('number =', m[:start_pos], 'units =', m[start_pos:])
I'm pulling out my 20-year-old regex for floating-point numbers
import re

def float_re():
    "Return a regular expression that matches literal floating-point numbers"
    return r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

pat = re.compile(float_re())

def parse(s):
    mo = pat.match(s)
    if mo:
        return (float(mo.group(0)), s[mo.end():].strip())
    else:
        raise ValueError(
            f'Expected string starting with a literal float, got {s!r}')

if __name__ == '__main__':
    a = ['144mHz','432 mHz','1.296GHz','2.304 GHz']
    for s in a:
        print(repr(s), parse(s))
Output:
'144mHz' (144.0, 'mHz')
'432 mHz' (432.0, 'mHz')
'1.296GHz' (1.296, 'GHz')
'2.304 GHz' (2.304, 'GHz')
(May-19-2022, 02:05 AM)Pedroski55 Wrote: I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.
the "letters" can be anything that makes the number no longer a number. the regex "h|H|m|G|T" can't possibly be right. the units i showed are just randomly chosen real examples. change the units to "km/h" or "oz." or whatever.
(May-19-2022, 08:28 AM)Gribouillis Wrote: I'm pulling out my 20-year-old regex for floating-point numbers
perhaps it can be sufficient to use that if you can get the whole number alone. with that number's len() you can slice the original string. but, i recall re.split() being convenient to get a 2-tuple result.
i think Gribouillis' code is what i want to use but i am unsure how that regex works.
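One way to get the 2-tuple directly, without re.split() and without slicing by len(), might be to put the number and "everything after it" into two groups and call .groups() on the match. This is only a sketch: the helper name number_and_units and the NUMBER_THEN_REST pattern are made up for illustration, and the number part is Gribouillis' pattern copied as-is.

```python
import re

# Gribouillis' float pattern, reused unchanged
NUMBER = r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# group 1 = the number, group 2 = whatever follows it (units or anything else)
NUMBER_THEN_REST = re.compile(rf"({NUMBER})(.*)", re.DOTALL)

def number_and_units(s):
    """Hypothetical helper: return (number_text, remainder) as a 2-tuple."""
    mo = NUMBER_THEN_REST.match(s)
    if mo is None:
        raise ValueError(f"expected a string starting with a number, got {s!r}")
    number, rest = mo.groups()
    return number, rest.strip()

for s in ["144mHz", "60km/h", "8 oz.", "2.5"]:
    print(repr(s), number_and_units(s))
```

The pattern only has to describe the number; the units are never spelled out, so "km/h" or "oz." need no special handling.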
Look at the right side on regex101; it has an explanation.
I think I have linked to Regex101 a couple of times before when you have asked about regex.
Quote: on what basis should i make the choice between re.split() vs. re.search()?
Many of the regex methods work in a similar way, but each has specific use cases where it works better.
E.g. search, findall and match can all be used to split the string and solve this case.
re.split is more specialized: as the name says, it is about splitting, and it handles groups differently.
Gribouillis' regex pattern also works with re.split if you add a group () around it.
import re

def split_suffix(arg):
    return re.split(r"([+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)", arg)[1:]

if __name__ == '__main__':
    lst = [
        "144mHz",
        "1.296GHz",
        "100m/h",
        "1.25E-20watts",
        "1.23e+07GHz",
        "3.45e-4mHz",
        "12345abc",
    ]
    for n in lst:
        print(f'{n:<13} {split_suffix(n)}')
Output:
144mHz        ['144', 'mHz']
1.296GHz      ['1.296', 'GHz']
100m/h        ['100', 'm/h']
1.25E-20watts ['1.25E-20', 'watts']
1.23e+07GHz   ['1.23e+07', 'GHz']
3.45e-4mHz    ['3.45e-4', 'mHz']
12345abc      ['12345', 'abc']
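To make "it handles groups differently" concrete, here is a small sketch of what re.split() returns with and without a capturing group, next to the re.search() equivalent. DIGITS is a deliberately simplified number pattern invented for this demo, not the full float regex, and "9.8m/s2" is a made-up input.

```python
import re

DIGITS = r"\d+(?:\.\d+)?"   # simplified number pattern, for this demo only

# Without a capturing group, the text that matched (the number) is discarded.
no_group = re.split(DIGITS, "144mHz")
print(no_group)        # ['', 'mHz']

# With a capturing group, the matched text is kept in the result list.
with_group = re.split(f"({DIGITS})", "144mHz")
print(with_group)      # ['', '144', 'mHz']

# If the units contain digits, every number splits the string again ...
split_all = re.split(f"({DIGITS})", "9.8m/s2")
print(split_all)       # ['', '9.8', 'm/s', '2', '']

# ... unless maxsplit=1 stops after the first one.
split_once = re.split(f"({DIGITS})", "9.8m/s2", maxsplit=1)
print(split_once)      # ['', '9.8', 'm/s2']

# re.search returns a match object instead; slicing at mo.end() does the same job.
mo = re.search(DIGITS, "9.8m/s2")
pair = (mo.group(0), "9.8m/s2"[mo.end():])
print(pair)            # ('9.8', 'm/s2')
```

So the choice is mostly about the shape of the result: re.split() hands back a list that may need trimming, while re.search() or re.match() hands back a match object with positions to slice on.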
(May-19-2022, 07:21 PM)Skaperen Wrote: i am unsure how that regex works.
Here is a visual explanation of the regex: [attachment=1749]
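The attachment is not visible in this view. As a text alternative (not a reproduction of the image), the same pattern can be spelled out with re.VERBOSE, which ignores whitespace and allows comments inside the regex. The names FLOAT_VERBOSE and COMPACT are made up for this sketch; the only real change is that the literal space has to be written as [ ] because verbose mode would otherwise ignore it.

```python
import re

FLOAT_VERBOSE = re.compile(r"""
    [+-]?              # optional sign
    [ ]*               # optional spaces after the sign
    (?:                # the mantissa, in one of two forms:
        \d+            #   digits ...
        (?:\.\d*)?     #   ... optionally followed by a dot and more digits: 12, 12., 12.5
      |                # or
        \.\d+          #   a leading dot followed by digits: .5
    )
    (?:                # optional exponent:
        [eE]           #   e or E
        [+-]?          #   optional exponent sign
        \d+            #   exponent digits
    )?
""", re.VERBOSE)

# the compact original, for comparison
COMPACT = re.compile(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

for s in ["144mHz", "1.25E-20watts", ".5V", "-3.kg", "abc"]:
    a, b = FLOAT_VERBOSE.match(s), COMPACT.match(s)
    print(repr(s), a.group(0) if a else None, b.group(0) if b else None)
```

One thing to watch: the pattern accepts spaces between the sign and the digits, but float() does not, so a string like "- 3kg" matches the regex and then fails in float("- 3").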
Ah well, I presumed you would only be receiving frequency data as Hertz from some CIA outpost in Alaska!

Maybe the spooks are also sending recipe data to their mothers. oz, lbs

import re

measurements = ['2.5hz','2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5', 'h', 'HZ', 'Hz' ]
# the first letter marks where the units start
my_pattern = re.compile("[a-zA-Z]")
for m in measurements:
    mo = my_pattern.search(m)
    if mo is None:
        # there is no text at all, so there are no units to report
        continue
    start_pos = mo.start()
    print('measured frequency is', m)
    print('number =', m[:start_pos], 'units =', m[start_pos:])
(May-19-2022, 07:58 PM)snippsat Wrote: Look at the right side on regex101; it has an explanation.
it says at right side "giving back as needed". what does it mean by that? what is giving what to what? how is need determined? how can a regex disable that?
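"Giving back as needed" most likely refers to backtracking in greedy quantifiers: a quantifier such as * or + first takes as much as it can, and if the rest of the pattern then fails, it releases characters one at a time until the rest can match. A minimal sketch of that, with made-up inputs ("12345", "1.5em"):

```python
import re

text = "12345"

# Greedy: \d* first grabs all of "12345", leaving nothing for the literal 5.
# The engine backtracks: \d* gives back one character so that 5 can match.
greedy = re.match(r"(\d*)(5)", text)
print(greedy.groups())   # ('1234', '5')

# Lazy: \d*? starts with as little as possible and grows only when forced to.
lazy = re.match(r"(\d*?)(\d)", text)
print(lazy.groups())     # ('', '1')

# The same thing happens in the float pattern. In "1.5em" the exponent part
# matches the e, then finds no digits, so the engine backs out of the optional
# exponent group and the e stays in the suffix.
FLOAT = re.compile(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")
mo = FLOAT.match("1.5em")
print((mo.group(0), "1.5em"[mo.end():]))   # ('1.5', 'em')
```

As for disabling it: adding ? after a quantifier makes it lazy (take as little as possible) but it still backtracks. To forbid giving back entirely, Python 3.11 and later support possessive quantifiers (*+, ++, ?+) and atomic groups (?>...); older versions of the re module have neither.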