Python Forum
trying to recall a regex for re.split()
#11
(May-18-2022, 06:17 PM)paul18fr Wrote: It gives the same result, doesn't it? But if you prefer re.split ...

on what basis should i make the choice between re.split() vs. re.search()?

reading the library reference, it seems that with re.search() i only need to express a pattern for what follows the number, which could be just about anything. in past help with the re module, the regex passed to re.split() described the whole string, which made it hard to understand which part was doing what. so when i needed something different, i either modified what i had or gave up. i don't have a clue how to start a regex from scratch for these kinds of things, unless i am looking for something specific.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#12
I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.

import re

measurements = ['2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5']

# first character of the unit: a prefix letter (m, G, T) or the h/H of hertz
my_pattern = re.compile("[hHmGT]")
for m in measurements:
    # skip entries that carry no hertz unit at all, such as '2.5'
    if 'H' in m or 'h' in m:
        print('measured frequency is', m)
        start_pos = my_pattern.search(m).start()
        print('number =', m[:start_pos], 'units =', m[start_pos:])
#13
Here is my 20-year-old regex for floating-point numbers:
import re

def float_re():
    "Return a regular expression that matches floating-point literals"
    return r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

pat = re.compile(float_re())

def parse(s):
    mo = pat.match(s)
    if mo:
        return (float(mo.group(0)), s[mo.end():].strip())
    else:
        raise ValueError(
            'Expected string starting with a literal float, got', s)

if __name__ == '__main__':
    a = ['144mHz','432 mHz','1.296GHz','2.304 GHz']
    for s in a:
        print(repr(s), parse(s))
Output:
'144mHz' (144.0, 'mHz')
'432 mHz' (432.0, 'mHz')
'1.296GHz' (1.296, 'GHz')
'2.304 GHz' (2.304, 'GHz')
#14
(May-19-2022, 02:05 AM)Pedroski55 Wrote: I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.
the "letters" can be anything that makes the number no longer a number. the regex "h|H|m|G|T" can't possibly be right. the units i showed are just randomly chosen real examples. change the units to "km/h" or "oz." or whatever.
#15
(May-19-2022, 08:28 AM)Gribouillis Wrote: Here is my 20-year-old regex for floating-point numbers
perhaps it can be sufficient to use that if you can get the whole number alone. with that number's len() you can slice the original string. but, i recall re.split() being convenient to get a 2-tuple result.
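A minimal sketch of that slicing idea, assuming Gribouillis' pattern from post #13 is used unchanged. The helper name number_and_rest is made up for illustration. Because the pattern only describes the number, the unit can be anything ("km/h", "oz.", or nothing at all).

```python
import re

# Gribouillis' float pattern from post #13, unchanged.
FLOAT = re.compile(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def number_and_rest(s):
    """Hypothetical helper: return (number_text, rest), or None if s has no leading number."""
    mo = FLOAT.match(s)              # match() only looks at the start of s
    if mo is None:
        return None
    number = mo.group(0)
    return number, s[len(number):]   # slice the original string by the number's length

for s in ["60km/h", "8oz.", "2.5", "Hz"]:
    print(repr(s), number_and_rest(s))
```

This returns a 2-tuple like re.split() would, but a string with no leading number gives None instead of an oddly shaped list.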
#16
i think Gribouillis' code is what i want to use but i am unsure how that regex works.
#17
Look at the right-hand side on regex101; it has an explanation.
I think I have linked to regex101 a couple of times before when you have asked about regex.
Quote:on what basis should i make the choice between re.split() vs. re.search()?
Many of the regex functions work in a similar way, but each has specific use cases where it works better.
For example, search, findall and match can all be used to split the string and solve this case.
re.split is more specialized: as the name says, it is about splitting, and it handles groups differently.
Gribouillis' regex pattern also works with re.split if you wrap it in a capturing group ().
import re

def split_suffix(arg):
    return re.split(r"([+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)", arg)[1:]

if __name__ == '__main__':
    lst = [
        "144mHz",
        "1.296GHz",
        "100m/h",
        "1.25E-20watts",
        "1.23e+07GHz",
        "3.45e-4mHz",
        "12345abc",
    ]
    for n in lst:
        print(f'{n:<13} {split_suffix(n)}')
Output:
144mHz        ['144', 'mHz']
1.296GHz      ['1.296', 'GHz']
100m/h        ['100', 'm/h']
1.25E-20watts ['1.25E-20', 'watts']
1.23e+07GHz   ['1.23e+07', 'GHz']
3.45e-4mHz    ['3.45e-4', 'mHz']
12345abc      ['12345', 'abc']
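As a rough way to see where re.split and re.match differ, here is a small sketch using the same pattern. The function names are only for illustration. The difference shows up when the suffix itself contains digits, or when the string does not start with a number.

```python
import re

NUM = r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

def with_split(s):
    # splits at EVERY number found in s, keeping the numbers because of the group
    return re.split("(" + NUM + ")", s)[1:]

def with_split_once(s):
    # maxsplit=1 stops after the first number
    return re.split("(" + NUM + ")", s, maxsplit=1)[1:]

def with_match(s):
    # one group for the leading number, one for everything after it
    mo = re.match("(" + NUM + ")(.*)", s, re.DOTALL)
    return mo.groups() if mo else None

for s in ["144mHz", "10m2", "Hz"]:
    print(f"{s:<8} split={with_split(s)} once={with_split_once(s)} match={with_match(s)}")
```

With "10m2" the plain split keeps cutting at the second number, while maxsplit=1 or re.match leave "m2" in one piece. With "Hz" the split version quietly returns an empty list, and re.match returns None, which is easier to test for.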
#18
(May-19-2022, 07:21 PM)Skaperen Wrote: i am unsure how that regex works.
Here is a visual explanation of the regex: [image not preserved]
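The same pattern can also be spelled out in text with re.VERBOSE, which lets each piece carry a comment. This is only a sketch of one way to lay it out. The name FLOAT_VERBOSE is made up, and the one real change is that the original " *" has to be written "[ ]*" because verbose mode ignores bare spaces.

```python
import re

# Same pattern as in post #13, laid out with re.VERBOSE so each piece can be commented.
# In verbose mode a bare space is ignored, so the original " *" is written "[ ]*".
FLOAT_VERBOSE = re.compile(r"""
    [+-]?              # optional leading sign
    [ ]*               # any spaces between the sign and the digits
    (?:                # the mantissa, in one of two shapes:
        \d+            #   digits ...
        (?:\.\d*)?     #   ... optionally followed by a dot and more digits (12, 12., 12.5)
      |                # or
        \.\d+          #   a dot followed by digits (.5)
    )
    (?:                # optional exponent part:
        [eE][+-]?\d+   #   e or E, optional sign, digits (e-4, E+07)
    )?
""", re.VERBOSE)

for s in ["1.25E-20watts", ".5V", "-3kg", "Hz"]:
    mo = FLOAT_VERBOSE.match(s)
    print(repr(s), "->", repr(mo.group(0)) if mo else None)
```

The (?: ... ) groups only group; they do not capture, which is why re.split needs an extra pair of plain parentheses around the whole thing to keep the number in its result.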
#19
Ah well, I presumed you would only be receiving frequency data as Hertz from some CIA outpost in Alaska!

Maybe the spooks are also sending recipe data to their mothers. oz, lbs

import re

measurements = ['2.5hz', '2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5', 'h', 'HZ', 'Hz']
# first ASCII letter; the earlier class "[a-z, A-Z]" also matched commas and spaces
my_pattern = re.compile("[a-zA-Z]")
for m in measurements:
    mo = my_pattern.search(m)
    if mo is None:
        # no letters at all, so there is no unit to report
        continue
    start_pos = mo.start()
    print('measured frequency is', m)
    print('number =', m[:start_pos], 'units =', m[start_pos:])
#20
(May-19-2022, 07:58 PM)snippsat Wrote: Look at the right-hand side on regex101; it has an explanation.
it says at right side "giving back as needed". what does it mean by that? what is giving what to what? how is need determined? how can a regex disable that?
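For what it's worth, that phrase describes backtracking: a greedy quantifier such as \d+ first takes as much as it can, and if the rest of the pattern then fails, it hands characters back one at a time until the whole pattern fits. The sketch below is only an illustration with made-up variable names. The possessive form \d++ needs Python 3.11 or later, so it is guarded by a version check.

```python
import re
import sys

# Greedy: \d+ first grabs "12345", then gives back one digit so the literal 5 can match.
greedy = re.match(r"(\d+)(5)", "12345").groups()

# Lazy: \d+? takes as few digits as possible and leaves the rest to the next group.
lazy = re.match(r"(\d+?)(\d*)", "12345").groups()

# In the float regex the optional exponent starts on the "e" of "2eV", finds no
# digits after it, and is given back whole, so only "2" counts as the number.
FLOAT = re.compile(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")
number_in_2eV = FLOAT.match("2eV").group(0)

# Possessive (Python 3.11+): \d++ keeps all five digits and never gives any back,
# so the trailing literal 5 has nothing left to match and the result is None.
possessive = None
if sys.version_info >= (3, 11):
    possessive = re.match(r"\d++5", "12345")

print(greedy, lazy, repr(number_in_2eV), possessive)
```

So "need" is decided by whether the remainder of the pattern can still match, and a regex can switch the giving back off with a possessive quantifier or an atomic group (?>...) on Python 3.11+.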