Posts: 4,653
Threads: 1,496
Joined: Sep 2016
May-18-2022, 10:53 PM
(This post was last modified: May-18-2022, 10:54 PM by Skaperen.)
(May-18-2022, 06:17 PM)paul18fr Wrote: Provides the same result, doesn't it? But if you prefer re.split ...
on what basis should i make the choice between re.split() vs. re.search()?
reading the library reference it seems like if i use re.search(), i need to express just a pattern for what follows the number, which could be just about anything. in past help with module re the regex with re.split() was an expression of the whole thing, making it all hard to understand what was doing what. so if i needed something different i modified what i had or gave up. i don't have a clue how to start a regex from scratch to do these kinds of things, unless i am looking for specific things.
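For comparison, here is a minimal sketch of the two approaches side by side. The NUMBER pattern and the helper names with_search and with_split are made up for illustration and are deliberately simplified (no sign, no exponent). With re.search() the pattern only has to describe the number, and the suffix is whatever is left after the match. With re.split() the same pattern is wrapped in a capturing group so the number stays in the result list.

```python
import re

# Hypothetical, simplified pattern: an unsigned decimal number.
NUMBER = r"\d+(?:\.\d*)?"

def with_search(text):
    """Find the number, then slice the rest of the string off as the suffix."""
    mo = re.search(NUMBER, text)
    if mo is None:
        return None
    return mo.group(0), text[mo.end():]

def with_split(text):
    """Split on the number; the capturing group keeps the number in the result."""
    parts = re.split(f"({NUMBER})", text, maxsplit=1)
    if len(parts) != 3:
        # No number found: re.split() returns the string unchanged in a 1-item list.
        return None
    # parts[0] is whatever came before the number (empty if the string starts with it).
    return parts[1], parts[2]

for sample in ["2.5GHz", "100km/h", "16oz."]:
    print(sample, with_search(sample), with_split(sample))
```

Both helpers return the same 2-tuple for these inputs, so the choice is mostly about which reads more clearly. The search version never needs to know anything about the units.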
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 1,094
Threads: 143
Joined: Jul 2017
I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.
import re

measurements = ['2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5']
my_pattern = re.compile("h|H|m|G|T")
for m in measurements:
    # only handle strings that contain a Hertz unit
    if 'H' in m or 'h' in m:
        print('measured frequency is', m)
        # the units start at the first matching letter
        start_pos = my_pattern.search(m).span()[0]
        print('number =', m[:start_pos], 'units =', m[start_pos:])
    else:
        continue
Posts: 4,801
Threads: 77
Joined: Jan 2018
May-19-2022, 08:28 AM
(This post was last modified: May-19-2022, 08:30 AM by Gribouillis.)
Here is my 20-year-old regex for floating-point numbers:
import re

def float_re():
    "Return a regular expression that matches literal floating-point numbers"
    return r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

pat = re.compile(float_re())

def parse(s):
    mo = pat.match(s)
    if mo:
        # the number is the matched text; the suffix is everything after it
        return (float(mo.group(0)), s[mo.end():].strip())
    else:
        raise ValueError(
            'Expected string starting with a literal float, got', s)

if __name__ == '__main__':
    a = ['144mHz','432 mHz','1.296GHz','2.304 GHz']
    for s in a:
        print(repr(s), parse(s))
Output:
'144mHz' (144.0, 'mHz')
'432 mHz' (432.0, 'mHz')
'1.296GHz' (1.296, 'GHz')
'2.304 GHz' (2.304, 'GHz')
Posts: 4,653
Threads: 1,496
Joined: Sep 2016
(May-19-2022, 02:05 AM)Pedroski55 Wrote: I am no expert like you guys, but I would forget the numbers and just concentrate on the letters.
the "letters" can be anything that makes the number no longer a number. the regex "h|H|m|G|T" can't possibly be right. the units i showed are just randomly chosen real examples. change the units to "km/h" or "oz." or whatever.
Posts: 4,653
Threads: 1,496
Joined: Sep 2016
(May-19-2022, 08:28 AM)Gribouillis Wrote: I draw my 20 years old regex for floating numbers
perhaps it can be sufficient to use that if you can get the whole number alone. with that number's len() you can slice the original string. but, i recall re.split() being convenient to get a 2-tuple result.
Posts: 4,653
Threads: 1,496
Joined: Sep 2016
i think Gribouillis' code is what i want to use but i am unsure how that regex works.
Posts: 7,324
Threads: 123
Joined: Sep 2016
May-19-2022, 07:58 PM
(This post was last modified: May-19-2022, 07:58 PM by snippsat.)
Look at the right side on regex101; it has an explanation.
I think I have linked to Regex101 a couple of times before when you have asked about regex.
Quote: on what basis should i make the choice between re.split() vs. re.search()?
Many of the regex methods work in a similar way, but each has specific use cases where it works better.
For example, search, findall and match can all be used to split the string and solve this case.
re.split is more specialized: as the name says, it is about splitting, and it handles groups differently.
Gribouillis's regex pattern also works with re.split if you add a capturing group () around it.
import re

def split_suffix(arg):
    return re.split(r"([+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)", arg)[1:]

if __name__ == '__main__':
    lst = [
        "144mHz",
        "1.296GHz",
        "100m/h",
        "1.25E-20watts",
        "1.23e+07GHz",
        "3.45e-4mHz",
        "12345abc",
    ]
    for n in lst:
        print(f'{n:<13} {split_suffix(n)}')
Output:
144mHz        ['144', 'mHz']
1.296GHz      ['1.296', 'GHz']
100m/h        ['100', 'm/h']
1.25E-20watts ['1.25E-20', 'watts']
1.23e+07GHz   ['1.23e+07', 'GHz']
3.45e-4mHz    ['3.45e-4', 'mHz']
12345abc      ['12345', 'abc']
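To show what "handles groups differently" means here, this is a small sketch using a simplified number pattern (an assumption for illustration, not the full float regex). Without a capturing group, re.split() throws the matched number away as the separator. With one, the number is kept in the result, and the leading empty string is what the [1:] slice above removes.

```python
import re

number = r"\d+(?:\.\d+)?"   # simplified number pattern, for illustration only

# Without a capturing group the matched number is consumed as the separator.
without_group = re.split(number, "1.296GHz")

# With a capturing group the matched number is kept in the result list.
with_group = re.split(f"({number})", "1.296GHz")

print(without_group)   # ['', 'GHz']
print(with_group)      # ['', '1.296', 'GHz']

# If the string holds more than one number, every match splits it again.
print(re.split(f"({number})", "10m20s"))   # ['', '10', 'm', '20', 's']
```

The last line is worth keeping in mind: passing maxsplit=1 limits the split to the first number if the suffix may itself contain digits.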
Posts: 4,801
Threads: 77
Joined: Jan 2018
(May-19-2022, 07:21 PM)Skaperen Wrote: i am unsure how that regex works.
Here is a visual explanation of the regex
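As a text alternative, the same regex can be written out piece by piece with re.VERBOSE, which lets a pattern carry whitespace and comments. This is only a sketch: the name FLOAT_VERBOSE is made up here, and the single space before the * in the original is written as [ ] because verbose mode ignores bare spaces.

```python
import re

# Hypothetical name; the pattern is the float regex above, spread out and annotated.
FLOAT_VERBOSE = re.compile(r"""
    [+-]?                    # optional sign
    [ ]*                     # optional spaces after the sign
    (?:                      # the mantissa takes one of two forms:
        \d+ (?: \. \d* )?    #   digits, optionally followed by a dot and more digits
      |                      #   or
        \. \d+               #   a leading dot followed by digits
    )
    (?: [eE] [+-]? \d+ )?    # optional exponent such as e-4 or E+07
""", re.VERBOSE)

for text in ["144mHz", "-1.25E-20watts", ".5V", "1.5eV"]:
    mo = FLOAT_VERBOSE.match(text)
    print(repr(text), "->", repr(mo.group(0)), repr(text[mo.end():]))
```

The last sample shows why the exponent part insists on digits: in "1.5eV" the e is not followed by a number, so it is left in the suffix and only 1.5 is matched.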
Posts: 1,094
Threads: 143
Joined: Jul 2017
May-19-2022, 10:05 PM
(This post was last modified: May-19-2022, 10:05 PM by Pedroski55.)
Ah well, I presumed you would only be receiving frequency data as Hertz from some CIA outpost in Alaska!
Maybe the spooks are also sending recipe data to their mothers. oz, lbs
import re

measurements = ['2.5hz','2.5Hz', '2.5mHz', '2.5GHz', '2.5THz', '2.5mTHz', '2.5', 'h', 'HZ', 'Hz' ]
# the units start at the first ASCII letter
my_pattern = re.compile("[a-zA-Z]")
for m in measurements:
    try:
        # search() returns None if there is no text, so .span() raises AttributeError
        start_pos = my_pattern.search(m).span()[0]
        print('measured frequency is', m)
        print('number =', m[:start_pos], 'units =', m[start_pos:])
    except AttributeError:
        continue
Posts: 4,653
Threads: 1,496
Joined: Sep 2016
(May-19-2022, 07:58 PM)snippsat Wrote: Look at right side on regex101 have explanation.
it says at right side "giving back as needed". what does it mean by that? what is giving what to what? how is need determined? how can a regex disable that?
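A small sketch may help with "giving back as needed". A greedy quantifier such as * or + first takes as many characters as it can. If the rest of the pattern then fails to match, the engine makes the quantifier release characters one at a time and retries (backtracking). The "need" is simply whether the remainder of the pattern can match. The patterns below are made up purely to show the effect.

```python
import re

text = "2.50"

# Greedy: \d* first grabs "50", then gives back one digit so the final \d can match.
greedy = re.match(r"(\d+)\.(\d*)(\d)", text)
print(greedy.groups())   # ('2', '5', '0')

# Lazy: \d*? starts with nothing and only takes more if the rest would fail.
lazy = re.match(r"(\d+)\.(\d*?)(\d)", text)
print(lazy.groups())     # ('2', '', '5')
```

Adding ? after a quantifier reverses the preference (take as little as possible) but does not switch backtracking off. To forbid giving back altogether, Python 3.11 and later support possessive quantifiers such as \d*+ and atomic groups (?>...); older versions of the re module do not have them.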