trying to recall a regex for re.split()
Python Forum (https://python-forum.io), General Coding Help, thread /thread-37251.html
trying to recall a regex for re.split() - Skaperen - May-18-2022

i can't find where this was and can't recall it. regex stuff still doesn't sink in. what i want to do is split a string so that all leading decimal digits go to the 1st result and the first non-digit character and everything after it go to the 2nd result. the goal is to convert a number in a string where the number has a units designation after it, such as '144mHz' giving me ['144', 'mHz']. i think i need to include '.' with the digits for float cases.

people tell me regex is easy but i never "get it". i think it is because i've never seen an explanation with any example. they just show the example and show the result and expect everyone to understand how it worked.

RE: trying to recall a regex for re.split() - paul18fr - May-18-2022

The following seems to work; I've tried to be exhaustive (but a simpler way should exist):

    import re

    data = '144mhz'
    Number, Units = re.search(r"([+\-]?\d+\.\d+[eE]?[+\-]?[\d]?[\d]?|[+\-]?\d+[eE]?[+\-]?[\d]?[\d]?)[\s+]?([a-z]+)", data.lower()).groups()

With :
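For readers who, like the original poster, want each piece of such a pattern spelled out, here is a minimal sketch written with re.VERBOSE so every part carries its own comment. It is not paul18fr's pattern: the name NUMBER_WITH_UNITS and the exact choices (unlimited exponent digits, \s* for optional whitespace, letters only for the units) are assumptions made for illustration.

```python
import re

# Hypothetical, simplified pattern for "<number><optional space><units>".
NUMBER_WITH_UNITS = re.compile(r"""
    (                       # group 1: the number
        [+-]?               #   optional sign
        \d+                 #   one or more digits
        (?:\.\d+)?          #   optional fraction: a dot followed by digits
        (?:[eE][+-]?\d+)?   #   optional exponent such as e3 or E-20
    )
    \s*                     # optional whitespace between number and units
    ([A-Za-z]+)             # group 2: the units, letters only
""", re.VERBOSE)

for text in ["144mHz", "25.99 mHz", "-1.5e3V"]:
    number, units = NUMBER_WITH_UNITS.search(text).groups()
    print(repr(text), "->", number, units)
```

The two pairs of plain parentheses are the capturing groups that .groups() returns; the (?: ... ) groups only bundle parts together so that the ? applies to the whole fraction or exponent, and they are not returned.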
RE: trying to recall a regex for re.split() - snippsat - May-18-2022

    >>> import re
    >>>
    >>> n = '144mHz'
    >>> re.split(r'(\d+\.?\d+)', n)[1:]
    ['144', 'mHz']
    >>>
    >>> n = '25.99mHz'
    >>> re.split(r'(\d+\.?\d+)', n)[1:]
    ['25.99', 'mHz']
    >>>
    >>> n = '19999.9mHz'
    >>> re.split(r'(\d+\.?\d+)', n)[1:]
    ['19999.9', 'mHz']

RE: trying to recall a regex for re.split() - Skaperen - May-18-2022

(May-18-2022, 12:24 PM)paul18fr Wrote: [eE]?[+\-]?[\d]?[\d]? => for scientific notation with or without the sign, and with 2 digits max here

how does it do 2 digits max? what if i want 3? what if i want no limit?

RE: trying to recall a regex for re.split() - ndc85430 - May-18-2022

\d matches a single digit, so \d\d matches exactly 2. If you wanted exactly 3, you could write \d{3}, for example. The documentation tells you the syntax, so you should go there to see what things are possible.

Then, there are regular expression testers, e.g. https://pythex.org/, where you can try them out.
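To make the "2 digits max / 3 / no limit" question concrete, here is a small sketch of the quantifiers involved. The helper name matches() is hypothetical. In the quoted pattern each [\d]? is one optional digit, so two of them in a row allow at most two digits. The last two lines show a side effect of the same rules: a pattern of the form \d+\.?\d+ needs at least two digits, so a single-digit value is not split off, while making the fraction an optional group avoids that.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"\d\d", "12"))        # True: exactly two digits
print(matches(r"\d{3}", "123"))      # True: exactly three digits
print(matches(r"\d{1,3}", "12"))     # True: between one and three digits
print(matches(r"\d?\d?", "1"))       # True: zero, one or two digits
print(matches(r"\d?\d?", "123"))     # False: three digits is one too many
print(matches(r"\d+", "1234567"))    # True: one or more digits, no upper limit
print(matches(r"\d*", ""))           # True: zero or more digits

# \d+\.?\d+ requires a digit on both sides of the optional dot.
print(re.split(r"(\d+\.?\d+)", "5mHz")[1:])       # []
print(re.split(r"(\d+(?:\.\d+)?)", "5mHz")[1:])   # ['5', 'mHz']
```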
RE: trying to recall a regex for re.split() - Skaperen - May-18-2022

(May-18-2022, 12:24 PM)paul18fr Wrote: remember that parentheses indicate what you want to recover

when do i use parentheses if i am doing re.split()? why are you using re.search()?

if i don't use re, then the way i would do this is to loop through each character, testing it alone with int(ch,10) or .isdigit() or .isdecimal(), then slicing up to, and from, the position where the loop breaks.
RE: trying to recall a regex for re.split() - Skaperen - May-18-2022

like this:

    def numunits(s=None):
        if not isinstance(s, str):
            raise TypeError('string expected')
        i = iter(range(len(s)))
        # find the first prefix that float() accepts
        for p in i:
            try:
                v = float(s[:p])
                break
            except ValueError:
                continue
        else:
            raise ValueError('no number')
        # keep extending the prefix until float() rejects it
        for p in i:
            try:
                v = float(s[:p])
                continue
            except ValueError:
                break
        else:
            raise ValueError('no units')
        # s[:p] is the first rejected prefix, so the units start at p-1
        if s[p] != ' ':
            p -= 1
        return v, s[p:]

    if __name__ == '__main__':
        a = ['144mHz', '432 mHz', '1.296GHz', '2.304 GHz']
        for x in a:
            print(repr(x))
            print(repr(numunits(x)))
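On the question of when parentheses matter for re.split(): the short sketch below compares a split with and without a capturing group, and shows re.match() giving the same two pieces. The variable names are illustrative only.

```python
import re

text = "144mHz"

# Without a capturing group the matched digits act as the separator
# and are dropped from the result.
without_group = re.split(r"\d+", text)

# With a capturing group the matched text is kept in the result list.
with_group = re.split(r"(\d+)", text)

# The match sits at the very start of the string, so both lists begin
# with an empty string; that is why the earlier reply slices with [1:].
print(without_group)   # ['', 'mHz']
print(with_group)      # ['', '144', 'mHz']

# re.match() with two groups returns the same two pieces directly.
pieces = re.match(r"(\d+)(.*)", text).groups()
print(pieces)          # ('144', 'mHz')
```

So with re.split() the parentheses decide whether the number itself survives the split, while with re.search() or re.match() they decide which parts .groups() hands back.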
RE: trying to recall a regex for re.split() - paul18fr - May-18-2022

Quote: when do i use parenthesis

In the following example, the part [\s+]?[a-z]+?\s+ is not inside parentheses, so the text it matches is not captured:

    data2 = '144 XXX mhz'
    AAA = re.search(r"([+\-]?\d+\.\d+[eE]?[+\-]?[\d]?[\d]?|[+\-]?\d+[eE]?[+\-]?[\d]?[\d]?)[\s+]?[a-z]+?\s+([a-z]+)", data2.lower()).groups()

If you want to get "XXX" as well, put parentheses around [a-z]+?:

    AAA2 = re.search(r"([+\-]?\d+\.\d+[eE]?[+\-]?[\d]?[\d]?|[+\-]?\d+[eE]?[+\-]?[\d]?[\d]?)[\s+]?([a-z]+?)\s+([a-z]+)", data2.lower()).groups()

Quote: why are you using re.search()?

It gives the same result, doesn't it? But if you prefer re.split ...

RE: trying to recall a regex for re.split() - Gribouillis - May-18-2022

If you accept only the syntax of Python numbers, as in the previous post, you could use tokenize:

    >>> import io
    >>> from tokenize import tokenize
    >>> def parse(s):
    ...     t = tokenize(io.BytesIO(s.encode()).readline)
    ...     next(t)  # skip the initial ENCODING token
    ...     x, u = next(t), next(t)
    ...     return (x.string, u.string)
    ...
    >>> for a in ['144mHz','432 mHz','1.296GHz','2.304 GHz']:
    ...     print(repr(a), parse(a))
    ...
    '144mHz' ('144', 'mHz')
    '432 mHz' ('432', 'mHz')
    '1.296GHz' ('1.296', 'GHz')
    '2.304 GHz' ('2.304', 'GHz')

But you cannot extend the syntax to allow fancy number representations. By the way, in order to specify the problem clearly, it would be good to write a complete syntax of the strings that you want to be able to parse.

RE: trying to recall a regex for re.split() - Skaperen - May-18-2022

(May-18-2022, 06:29 PM)Gribouillis Wrote: By the way, in order to specify the problem clearly, it would be good to write a complete syntax of the strings that you want to be able to parse.

i'm trying to generalize this to make a function. the first part is a decimal number, although i may, someday, try to extend that to hexadecimal (including float). the 2nd part is any string of characters that could be taken as a unit suffix like 'km' or 'Hz'.
the intended function splits it into a converted value and a string, or raises an exception if something is bad. i hadn't thought about scientific notation but i should do that just in case someone gives '1.25E-20watts'.

right now, it's about making that function. then i will be making a few app scripts that take these from command line arguments, using that function. i wasn't thinking of this as "parsing" although i can understand that it is, even if just a small amount (kind of like str.split() is).
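One possible shape for the function described above, as a sketch under stated assumptions rather than a finished implementation: the name split_number_units, the pattern _NUMBER and the error messages are all hypothetical. It matches the number first and treats whatever remains as the units, so the regex never has to guess where the units begin. Hexadecimal input is not handled.

```python
import re

# Hypothetical pattern for a decimal number with optional sign,
# fraction and exponent, e.g. 144, 1.296, .5, 1.25E-20.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

def split_number_units(text):
    """Return (value, units) for strings such as '144mHz' or '2.304 GHz'.

    Raises TypeError for non-strings and ValueError when the number
    or the units are missing.
    """
    if not isinstance(text, str):
        raise TypeError("string expected")
    stripped = text.strip()
    match = _NUMBER.match(stripped)          # anchored at the start
    if match is None:
        raise ValueError("no number")
    units = stripped[match.end():].strip()   # everything after the number
    if not units:
        raise ValueError("no units")
    return float(match.group()), units

for s in ["144mHz", "432 mHz", "1.296GHz", "1.25E-20watts"]:
    print(repr(s), split_number_units(s))
```

Because the exponent part requires at least one digit after the e, a string such as '1eV' is read as the number 1 followed by the units 'eV', whereas '1e5' with nothing after it is rejected as having no units.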