Feb-03-2022, 06:23 PM
maybe re can do this but regular expressions just make no sense in my mind. i need to split a string with the separator(s) being any combination of one or more non-alphanumeric characters.
(Feb-03-2022, 06:23 PM) Skaperen Wrote: maybe re can do this but regular expressions just make no sense in my mind
Regex has come up in your threads before; it's not so difficult to understand if you make an effort 🤔
(Feb-03-2022, 06:23 PM) Skaperen Wrote: i need to split a string with the separator(s) being any combination of one or more non-alphanumeric characters.
>>> import re
>>>
>>> s = 'hello@world'
>>> r = re.split('[^a-zA-Z0-9]', s)
>>> r
['hello', 'world']
>>>
>>> s = 'hello@"!world *^ car'
>>> r = re.split('[^a-zA-Z0-9]', s)
>>> r
['hello', '', '', 'world', '', '', '', 'car']
>>> [x for x in r if x]
['hello', 'world', 'car']
>>>
>>> s = '123` green ?... color'
>>> r = re.split('[^a-zA-Z0-9]', s)
>>> ' '.join([x for x in r if x])
'123 green color'
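A small variation on the session above (a sketch, not part of the original reply): adding `+` after the class makes each run of separators a single split point, so the list comprehension that filters out empty strings is mostly unnecessary. An empty string still shows up if the text starts or ends with a separator. `split_words` is just a hypothetical helper name.

```python
import re

def split_words(s):
    """Split s on runs of one or more non-alphanumeric (ASCII) characters."""
    # '+' turns a whole run of separators into one split point
    return re.split(r'[^a-zA-Z0-9]+', s)

print(split_words('hello@"!world *^ car'))    # ['hello', 'world', 'car']
print(split_words('123` green ?... color'))   # ['123', 'green', 'color']
# a leading or trailing separator still yields an empty string at that end
print(split_words('--hello--world--'))        # ['', 'hello', 'world', '']
```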
the characters could each be any character of chr(x) for x in range(1114112) (or, at least, the printable ones). there would be a string of them. i'm thinking of a non-re way involving a loop and many calls to split() (only in my head, for now), but if re can do this, then i should do it that way.
(Feb-06-2022, 01:01 AM) Skaperen Wrote: the characters could each be any character of chr(x) for x in range(1114112)
It would split on any Unicode character(s) too.
>>> import re
>>>
>>> s = 'hello🤨world'
>>> r = re.split('[^a-zA-Z0-9]', s)
>>> r
['hello', 'world']
>>>
>>> s = 'hello🤨world記者car'
>>> r = re.split('[^a-zA-Z0-9]', s)
>>> r
['hello', 'world', '', 'car']
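As the second example shows, [^a-zA-Z0-9] treats non-ASCII letters such as 記者 as separators too. If letters from any script should be kept (an assumption about what is wanted, not something stated in the thread), one option is `[\W_]+`: for str patterns `\W` is Unicode-aware in Python 3, and `_` is added because `\w` counts the underscore as a word character.

```python
import re

s = 'hello🤨world記者car'

# ASCII-only class: every non-ASCII character counts as a separator
print(re.split(r'[^a-zA-Z0-9]+', s))   # ['hello', 'world', 'car']

# Unicode-aware: \W matches anything that is not a word character
# (letters and digits of any script, plus '_'), so '_' is added explicitly
print(re.split(r'[\W_]+', s))          # ['hello', 'world記者car']
```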
Quote: i am totally clueless on that regex101 page. no idea how to start.
You start simple, e.g. hello123world, and let's say you want to find 123 on regex101.
>>> import re
>>>
>>> s = 'hello123world'
>>> re.findall(r'\d+', s)
['123']
Then you can test other methods of the re module.
>>> import re
>>>
>>> s = 'hello123world'
>>> re.split(r'\d+', s)
['hello', 'world']
>>>
>>> # Make it a group
>>> re.split(r'(\d+)', s)
['hello', '123', 'world']
>>>
>>> re.search(r'(\d+)', s)
<re.Match object; span=(5, 8), match='123'>
>>> r = re.search(r'(\d+)', s)
>>> r.group(1)
'123'
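The group trick carries over to the original problem: wrapping the separator class in parentheses makes re.split() return the separator runs between the pieces, which helps if they need to be inspected or put back later. A sketch reusing the input from the first example:

```python
import re

s = 'hello@"!world *^ car'

# with a capturing group, the matched separator runs are kept in the result
parts = re.split(r'([^a-zA-Z0-9]+)', s)
print(parts)                  # ['hello', '@"!', 'world', ' *^ ', 'car']

# pieces sit at even indexes, separators at odd indexes
print(parts[::2])             # ['hello', 'world', 'car']
print(parts[1::2])            # ['@"!', ' *^ ']

# nothing is lost: joining everything rebuilds the original string
print(''.join(parts) == s)    # True
```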
Skaperen Wrote: i want to specify which characters to be split on.
You can specify whatever you want.
>>> print([chr(x) for x in range(5000, 5010)])
['ᎈ', 'ᎉ', 'ᎊ', 'ᎋ', 'ᎌ', 'ᎍ', 'ᎎ', 'ᎏ', '᎐', '᎑']
>>> import re
>>>
>>> s = 'hello🤨world@carᎈᎉbus'
>>> r = re.split('[^a-zA-Z0-9ᎈᎉᎊᎋᎌᎍᎎᎏ᎐᎑]', s)
>>> r
['hello', 'world', 'carᎈᎉbus']
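Because those ten characters are consecutive code points (chr(5000) through chr(5009), i.e. U+1388 to U+1391), the same set can be written as a range inside the class instead of listing each one. In this sketch the range is used without the ^, so the split happens only on those characters, which is closer to specifying which characters to split on. It assumes the two characters in the example string above are chr(5000) and chr(5001), written here as escapes.

```python
import re

# the characters printed above are consecutive code points
print(hex(5000), hex(5009))               # 0x1388 0x1391

# assumed same text as above, with chr(5000) and chr(5001) written as escapes
s = 'hello🤨world@car\u1388\u1389bus'

# no ^ this time: split ONLY on characters in the range; '+' merges adjacent ones
print(re.split('[\u1388-\u1391]+', s))    # ['hello🤨world@car', 'bus']

# the same class built at run time from the code points
cls = f'[{chr(5000)}-{chr(5009)}]+'
print(re.split(cls, s))                   # ['hello🤨world@car', 'bus']
```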
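i see you are customizing the regular expression for the specific range of characters. do i need that ^a-zA-Z0-9 part? does this code make sense? is there no way to make a regular expression use a str argument?
import re

def splitchars(pattern=None, chars=None):
    if not isinstance(pattern, str):
        raise TypeError('pattern (arg 1) is not a str')
    if not isinstance(chars, str):
        raise TypeError('chars (arg 2) is not a str')
    if not chars:
        return [pattern]
    # escape the backslash first so the backslashes added below are not doubled,
    # then the characters that are special inside [...]: [ ] ^ -
    e = (chars.replace('\\', '\\\\')
              .replace('[', r'\[').replace(']', r'\]')
              .replace('^', r'\^').replace('-', r'\-'))
    return re.split(f'[{e}]', pattern)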
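For comparison with splitchars(), the non-re approach Skaperen mentioned earlier can be done with a single loop over the string rather than many split() calls. This is a sketch under my own assumptions: `split_on_chars` is a hypothetical name, any run of separator characters counts as one split point, and empty pieces are dropped. Since no pattern is built, nothing needs escaping.

```python
def split_on_chars(text, seps):
    """Split text on runs of any characters in seps, without using re."""
    sepset = set(seps)          # fast membership test per character
    parts = []
    current = []                # characters of the piece being built
    for ch in text:
        if ch in sepset:
            if current:         # a piece just ended; empty pieces are skipped
                parts.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if current:                 # last piece, if text did not end with a separator
        parts.append(''.join(current))
    return parts

print(split_on_chars('hello@"!world *^ car', '@"! *^'))   # ['hello', 'world', 'car']
print(split_on_chars('a[b]c\\d', '[]\\'))                  # ['a', 'b', 'c', 'd']
```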
(Feb-06-2022, 11:22 PM) Skaperen Wrote: i see you are customizing the regular expression for the specific range of characters. do i need that ^a-zA-Z0-9 part?
You can put whatever you want in there; without the ^, it splits only on what's in the list.
a-z matches a single character in the range between a (code point 97) and z (code point 122), case sensitive.
>>> import re
>>>
>>> s = 'bus and cab'
>>> r = re.split(r'[tom🤨]', s)
>>> r
['bus and cab']
>>>
>>> s = 'bus and taxi'
>>> r = re.split(r'[tom🤨]', s)
>>> r
['bus and ', 'axi']
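To make the range remark concrete (an illustration, not from the thread): ranges in a character class compare code points, so [a-z] matches exactly the 26 characters whose ord() is 97 through 122 and nothing else, not 'A' and not accented letters. The check below scans the same range(1114112) mentioned earlier, so it takes a moment to run.

```python
import re

print(ord('a'), ord('z'))            # 97 122
print(re.fullmatch('[a-z]', 'A'))    # None (case sensitive)

# test every code point against the class
pat = re.compile('[a-z]')
in_range = [chr(i) for i in range(1114112) if pat.fullmatch(chr(i))]
print(len(in_range), in_range[0], in_range[-1])   # 26 a z
```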
Quote: does this code make sense?
Maybe, if it does what you want; regex could simplify your line 8 (the chars.replace(...) line).
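The chained replace() calls that build e in splitchars() can be replaced with re.escape(), which backslash-escapes every character that could be special, including ^ and -, which also have meaning inside [...]. A sketch of splitchars along those lines; adding + (so a run of separators counts as one split point) is my own assumption about the desired behavior.

```python
import re

def splitchars(pattern, chars):
    """Split the str pattern on runs of any character found in chars."""
    if not isinstance(pattern, str):
        raise TypeError('pattern (arg 1) is not a str')
    if not isinstance(chars, str):
        raise TypeError('chars (arg 2) is not a str')
    if not chars:
        return [pattern]
    # re.escape() makes every character in chars literal inside the class
    return re.split(f'[{re.escape(chars)}]+', pattern)

print(splitchars('a^b-c[d]e\\f', '^-[]\\'))   # ['a', 'b', 'c', 'd', 'e', 'f']
print(splitchars('hello@@world', '@'))       # ['hello', 'world']
```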
Quote: is there no way to make a regular expression use a str argument?
Of course, that's what regex takes: a string (which is all Unicode in Python 3).
Doc Wrote: Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes).
However, Unicode strings and 8-bit strings cannot be mixed: that is,
you cannot match a Unicode string with a byte pattern or vice-versa; similarly,
when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
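A quick check of that rule, with made-up inputs: a str pattern works on str text and a bytes pattern on bytes text, but mixing the two raises TypeError. The exact error message may differ between Python versions, so it is only printed here.

```python
import re

# str pattern on str text, and bytes pattern on bytes text, both work
print(re.split(r'[^a-zA-Z0-9]+', 'hello@world'))     # ['hello', 'world']
print(re.split(rb'[^a-zA-Z0-9]+', b'hello@world'))   # [b'hello', b'world']

# mixing a bytes pattern with str text is an error
try:
    re.split(rb'[^a-zA-Z0-9]+', 'hello@world')
except TypeError as exc:
    print('TypeError:', exc)
```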