Python Forum (https://python-forum.io) - Python Coding - General Coding Help
Thread: Split string with multiple delimiters and keep the string in "groups" (/thread-26717.html)
Split string with multiple delimiters and keep the string in "groups" - DreamingInsanity - May-11-2020

Putting a title on this was hard, so hopefully I can do a better job of explaining it here.

Let's say I had a string like this:
.nnddd9999999999ddnn. (A)
or like this:
''0000aaa0000'' (B)

I would like to be able to split those strings with multiple delimiters, but keep the strings "intact", like this:
string A would become: ['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
string B would become: ['\'\'', '0000', 'aaa', '0000', '\'\'']

As you can see, each string is split with multiple delimiters, but the "chains of characters" are intact rather than appearing like:
['.', 'n', 'n', 'd', 'd', 'd', '9', '9', '9'...]

The closest I have gotten is using re.split with a delimiter like (.|n|d|9), but that produces a list like:
['.', '', 'n', '', 'n', '', 'd', ''...]

How could I get this to work? Is using re.split even the best way?

RE: Split string with multiple delimiters and keep the string in "groups" - anbu23 - May-11-2020

>>> import re
>>> # (.) captures one character; \2* matches any further repeats of it
>>> regex = re.compile(r'((.)\2*)')
>>>
>>> matches = regex.finditer(".nnddd9999999999ddnn.")
>>> s = []
>>> for match in matches:
...     s.append(match.group(0))
...
>>> print(s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']

RE: Split string with multiple delimiters and keep the string in "groups" - bowlofred - May-11-2020

Is there really something separating each character, or are you just looking for any repeated character? (You use the term "delimiter", but I don't see any.)

If you really just want repeated characters, you could do the following. The grouping is a bit odd, but you can pull the repeated strings out of the first part of each tuple.

>>> import re
>>> s = ".nnddd9999999999ddnn."
>>> # \2+ needs at least one repeat, so the single '.' characters are not matched
>>> re.findall(r"((.)\2+)", s)
[('nn', 'n'), ('ddd', 'd'), ('9999999999', '9'), ('dd', 'd'), ('nn', 'n')]

Or if you really just have a few characters and you want all the strings of them, you could do what you did earlier, but add the repetition (+) operator to each of them:

>>> re.findall(r"(\.+|n+|d+|9+)", s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']

RE: Split string with multiple delimiters and keep the string in "groups" - DreamingInsanity - May-12-2020

(May-11-2020, 02:58 PM)bowlofred Wrote: Is there really something separating each character, or are you just looking for any repeated character? (You use the term "delimiter", but I don't see any.)

Repeated characters is correct. I used the term 'delimiter' because the way I was imagining it, each unique character acts as its own delimiter, because it separates the other characters, if that makes any sense.

(May-11-2020, 02:58 PM)bowlofred Wrote:
>>> re.findall(r"(\.+|n+|d+|9+)", s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']

Thanks for the solution. I prefer this one as it's pretty easy to understand what's going on.

RE: Split string with multiple delimiters and keep the string in "groups" - DeaD_EyE - May-12-2020

You can do it in a completely different way with more_itertools.split_when, which also works with types other than str.

The generator function split_when takes an iterable (e.g. the str) and a predicate function. The generator calls the predicate function with the last element and the current element, and starts a new group whenever the predicate returns True.

import more_itertools

def predicate(last, current):
    # split wherever two neighbouring characters differ
    return last != current

for unique in more_itertools.split_when(".nnddd9999999999ddnn.", predicate):
    # each group is a list of single characters;
    # the join converts it back to a str.
    print("".join(unique))
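For comparison, the same grouping can probably be done with only the standard library. The sketch below is not taken from any of the posts above: it swaps in itertools.groupby for more_itertools.split_when, and the helper name split_runs is made up for illustration.

```python
from itertools import groupby


def split_runs(text):
    """Split text into runs of identical consecutive characters.

    split_runs is a hypothetical helper name, not part of any library.
    """
    # With no key function, groupby starts a new group every time the
    # character changes, which is the behaviour asked for in the thread.
    return ["".join(group) for _, group in groupby(text)]


print(split_runs(".nnddd9999999999ddnn."))
# ['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
print(split_runs("''0000aaa0000''"))
# ["''", '0000', 'aaa', '0000', "''"]
```

Like split_when, this should work on any iterable of comparable items, though the "".join step assumes the items are strings.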