Python Forum
Split string with multiple delimiters and keep the string in "groups"
#1
Putting a title on this was hard so hopefully I can do a better job of explaining it here.

Let's say I had a string like this: .nnddd9999999999ddnn. (A) or like this: ''0000aaa0000'' (B)

I would like to be able to split those strings with multiple delimiters, but keep the strings "intact" like this: string A would become:
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
string B would become:
['\'\'', '0000', 'aaa', '0000', '\'\'']

As you can see, each string is split with multiple delimiters, but the "chains of characters" are intact rather than it appearing like:
['.', 'n', 'n', 'd', 'd', 'd', '9', '9', '9'...]

The closest I have gotten is using re.split with a pattern like (.|n|d|9), but that produces a list like:
['', '.', '', 'n', '', 'n', '', 'd', ''...]

How could I get this to work? Is using re.split even the best way?
#2
>>> import re
>>> regex = re.compile(r'((.)\2*)')
>>>
>>> matches = regex.finditer(".nnddd9999999999ddnn.")
>>> s = []
>>> for match in matches:
...     s.append(match.group(0))
...
>>> print(s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
#3
Is there really something separating each character, or are you just looking for any repeated character? (You use the term "delimiter", but I don't see any).

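If you really just want repeated characters, you could do the following. Because \2+ requires at least one repeat, single characters such as the leading and trailing '.' are not matched. The grouping is a bit odd, but you can pull the repeated strings out of the first part of each tuple.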

>>> import re
>>> s = ".nnddd9999999999ddnn."
>>> re.findall(r"((.)\2+)", s)
[('nn', 'n'), ('ddd', 'd'), ('9999999999', '9'), ('dd', 'd'), ('nn', 'n')]
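To show what "pull the repeated strings out of the first part of each tuple" might look like, here is a minimal sketch. The variable names (pairs, repeated, all_runs) are only illustrative, and the second pattern swaps \2+ for \2* (as in post #2) so that runs of length one are kept too.

```python
import re

s = ".nnddd9999999999ddnn."

# With \2+ (at least one repeat), findall returns (whole_run, single_char) tuples.
pairs = re.findall(r"((.)\2+)", s)
repeated = [whole for whole, _char in pairs]
print(repeated)

# With \2* (zero or more repeats), runs of length one are kept as well.
all_runs = [whole for whole, _char in re.findall(r"((.)\2*)", s)]
print(all_runs)
```

The first list leaves out the two '.' characters; the second includes them.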
Or if you really just have a few characters and you want all the strings of them, you could do what you did earlier, but add the repetition (+) operator to them:

>>> re.findall(r"(\.+|n+|d+|9+)",s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
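If the set of characters changes, the alternation can be built instead of typed by hand. This is only a sketch: runs_of is a made-up helper name, and it assumes each "delimiter" is a single character. re.escape takes care of characters like '.' that are special in a regex.

```python
import re


def runs_of(chars, text):
    # Build an alternation such as \.+|n+|d+|9+ from the given characters.
    pattern = "|".join(re.escape(c) + "+" for c in chars)
    return re.findall(pattern, text)


print(runs_of(".nd9", ".nnddd9999999999ddnn."))
print(runs_of("0a'", "''0000aaa0000''"))
```

Note that findall silently skips any character that is not in chars, so this only gives the full split when every character of the input is listed.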
#4
(May-11-2020, 02:58 PM)bowlofred Wrote: Is there really something separating each character, or are you just looking for any repeated character? (You use the term "delimiter", but I don't see any).
Repeated characters is correct - I used the term 'delimiter' because the way I was imagining it, each unique character acts as its own delimiter, because it separates the other characters, if that makes any sense.

(May-11-2020, 02:58 PM)bowlofred Wrote:
>>> re.findall(r"(\.+|n+|d+|9+)",s)
['.', 'nn', 'ddd', '9999999999', 'dd', 'nn', '.']
Thanks for the solution - I prefer this as it's pretty easy to understand what's going on.
#5
You can do it in a completely different way with more_itertools.split_when (more_itertools is a third-party package), which also works with types other than str.

The generator function split_when takes an iterable (e.g. the str) and a predicate function.
The generator calls the predicate function with last_element and current_element, and starts a new group whenever the predicate returns True.


import more_itertools


def predicate(last, current):
    return last != current


for unique in more_itertools.split_when(".nnddd9999999999ddnn.", predicate):
    print("".join(unique))
    # split_when yields each run as a list of single characters;
    # the join converts it back to a str.
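For anyone without more_itertools installed, here is a rough standard-library sketch of the idea. It is not the library's actual implementation; split_when_sketch is a hypothetical name and it only mimics the basic behaviour described above (no extra options).

```python
def split_when_sketch(iterable, pred):
    """Yield lists of items, starting a new list whenever pred(previous, current) is true."""
    it = iter(iterable)
    try:
        previous = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    group = [previous]
    for current in it:
        if pred(previous, current):
            # The predicate fired, so the current group is complete.
            yield group
            group = []
        group.append(current)
        previous = current
    yield group


runs = ["".join(g) for g in split_when_sketch(".nnddd9999999999ddnn.", lambda a, b: a != b)]
print(runs)
```

Because it works on any iterable, the same function can group runs in a list of numbers as well as in a str.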
My code examples are always for Python >=3.6.0