doing string split with 2 or more split characters

Pedroski55 · Aug-06-2023, 07:17 AM

@snippsat

How about this, thought of it just now:

result = [r for c in mystring.split('|') for r in c.split('!')]

Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']

***snippsat*** · (This post was last modified: Aug-06-2023, 04:55 PM by snippsat.)

(Aug-06-2023, 07:17 AM)Pedroski55 Wrote: How about this, thought of it just now:

Then you are back to the original problem he is trying to avoid: many calls to replace() when splitting on several characters.
Now you have to call split() again each time you add a new character to split on.
If you try it with the mystring below, you will see the problem.

>>> mystring = 'ab|cd!ef?gh!ij*kl!mn|'
>>> ''.join(c if c not in '|!?*' else ' ' for c in mystring).split()
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']

Also a tip for your other code: never use for i in range(len(sequence)):
If I rewrite my one-liner as a standard loop, you can see that there is no need for range(len(sequence)).
Just loop over mystring; there is no need to manipulate the index.

def split_extend(mystring: str, chr_split: str ) -> list:
    result = ''
    for c in mystring:
        if c not in chr_split:
            result += c
        else:
            result += ' '
    return result.split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))

Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']

The cooler way, but it can be harder to read, so this line is on the borderline of what should be done in one line.

def split_extend(mystring: str, chr_split: str ) -> list:
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))

Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']

Skaperen · Aug-06-2023, 05:55 PM

my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.

there may be non-alpha (non-word) characters that are between the splitter characters that need to be passed to the result. my usage of only alpha in the example was bad as it gave the incorrect impression that only alpha characters would be in the result strings.

i would expect to provide a set of characters that are splitters (or maybe a str of them). a scan of the str to be split could simply do thisch in splitset to know if a character is a splitter.
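
One standard-library option that seems to fit these requirements is re.split with a character class built from the splitter characters. The sketch below is only an illustration: the name split_on_any and the sample string are made up for this example and do not come from the thread, and it assumes at least one splitter character is given. Characters that are not splitters (spaces, dashes, dots) pass through untouched, and empty fields are kept.

```python
import re

def split_on_any(text: str, splitters: str) -> list:
    """Split text on any single character in splitters (hypothetical helper)."""
    # re.escape keeps characters such as ']' or '^' literal inside the [...] class.
    # Assumption: splitters is not empty, otherwise the pattern '[]' is invalid.
    pattern = '[' + re.escape(splitters) + ']'
    return re.split(pattern, text)

if __name__ == '__main__':
    print(split_on_any('a b|c-d!e.f?', '|!?*'))
    # ['a b', 'c-d', 'e.f', '']
```

As a single expression this is re.split('[' + re.escape(splitters) + ']', text), so it can be used inline without defining a function.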

**perfringo** · (This post was last modified: Aug-07-2023, 06:54 AM by perfringo.)

Using itertools groupby:

from itertools import groupby

text = 'ab|cd!ef?gh!ij*kl!mn|'
splitters = '|!?*'

splitted = (''.join(group) for case, group in groupby(text, lambda char: char not in splitters) if case) # generator, can be converted to list if required

print(*splitted)

-> ab cd ef gh ij kl mn

***snippsat*** · Aug-07-2023, 09:55 AM

(Aug-06-2023, 05:55 PM)Skaperen Wrote: my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.

The length should not matter at all, as the end user only sees e.g. split_extend; whether it is one line or many does not matter in the backend.
So, e.g., as an import:

from split_ex import split_extend

#mystring = 'ab|cd!ef?gh!ij*kl!mn|'
mystring = '12a345b789ared'
print(split_extend(mystring, 'ab'))

Output:
['12', '345', '789', 'red']

Or you could make your own extension of the Python built-ins; then it's just split_extend with no import.

λ python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'car1taxi2boat3'
>>> split_extend(s, '123')
['car', 'taxi', 'boat']
>>>
>>> help(split_extend)
Help on function split_extend in module sitecustomize:

split_extend(mystring: str, chr_split: str) -> list
    Split on characters given as parameter

By doing this I am using sitecustomize.py in the root folder of Python.

import builtins

def split_extend(mystring: str, chr_split: str ) -> list:
    '''Split on characters given as parameter'''
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

builtins.split_extend = split_extend

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))

Skaperen · Aug-09-2023, 06:39 PM

my original thought, which i have not tested, was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.

***snippsat*** · (This post was last modified: Aug-09-2023, 07:29 PM by snippsat.)

(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original though which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.

Did you not read the posts or the code posted here, or does it not do what you want?
E.g. my last posted code, in case you wonder, is O(n): the join and split functions are each called only once,
no matter how many new characters you add to split on.

def split_extend(mystring: str, chr_split: str ) -> list:
    '''Split on characters given as parameter'''
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))

Output:
G:\div_code\egg
λ python -m cProfile atest.py
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
         29 function calls in 0.001 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 atest.py:1(<module>)
        1    0.000    0.000    0.000    0.000 atest.py:1(split_extend)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       22    0.000    0.000    0.000    0.000 atest.py:3(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
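
One detail on the O(n) figure: the test c in chr_split scans the splitter string for every character of the input, so the total work is closer to O(n * k) for k splitter characters. With a handful of splitters that is negligible. A minimal sketch, assuming a hypothetical variant named split_extend_set, that converts the splitters to a set first so each membership test takes constant time on average:

```python
def split_extend_set(mystring: str, chr_split: str) -> list:
    '''Same approach as split_extend, with a set for the membership test.'''
    splitters = set(chr_split)  # built once; lookups are O(1) on average
    return ''.join(' ' if c in splitters else c for c in mystring).split()

if __name__ == '__main__':
    print(split_extend_set('ab|cd!ef?gh!ij*kl!mn|', '|!?*'))
    # ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
```

Whether this is measurably faster depends on how many splitter characters there are; for four of them the difference is unlikely to show up in a profile like the one above.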

**Gribouillis** · (This post was last modified: Aug-10-2023, 09:42 AM by Gribouillis.)

(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original though which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character

This strategy works very well

def splitter(chr_split: str):
    c, n = chr_split[-1], len(chr_split) - 1
    table = str.maketrans(chr_split[:-1], c * n)

    def split(astring: str):
        return astring.translate(table).split(c)
  
    return split

if __name__ == '__main__':
    split = splitter('|!?*')
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    print(split(mystring))

Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', '']

Note how it reveals a small bug in Snippsat's output.
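
To make the difference concrete, here is a small side-by-side sketch. The names via_space and via_translate and the sample string are invented for this comparison; the two bodies follow the approaches posted above. The space-based version also splits on whitespace that was already in the data and drops empty fields, while the translate-based version keeps both.

```python
def via_space(text: str, splitters: str) -> list:
    # Replace every splitter with a space, then split on runs of whitespace.
    return ''.join(' ' if c in splitters else c for c in text).split()

def via_translate(text: str, splitters: str) -> list:
    # Map every splitter to the last one, then split on that single character.
    last = splitters[-1]
    table = str.maketrans(splitters[:-1], last * (len(splitters) - 1))
    return text.translate(table).split(last)

if __name__ == '__main__':
    sample = 'red car|blue!!taxi|'
    print(via_space(sample, '|!'))      # ['red', 'car', 'blue', 'taxi']
    print(via_translate(sample, '|!'))  # ['red car', 'blue', '', 'taxi', '']
```

Which behaviour is right depends on the data: if empty fields and embedded spaces carry meaning, the translate-based result preserves them.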

Skaperen · Aug-09-2023, 10:04 PM

code does not come out easy to read on my display due to the colors. if there is a setting to disable colorizing the code and just use white or yellow for all of it, that would help a lot. hopefully my next laptop won't have this issue. others i have seen do not. i can read the code but it is so hard to read, made worse by my old age eyes, that avoid certain things like code. the darker theme does help, but not so much for code. sorry.

**Gribouillis** · Aug-10-2023, 09:20 AM

(Aug-09-2023, 10:04 PM)Skaperen Wrote: hopefully my next laptop won't have this issue

I suggest you purchase a large monitor and connect it to your laptop.
