Python Forum

Full Version: doing string split with 2 or more split characters
@snippsat

How about this, thought of it just now:

result = [r for c in mystring.split('|') for r in c.split('!')]
Output:
result ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
(Aug-06-2023, 07:17 AM)Pedroski55 Wrote: How about this, thought of it just now:
Then you are back to the original problem he is trying to avoid: many calls to replace() when splitting on several characters.
Now you have to add a new split() call every time you add a new character to split on.
If you try it with the mystring below you will see the problem.
>>> mystring = 'ab|cd!ef?gh!ij*kl!mn|'
>>> ''.join(c if c not in '|!?*' else ' ' for c in mystring).split()
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
Also a tip for your other code: Never use for i in range(len(sequence)):
If I rewrite my one-liner as a standard loop, you can see that there is no need for range(len(sequence)).
Just loop over mystring; there is no need to manipulate the index.
def split_extend(mystring: str, chr_split: str ) -> list:
    result = ''
    for c in mystring:
        if c not in chr_split:
            result += c
        else:
            result += ' '
    return result.split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
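To make the range(len(sequence)) point concrete, here is a small sketch (the variable names are only illustrative, not from the thread) comparing an index-based loop with direct iteration, plus enumerate() for the cases where the position really is needed.

```python
mystring = 'ab|cd'

# Index-based loop: works, but the index is only used to fetch the character.
by_index = []
for i in range(len(mystring)):
    by_index.append(mystring[i])

# Direct iteration: same characters, no index bookkeeping.
direct = [c for c in mystring]

# If the position is actually needed, enumerate() supplies it.
positions = [i for i, c in enumerate(mystring) if c == '|']

print(by_index == direct)  # True
print(positions)           # [2]
```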
The cooler way, but it can be harder to read, so this line is on the borderline of what should be done in one line.
def split_extend(mystring: str, chr_split: str ) -> list:
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.

there may be non-alpha (non-word) characters that are between the splitter characters that need to be passed to the result. my usage of only alpha in the example was bad as it gave the incorrect impression that only alpha characters would be in the result strings.

i would expect to provide a set of characters that are splitters (or maybe a str of them). a scan of the str to be split could simply do thisch in splitset to know if a character is a splitter.
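One possible way to meet these requirements with only the standard library is re.split() with a character class built from the splitters. This is a sketch, not something proposed in the thread so far: the name split_on_any and the keep_empty flag are made up for illustration, and it assumes splitters is a non-empty str. Unlike the replace-with-space approach, text between splitters is passed through untouched, including spaces and other non-alpha characters.

```python
import re

def split_on_any(text: str, splitters: str, keep_empty: bool = False) -> list:
    """Split text on any single character found in splitters (assumed non-empty)."""
    # re.escape() neutralises characters such as ']' , '^' or '-' inside the class.
    pattern = '[' + re.escape(splitters) + ']'
    parts = re.split(pattern, text)
    # Optionally drop the empty strings produced by leading, trailing
    # or adjacent splitter characters.
    return parts if keep_empty else [p for p in parts if p]

print(split_on_any('a b|c-d!e f?', '|!?*'))
# ['a b', 'c-d', 'e f']
```

As a one-line expression this is just re.split('[' + re.escape(splitters) + ']', text), which keeps the empty strings.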
Using itertools groupby:

from itertools import groupby

text = 'ab|cd!ef?gh!ij*kl!mn|'
splitters = '|!?*'

splitted = (''.join(group) for case, group in groupby(text, lambda char: char not in splitters) if case) # generator, can be converted to list if required

print(*splitted)

-> ab cd ef gh ij kl mn
(Aug-06-2023, 05:55 PM)Skaperen Wrote: my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.
The length should not matter at all, as the end user only sees e.g. split_extend; whether it is one line or many does not matter in the backend.
So e.g. as an import:
from split_ex import split_extend

#mystring = 'ab|cd!ef?gh!ij*kl!mn|'
mystring = '12a345b789ared'
print(split_extend(mystring, 'ab'))
Output:
['12', '345', '789', 'red']
Or you could make your own extension of the Python built-ins; then it's just split_extend with no import.
λ python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'car1taxi2boat3'
>>> split_extend(s, '123')
['car', 'taxi', 'boat']
>>>
>>> help(split_extend)
Help on function split_extend in module sitecustomize:

split_extend(mystring: str, chr_split: str) -> list
    Split on characters given as parameter
To do this I am using sitecustomize.py in the root folder of Python.
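The thread does not show the contents of that file, so the following is only a guess at what such a sitecustomize.py might look like. Python's site module tries to import a module named sitecustomize at startup, and anything assigned to the builtins module becomes available by bare name everywhere.

```python
# sitecustomize.py  (hypothetical contents; the real file is not shown in the thread)
import builtins

def split_extend(mystring: str, chr_split: str) -> list:
    '''Split on characters given as parameter'''
    return ''.join(c if c not in chr_split else ' ' for c in mystring).split()

# Publishing the function in the builtins namespace makes the name
# resolvable in every module and in the REPL without an import.
builtins.split_extend = split_extend
```

This is convenient for personal use, but code that relies on it will not run on a machine without the same sitecustomize.py, so the plain import is the more portable choice.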
my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.
(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.
Do you not read the posts or code posted here, or does it not do what you want?
E.g. my code posted last, if you wonder, has a Big-O of O(n): the join and split functions are each called only once,
no matter how many new characters you add to split on.
def split_extend(mystring: str, chr_split: str ) -> list:
    '''Split on characters given as parameter'''
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
G:\div_code\egg
λ python -m cProfile atest.py
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
         29 function calls in 0.001 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 atest.py:1(<module>)
        1    0.000    0.000    0.000    0.000 atest.py:1(split_extend)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       22    0.000    0.000    0.000    0.000 atest.py:3(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
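A small refinement on the Big-O point, offered as a sketch rather than a correction: the test c in chr_split scans the splitter string, so with k splitter characters the work per input character grows with k. For a handful of splitters this is negligible, but if the number of splitters could become large, converting them to a frozenset once makes each membership test independent of k on average. The name split_extend_set is made up for this example.

```python
def split_extend_set(mystring: str, chr_split: str) -> list:
    '''Same idea as split_extend, with set-based membership tests.'''
    splitters = frozenset(chr_split)  # built once, cost proportional to len(chr_split)
    # Each "c in splitters" is now a hash lookup instead of a scan of the string.
    return ''.join(' ' if c in splitters else c for c in mystring).split()

print(split_extend_set('ab|cd!ef?gh!ij*kl!mn|', '|!?*'))
# ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
```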
(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character
This strategy works very well:
def splitter(chr_split: str):
    c, n = chr_split[-1], len(chr_split) - 1
    table = str.maketrans(chr_split[:-1], c * n)

    def split(astring: str):
        return astring.translate(table).split(c)
  
    return split

if __name__ == '__main__':
    split = splitter('|!?*')
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    print(split(mystring))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', '']
Note how it reveals a small bug in Snippsat's output.
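The difference in the two outputs comes from how str.split() behaves with and without an argument: split(sep) keeps the empty strings produced by leading, trailing or adjacent separators, while split() with no argument splits on runs of whitespace and discards them. A short illustration with made-up input strings:

```python
text = 'ab|cd||ef|'

# Explicit separator: empty fields are preserved.
with_sep = text.split('|')
print(with_sep)  # ['ab', 'cd', '', 'ef', '']

# Replace-with-space, then split() with no argument: empty fields vanish.
no_sep = text.replace('|', ' ').split()
print(no_sep)    # ['ab', 'cd', 'ef']

# The whitespace approach also splits on spaces already present in the text.
spaced = 'a b|c'.replace('|', ' ').split()
print(spaced)    # ['a', 'b', 'c']
```

Which behaviour counts as correct depends on whether empty fields and embedded spaces are meaningful in the data being split.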
code does not come out easy to read on my display due to the colors. if there is a setting to disable colorizing the code and just use white or yellow for all of it, that would help a lot. hopefully my next laptop won't have this issue. others i have seen do not. i can read the code but it is so hard to read, made worse by my old age eyes, that i avoid certain things like code. the darker theme does help, but not so much for code. sorry.
(Aug-09-2023, 10:04 PM)Skaperen Wrote: hopefully my next laptop won't have this issue
I suggest you purchase a large monitor and connect it to your laptop.