Python Forum
doing string split with 2 or more split characters
#11
@snippsat

How about this, thought of it just now:

result = [r for c in mystring.split('|') for r in c.split('!')]
Output:
result ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
Reply
#12
(Aug-06-2023, 07:17 AM)Pedroski55 Wrote: How about this, thought of it just now:
Then you are back to the original problem he is trying to avoid: many calls to replace() when splitting on several characters.
Now you have to call split() again for every new character you add to split on.
If you try it with the mystring below, you will see the problem.
>>> mystring = 'ab|cd!ef?gh!ij*kl!mn|'
>>> ''.join(c if c not in '|!?*' else ' ' for c in mystring).split()
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
Also a tip for your other code: never use for i in range(len(sequence)):
If I rewrite my one-liner as a standard loop, you can see that there is no need for range(len(sequence)).
Just loop over mystring; there is no need to manipulate an index.
def split_extend(mystring: str, chr_split: str ) -> list:
    result = ''
    for c in mystring:
        if c not in chr_split:
            result += c
        else:
            result += ' '
    return result.split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
The cooler way, but it can be harder to read, so this line is on the borderline of what should be done in one line.
def split_extend(mystring: str, chr_split: str ) -> list:
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
Reply
#13
my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.

there may be non-alpha (non-word) characters that are between the splitter characters that need to be passed to the result. my usage of only alpha in the example was bad as it gave the incorrect impression that only alpha characters would be in the result strings.

i would expect to provide a set of characters that are splitters (or maybe a str of them). a scan of the str to be split could simply do thisch in splitset to know if a character is a splitter.
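One way to meet these requirements with only the standard library is re.split with a character class built from the splitter characters. This is a minimal sketch, not code posted in the thread; the name split_chars and the sample string are made up for illustration, and it assumes at least one splitter character is given. Characters that are not splitters, including spaces, digits and punctuation, pass through unchanged.

```python
import re

def split_chars(text: str, splitters: str) -> list:
    # Hypothetical helper: build a character class from the splitters.
    # re.escape keeps characters like ']' or '^' from being read as regex syntax.
    pattern = '[' + re.escape(splitters) + ']'
    # Drop the empty pieces produced by leading, trailing or adjacent splitters.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_chars('a b|c-1!ef?gh|', '|!?*'))
# ['a b', 'c-1', 'ef', 'gh']
```

The body of the function is a single expression, so it can be pasted into a one-liner as is.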
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#14
Using itertools groupby:

from itertools import groupby

text = 'ab|cd!ef?gh!ij*kl!mn|'
splitters = '|!?*'

splitted = (''.join(group) for case, group in groupby(text, lambda char: char not in splitters) if case) # generator, can be converted to list if required

print(*splitted)

-> ab cd ef gh ij kl mn
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#15
(Aug-06-2023, 05:55 PM)Skaperen Wrote: my need for this to be a one liner is just for the work expression code. a module that is built-in to a recent version of Python3 is OK. installing a third-party module is not desired but not ruled out.
The length should not matter at all, since the end user only sees e.g. split_extend; whether it is one line or many does not matter in the backend.
So e.g. as an import:
from split_ex import split_extend

#mystring = 'ab|cd!ef?gh!ij*kl!mn|'
mystring = '12a345b789ared'
print(split_extend(mystring, 'ab'))
Output:
['12', '345', '789', 'red']
Or you could make it your own extension of the Python built-ins, then it's just split_extend with no import.
λ python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'car1taxi2boat3'
>>> split_extend(s, '123')
['car', 'taxi', 'boat']
>>>
>>> help(split_extend)
Help on function split_extend in module sitecustomize:

split_extend(mystring: str, chr_split: str) -> list
    Split on characters given as parameter
By doing this I am using sitecustomize.py in the root folder of Python.
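The post does not show the sitecustomize.py file itself. A plausible sketch, assuming the function was made globally visible by attaching it to the builtins module (an assumption, not confirmed by the author), might look like this:

```python
# sitecustomize.py
# Python tries to import a module with this name automatically at startup,
# so it must sit somewhere on the import path.
import builtins

def split_extend(mystring: str, chr_split: str) -> list:
    '''Split on characters given as parameter'''
    return ''.join(' ' if c in chr_split else c for c in mystring).split()

# Assumption: publishing the function through builtins is what makes it
# callable in every script and REPL session without an import.
builtins.split_extend = split_extend
```

Adding names to builtins affects every program run by that interpreter, so this is probably best kept to a personal machine.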
Reply
#16
my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.
Reply
#17
(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character then do one split() call using that one special character. if there is a way to quickly do all that character mapping in one call, then i could do this with big-O timing not involving the diversity of splitter characters.
Do you not read the posts or code posted here, or does it not do what you want?
E.g. the code I posted last, if you wonder, has a Big-O of O(n), so the join and split functions are only called once,
no matter how many new characters you add to split on.
def split_extend(mystring: str, chr_split: str ) -> list:
    '''Split on characters given as parameter'''
    return ''.join(c if not c in chr_split else ' ' for c in mystring).split()

if __name__ == '__main__':
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    chr_split = '|!?*'
    print(split_extend(mystring, chr_split))
Output:
G:\div_code\egg
λ python -m cProfile atest.py
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
         29 function calls in 0.001 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 atest.py:1(<module>)
        1    0.000    0.000    0.000    0.000 atest.py:1(split_extend)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       22    0.000    0.000    0.000    0.000 atest.py:3(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
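One caveat on the O(n) figure: the test c in chr_split on a str scans the splitter string, so each test costs time proportional to the number of splitter characters. For a handful of splitters this hardly matters. A sketch of a variant that makes each test constant time on average by using a frozenset (the name split_extend_set is made up here, it is not from the thread):

```python
def split_extend_set(mystring: str, chr_split: str) -> list:
    '''Hypothetical variant of split_extend using a set for membership tests.'''
    splitters = frozenset(chr_split)  # average O(1) lookup per character
    return ''.join(' ' if c in splitters else c for c in mystring).split()

print(split_extend_set('ab|cd!ef?gh!ij*kl!mn|', '|!?*'))
# ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn']
```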
Reply
#18
(Aug-09-2023, 06:39 PM)Skaperen Wrote: my original thought which i have not tested was to replace all the splitter characters (this is likely to need too many calls) with the same one special character
This strategy works very well
def splitter(chr_split: str):
    c, n = chr_split[-1], len(chr_split) - 1
    table = str.maketrans(chr_split[:-1], c * n)

    def split(astring: str):
        return astring.translate(table).split(c)
  
    return split

if __name__ == '__main__':
    split = splitter('|!?*')
    mystring = 'ab|cd!ef?gh!ij*kl!mn|'
    print(split(mystring))
Output:
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', '']
Note the trailing empty string: it reveals a small bug in Snippsat's output, which silently drops the empty field after the final '|'.
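Whether the trailing '' is wanted depends on the use case. The two approaches also differ when the input contains whitespace: replacing splitters with a space and calling split() also splits on spaces already in the text, while translate() followed by split(c) leaves them alone. A small sketch that shows both behaviours, assuming a non-empty splitter string; the helper names are made up for illustration and are not from the thread:

```python
def split_keep_empty(text: str, splitters: str) -> list:
    # Map every splitter to the first one, then split on it (keeps '' fields).
    sep = splitters[0]
    table = str.maketrans(splitters, sep * len(splitters))
    return text.translate(table).split(sep)

def split_drop_empty(text: str, splitters: str) -> list:
    # Same mapping, but discard the empty fields afterwards.
    return [part for part in split_keep_empty(text, splitters) if part]

sample = 'a b|cd!!ef|'
print(split_keep_empty(sample, '|!?*'))  # ['a b', 'cd', '', 'ef', '']
print(split_drop_empty(sample, '|!?*'))  # ['a b', 'cd', 'ef']
```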
Reply
#19
code does not come out easy to read on my display due to the colors. if there is a setting to disable colorizing the code and just use white or yellow for all of it, that would help a lot. hopefully my next laptop won't have this issue. others i have seen do not. i can read the code but it is so hard to read, made worse by my old age eyes, that i avoid certain things like code. the darker theme does help, but not so much for code. sorry.
Reply
#20
(Aug-09-2023, 10:04 PM)Skaperen Wrote: hopefully my next laptop won't have this issue
I suggest you purchase a large monitor and connect it to your laptop.
Reply

