Posts: 687
Threads: 37
Joined: Sep 2016
Feb-21-2017, 09:22 PM
(This post was last modified: Feb-21-2017, 09:22 PM by Ofnuts.)
Warning, aspirin required.
This is quite tricky because you can have anything before |BS|, including another |BS|. And covering your rear with something such as [^\|]|BS| isn't general enough because it prevents backspacing over a |. And in a regexp you can't express something like "not this string"...
So, you have to attack at the other end: use a regexp that will match any character followed by a whole sequence of consecutive |BS|. Due to the greedy way things are matched, this will always include the whole sequence of consecutive |BS|, so your initial character cannot itself be part of a |BS|.
Then look at the fine print in the specs of re.sub(): it looks for non-overlapping occurrences of the pattern, so the search for the next match starts after the end of the current match... which is after the end of the sequence of |BS|, so in a sequence of |BS| you will only process one per call to sub().
So in practice, we look for a character followed by a |BS| followed by zero or more other |BS| (captured in a group) and replace that by just that captured group:
import re

pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

def noBS(s):
    print('------------')
    previous = ''
    while s != previous:
        previous = s
        s = re.sub(pattern, r'\1', s)
        print(s)  # this shows that the two sequences of |BS| are processed in parallel
    return s

print(noBS("it |BS||BS||BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is o n e|BS||BS||BS||BS||BS||BS|an example"))
# The first 'BS|' gets backspaced over due to missing leading '|'...
print(noBS("it BS||BS||BS||BS||BS||BS||BS|this is o n e|BS||BS||BS||BS||BS||BS|an example"))
Output for the last one:
it BS|BS||BS||BS||BS||BS|this is o n |BS||BS||BS||BS||BS|an example
it B|BS||BS||BS||BS|this is o n |BS||BS||BS||BS|an example
it |BS||BS||BS|this is o n|BS||BS||BS|an example
it|BS||BS|this is o |BS||BS|an example
i|BS|this is o|BS|an example
this is an example
Unfortunately, I don't think you can avoid an explicit iteration.
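To make the "one |BS| per sequence per call" behaviour concrete, here is a minimal sketch using the same pattern. The helper name one_pass and the sample string are made up for illustration; the point is only that a single re.sub() call shortens every run of |BS| by one, so the loop has to repeat until nothing changes.

```python
import re

# Same pattern as in the post: any character, one |BS|, then the rest
# of the run of |BS| captured in group 1.
pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

def one_pass(s):
    """Apply a single re.sub() pass (hypothetical helper for illustration)."""
    return pattern.sub(r'\1', s)

first = one_pass("ab|BS||BS|cd|BS|")
second = one_pass(first)

print(first)   # a|BS|c  (each run lost one |BS| and one preceding char)
print(second)  # c
```

Note that `.` does not match a newline unless re.DOTALL is set, so a newline right before a |BS| would be left alone by this pattern.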
Posts: 2,953
Threads: 48
Joined: Sep 2016
Feb-21-2017, 10:57 PM
(This post was last modified: Feb-21-2017, 10:57 PM by wavic.)
I have to learn regular expressions at last
>>> import re
>>> s = 'it BS|BS||BS||BS||BS||BS|this is o n |BS||BS||BS||BS||BS|an example'
>>> new_s = s.replace('|BS|', '\b')
>>> new_s
'it BS\x08\x08\x08\x08\x08this is o n \x08\x08\x08\x08\x08an example'
>>> while '\x08' in new_s:
...     new_s = re.sub('[^\x08]\x08', '', new_s)
...
>>> new_s
'this is an example'
Thanks to this
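One caveat with the loop above: if the string starts with a backspace, there is no character in front of it, the pattern never matches, and the while loop never ends. A minimal sketch of one way around that, assuming a leading backspace should simply be dropped (the name apply_backspaces is made up for this example):

```python
import re

BACKSPACE = '\x08'
# Either a backspace at the very start of the string (nothing to erase),
# or a non-backspace character followed by a backspace.
PAIR = re.compile('(?:^|[^\x08])\x08')

def apply_backspaces(s):
    """Replace |BS| by a real backspace, then erase char/backspace pairs."""
    s = s.replace('|BS|', BACKSPACE)
    while BACKSPACE in s:
        # The first backspace in s is always matched by PAIR, so every
        # pass removes at least one backspace and the loop terminates.
        s = PAIR.sub('', s)
    return s

print(apply_backspaces('it |BS||BS||BS|this is one|BS||BS||BS|an example'))
print(apply_backspaces('|BS|abc'))
```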
Posts: 164
Threads: 22
Joined: Feb 2017
Thanks for all your replies. After testing them all, I ended up using buran's code, which is the fastest and works as expected.
buran: 2.6e-06s
wavic: 2.1e-05s (infinite loop if the string begins with |BS|)
ofnuts: 6.3e-05s
import re

pattern = re.compile(r'[\w ]?\|BS\|')
buffer = "it BS|BS||BS||BS||BS||BS|this is o n |BS||BS||BS||BS||BS|an example"
while True:
    # remove the first |BS| together with the character before it, if any
    after_sub = pattern.sub('', buffer, count=1)
    if buffer == after_sub:
        break
    else:
        buffer = after_sub
print(buffer)
Posts: 2,953
Threads: 48
Joined: Sep 2016
See the link in my prev. post. Actually this one. As I said, I don't know regular expressions
Posts: 8,163
Threads: 160
Joined: Sep 2016
Check this one. I think it should speed things up, because each |BS| group and its preceding chars are replaced in one re.sub call.
#!/usr/bin/python3
import re

strings = ['it |BS||BS||BS|this is one|BS||BS||BS|an example',
           'it |BS|this is an example',
           'it |BS||BS|this is an example',
           'it |BS||BS||BS|this is an example',
           'it |BS||BS||BS||BS|this is an example',
           'this one|BS||BS||BS||BS|it |BS||BS||BS||BS|']
ptrn = re.compile(r'(\|BS\|)+')
for string in strings:
    print(string)
    while True:
        match = re.search(ptrn, string)
        if match:
            num_chars = min(match.start(), int(len(match.group())/4))
            sub_pattern = re.compile(r'[\w ]{{{}}}(\|BS\|)+'.format(num_chars))
            string = sub_pattern.sub('', string, count=1)
        else:
            break
    print(string)
    print('\n')
Also note the last test string: it's a border case where a later |BS| group deletes chars preceding the previous |BS| group.
Posts: 164
Threads: 22
Joined: Feb 2017
Feb-22-2017, 03:35 AM
(This post was last modified: Feb-22-2017, 04:29 AM by Alfalfa.)
This one is slightly slower: it takes about 4.1e-06s to execute. With both versions I noticed that it blocks when it encounters a special char:
input: "it BS|BS||BS||BS||BS||BS|this is one|BS||BS|an example"
output: "this is an example"
input: "it BS|BS||BS||BS||BS||BS|this is on.e|BS||BS||BS|an example"
output: "this is on.an example"
Actually, I timed the original code, and I am quite amazed to find that it is the fastest, with an average of 1.7e-06s!
Posts: 8,163
Threads: 160
Joined: Sep 2016
Feb-22-2017, 08:01 AM
(This post was last modified: Feb-22-2017, 08:01 AM by buran.)
I don't know how you time it but here is what I get:
import re
import timeit

def alfalfa(input_str=None, n=1000):
    if not input_str:
        string = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'*n
    else:
        string = input_str
    while re.search(r"\|BS\|", string):
        array = list(string)
        for m in re.finditer(r"\|BS\|", string):
            del array[m.start():m.end()]
            if m.start()-1 >= 0:
                del array[m.start()-1]
            string = ''.join(array)
            break
    return string

def buran1(input_str=None, n=1000):
    ptrn = re.compile(r'[\w ]?\|BS\|')
    if not input_str:
        string = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'*n
    else:
        string = input_str
    while True:
        after_sub = ptrn.sub('', string, count=1)
        if string == after_sub:
            break
        else:
            string = after_sub
    return string

def buran2(input_str=None, n=1000):
    ptrn = re.compile(r'(\|BS\|)+')
    if not input_str:
        string = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'*n
    else:
        string = input_str
    while True:
        match = re.search(ptrn, string)
        if match:
            num_chars = min(match.start(), int(len(match.group())/4))
            sub_pattern = re.compile(r'[\w ]{{{}}}(\|BS\|)+'.format(num_chars))
            string = sub_pattern.sub('', string, count=1)
        else:
            break
    return string

def noBS(s=None, n=1000):
    if not s:
        s = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'*n
    else:
        s = s*n
    pattern = re.compile(r'.\|BS\|((\|BS\|)*)')
    previous = ''
    while s != previous:
        previous = s
        s = re.sub(pattern, r'\1', s)
    return s

if __name__ == '__main__':
    print('repeat 1000, short string:\n')
    print('alfalfa --> {}'.format(timeit.timeit("alfalfa(n=1)", number=1000, setup="from __main__ import alfalfa")))
    print('buran1 --> {}'.format(timeit.timeit("buran1(n=1)", number=1000, setup="from __main__ import buran1")))
    print('buran2 --> {}'.format(timeit.timeit("buran2(n=1)", number=1000, setup="from __main__ import buran2")))
    print('ofnut --> {}'.format(timeit.timeit("noBS(n=1)", number=1000, setup="from __main__ import noBS")))
    print('\nrepeat 1, long string\n')
    print('alfalfa --> {}'.format(timeit.timeit("alfalfa()", number=1, setup="from __main__ import alfalfa")))
    print('buran1 --> {}'.format(timeit.timeit("buran1()", number=1, setup="from __main__ import buran1")))
    print('buran2 --> {}'.format(timeit.timeit("buran2()", number=1, setup="from __main__ import buran2")))
    print('ofnut --> {}'.format(timeit.timeit("noBS()", number=1, setup="from __main__ import noBS")))
and the result of two consecutive runs:
Output:
repeat 1000, short string:
alfalfa --> 0.0432239843385
buran1 --> 0.0112259009714
buran2 --> 0.0158689890339
ofnut --> 0.0273017555023
repeat 1, long string
alfalfa --> 3.50733362241
buran1 --> 1.34837528801
buran2 --> 1.86298544437
ofnut --> 0.0084199068111
repeat 1000, short string:
alfalfa --> 0.0284217156815
buran1 --> 0.00996738901746
buran2 --> 0.0157894500521
ofnut --> 0.0273982927342
repeat 1, long string
alfalfa --> 3.52313333556
buran1 --> 1.35965603239
buran2 --> 1.82195551718
ofnut --> 0.00834742370672
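The long-string timings above are likely dominated by how often each version rescans and rebuilds the string: the count=1 loops rebuild it once per |BS|. For comparison, here is a minimal sketch of a single left-to-right pass that uses a list as a stack and no regular expression at all. This is a different technique from the ones timed above, not one of the posted solutions; the function name and the token parameter are made up for this example, and it has not been benchmarked here.

```python
def apply_backspaces(text, token="|BS|"):
    """Process text left to right; each token erases the last kept char."""
    out = []                      # characters kept so far, used as a stack
    i = 0
    while i < len(text):
        if text.startswith(token, i):
            if out:               # a token at the very start erases nothing
                out.pop()
            i += len(token)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(apply_backspaces("it |BS||BS||BS|this is one|BS||BS||BS|an example"))
# Border case from the thread: the second group eats into "this".
print(apply_backspaces("this one|BS||BS||BS||BS|it |BS||BS||BS||BS|"))
```

Because every character is pushed and popped at most once, the work grows linearly with the length of the input, and any character (punctuation, newline) can be erased.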
Posts: 164
Threads: 22
Joined: Feb 2017
That is strange. I simply used a for loop and took an average, like so:
#!/usr/bin/python3
import re
import time

pattern = re.compile(r'[\w ]?\|BS\|')
buffer = "it |BS||BS||BS|this is one|BS||BS||BS|an example"  # |BS| as in Backspace
test = time.time()
for x in range(0, 100000):
    while re.search(r"\|BS\|", buffer):
        array = list(buffer)
        for m in re.finditer(r"\|BS\|", buffer):
            del array[m.start():m.end()]
            if m.start()-1 >= 0:
                del array[m.start()-1]
            buffer = ''.join(array)
            break
print(buffer)
print((time.time()-test)/100000)
I thought it might be Python 3 vs 2, although with the example you provided I get results similar to what you just showed.
Anyhow, do you know how to fix the pattern in order to accept non-alphanumeric chars?
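One thing worth checking in the timing loop above: buffer is overwritten on the first pass, so the remaining iterations only time a failed re.search on an already clean string, which pulls the average down. A minimal sketch of a per-call measurement with timeit, assuming a hypothetical helper remove_backspaces that has the same effect as the loop (delete the first |BS| and the character before it, then repeat):

```python
import re
import timeit

BS = re.compile(r"\|BS\|")

def remove_backspaces(text):
    """Hypothetical helper: same effect as the list-based loop above."""
    m = BS.search(text)
    while m:
        start = max(m.start() - 1, 0)   # also drop the char before |BS|, if any
        text = text[:start] + text[m.end():]
        m = BS.search(text)
    return text

SAMPLE = "it |BS||BS||BS|this is one|BS||BS||BS|an example"
RUNS = 1000

# SAMPLE is never modified, so every call starts from the raw string.
per_call = timeit.timeit(lambda: remove_backspaces(SAMPLE), number=RUNS) / RUNS
print(remove_backspaces(SAMPLE))
print(per_call)
```

The absolute number will depend on the machine and Python version; the point is only that each timed call does the full amount of work.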
Posts: 8,163
Threads: 160
Joined: Sep 2016
I think r'[\S\s]?\|BS\|' should work
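A quick way to check that suggestion is to plug the pattern into the count=1 loop from earlier in the thread. This is a minimal sketch; the function name strip_backspaces is made up for the example. [\S\s] matches any single character, including punctuation and newlines, so a special char no longer stops the backspacing.

```python
import re

# [\S\s] = "non-whitespace or whitespace", i.e. any character at all.
pattern = re.compile(r'[\S\s]?\|BS\|')

def strip_backspaces(text):
    """Remove the first |BS| and the char before it until nothing changes."""
    while True:
        after_sub = pattern.sub('', text, count=1)
        if after_sub == text:
            return text
        text = after_sub

print(strip_backspaces("this is on.e|BS||BS||BS|an example"))  # this is oan example
print(strip_backspaces("|BS|abc"))                             # abc
```

With the earlier [\w ]? class the first call would stop at the "." and return "this is on.an example"; here the three |BS| erase "e", "." and "n".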
Posts: 164
Threads: 22
Joined: Feb 2017
It seems to work great. Thank you for the extended support