Regex: Remove all match plus one char before all

***zivoni*** · Feb-23-2017, 05:20 PM

As I am rather ignorant of re, and repeated iterations for a "linear" problem look ugly, I tried a direct naive approach: just traverse the string, copying characters, and delete the last copied one whenever |BS| is found.

def naive(s=None, n=1000):
    if not s:
        s = 'it |BS||BS||BS|this is one|BS||BS||BS|an example' * n
    else:
        s = s * n
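    size, ind = len(s), 0
    accumulator = []  # characters kept so far
    while ind < size:
        # startswith with an offset checks for "|BS|" at ind without slicing
        if s.startswith("|BS|", ind):
            if accumulator:
                accumulator.pop()  # "backspace": drop the last kept char
            ind += 4  # skip over the whole |BS| token
        else:
            accumulator.append(s[ind])
            ind += 1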
    return "".join(accumulator)

Running it through buran's measuring script (stolen) gives me:

Output:

repeat 1000, short string s*1

alfalfa --> 0.03940597499968135
buran1  --> 0.015949830999488768
buran2  --> 0.02403012300055707
ofnut   --> 0.04103327599932527
naive   --> 0.010694067999793333

repeat 1, long string s*1000

alfalfa --> 4.176700068000173
buran1  --> 2.2705873200002316
buran2  --> 3.068299023000691
ofnut   --> 0.014837658000033116
naive   --> 0.009339336000266485

repeat 1, very long string s*3000

alfalfa --> 40.76510821000011
buran1  --> 21.95231995500035
buran2  --> 29.43507453300026
ofnut   --> 0.04528477199983172
naive   --> 0.03035950000048615

So it seems that:

buran has a faster PC
naive is fastest, but it doesn't improve for longer strings as much as I would have guessed. Using re and repeatedly searching the entire string should be quadratic (and from the timings it is), while a single traversal should be linear. Perhaps growing the accumulator and the per-char substring checks are rather expensive compared to optimized re.
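A quick sanity check on the "quadratic vs. linear" reading: going from s*1000 to s*3000 triples the input, so the exponent k in time ~ n**k can be estimated as log(t_3000 / t_1000) / log(3). The sketch below uses only the timings posted above; `growth_exponent` is a helper name made up for this illustration, and since each timing is a single run the result is only a rough estimate.

```python
import math

# Timings copied from the output above: (s*1000, s*3000), in seconds.
timings = {
    "alfalfa": (4.176700068000173, 40.76510821000011),
    "buran1": (2.2705873200002316, 21.95231995500035),
    "buran2": (3.068299023000691, 29.43507453300026),
    "ofnut": (0.014837658000033116, 0.04528477199983172),
    "naive": (0.009339336000266485, 0.03035950000048615),
}


def growth_exponent(t_short, t_long, size_ratio=3.0):
    """Estimate k in time ~ n**k from two timings whose input sizes differ by size_ratio."""
    return math.log(t_long / t_short) / math.log(size_ratio)


exponents = {name: growth_exponent(a, b) for name, (a, b) in timings.items()}
for name, k in exponents.items():
    print(f"{name:8s} exponent ~ {k:.2f}")
```

The exponents come out close to 2 for alfalfa, buran1 and buran2, and close to 1 for ofnut and naive, which matches the guess above.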

It should be possible to combine both approaches: find the first occurrence of |BS| with .find(), copy the first part of the string except its last char, restart the search from the previous |BS|, and so on. Or preallocate the accumulator with [None] * size and keep a second index marking the end of the copied part, to avoid growing the list and deleting chars from its end. But the gains would probably be marginal and I am lazy.
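For anyone less lazy, here is a minimal sketch of both ideas. It is untimed, so whether either one actually beats naive is an open question. The function names and the `token` parameter are invented for this example; the behaviour is meant to match naive (a |BS| with nothing before it is simply dropped).

```python
def remove_with_find(s, token="|BS|"):
    """Jump between tokens with str.find() instead of checking every char."""
    out = []          # characters kept so far
    pos = 0           # start of the part not yet copied
    step = len(token)
    while True:
        idx = s.find(token, pos)
        if idx == -1:
            out.extend(s[pos:])   # no more tokens: copy the rest
            break
        out.extend(s[pos:idx])    # copy everything up to the token
        if out:
            out.pop()             # "backspace": drop the last kept char
        pos = idx + step          # resume just past the token
    return "".join(out)


def remove_prealloc(s, token="|BS|"):
    """Same traversal as naive, but writing into a preallocated list."""
    size = len(s)
    buf = [None] * size   # the result can never be longer than the input
    end = 0               # index one past the last kept char
    ind = 0
    step = len(token)
    while ind < size:
        if s.startswith(token, ind):
            if end:
                end -= 1  # "backspace": just move the end marker back
            ind += step
        else:
            buf[end] = s[ind]
            end += 1
            ind += 1
    return "".join(buf[:end])


sample = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'
print(remove_with_find(sample))
print(remove_prealloc(sample))
```

Both print "this is an example" for the sample string from the post. The find() version keeps a growing list, since consecutive |BS| tokens can eat into chars copied in earlier rounds; the preallocated version trades that for one big allocation up front.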
