Python Forum
Regex: Remove all match plus one char before all
As I am rather ignorant of re, and repeated iterations for a "linear" problem look ugly, I tried a direct naive approach: just traverse the string, copying characters or deleting the last copied one whenever |BS| is found.

def naive(s=None, n=1000):
    """Apply |BS| (backspace) markers to s * n in one left-to-right pass."""
    if not s:
        s = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'
    s = s * n
    size, ind = len(s), 0
    accumulator = []  # characters kept so far
    while ind < size:
        if s.startswith("|BS|", ind):
            # Marker found: drop the last kept character, if there is one.
            if accumulator:
                accumulator.pop()
            ind += 4  # skip over the 4-character marker
        else:
            accumulator.append(s[ind])
            ind += 1
    return "".join(accumulator)
Running buran's measuring script (stolen) gives me:

Output:
repeat 1000, short string s*1:
alfalfa --> 0.03940597499968135
buran1  --> 0.015949830999488768
buran2  --> 0.02403012300055707
ofnut   --> 0.04103327599932527
naive   --> 0.010694067999793333

repeat 1, long string s*1000
alfalfa --> 4.176700068000173
buran1  --> 2.2705873200002316
buran2  --> 3.068299023000691
ofnut   --> 0.014837658000033116
naive   --> 0.009339336000266485

repeat 1, very long string s*3000
alfalfa --> 40.76510821000011
buran1  --> 21.95231995500035
buran2  --> 29.43507453300026
ofnut   --> 0.04528477199983172
naive   --> 0.03035950000048615
So it seems that:
  • buran has a faster PC
  • naive is fastest, but on longer strings it doesn't pull ahead as much as I would have guessed. Using re and repeatedly searching the entire string should be quadratic (and from the timings it is), while a single traversal should be linear. Perhaps "growing" the accumulator and the char/substring checks are rather expensive compared to optimized re.
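To make the quadratic claim concrete, here is a minimal sketch of a repeated-substitution loop. It is a hypothetical stand-in (the name repeated_sub and the pattern are my assumptions), not the actual alfalfa or buran code, which is not shown in this post. Each pass rescans from the start and builds a new string, and there is one pass per |BS|, so the total work grows roughly with length times the number of markers.

```python
import re

# Hypothetical stand-in for a "search the whole string again" approach;
# not any other poster's actual function.
_BS = re.compile(r".?\|BS\|", re.DOTALL)


def repeated_sub(s):
    """Remove the leftmost |BS| and the char before it (if any) until none remain."""
    while True:
        # count=1: only the leftmost marker is handled per pass,
        # so every pass scans and copies the string again.
        s, replaced = _BS.subn("", s, count=1)
        if replaced == 0:
            return s


if __name__ == "__main__":
    print(repeated_sub('it |BS||BS||BS|this is one|BS||BS||BS|an example'))
```

On the example string this gives the same result as naive; odd inputs where removing one marker happens to form a new |BS| could behave differently, and I did not check those.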
It should be possible to combine both approaches: find the first occurrence of |BS| with .find(), copy the part of the string before it except its last char, resume the search from just after that |BS|, and so on. Or preallocate the accumulator with [None] * size and keep a second index marking the end of the copied part, to avoid growing the list and deleting chars from its end. But the gains would probably be marginal, and I am lazy.
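For the record, a rough sketch of how those two variants might look. The names find_based and preallocated and the token parameter are made up for illustration, and nothing here was timed, so whether either actually beats naive is an open question.

```python
def find_based(s, token="|BS|"):
    """Jump between markers with str.find() instead of testing every index."""
    out = []  # kept characters
    pos = 0
    while True:
        idx = s.find(token, pos)
        if idx == -1:
            out.extend(s[pos:])  # no more markers: copy the tail
            break
        out.extend(s[pos:idx])   # copy everything before the marker
        if out:
            out.pop()            # the marker deletes the last kept char
        pos = idx + len(token)   # resume the search after the marker
    return "".join(out)


def preallocated(s, token="|BS|"):
    """Same traversal as naive, but writing into a fixed-size buffer."""
    size = len(s)
    buf = [None] * size  # the result can never be longer than the input
    end = 0              # number of valid chars currently in buf
    ind = 0
    while ind < size:
        if s.startswith(token, ind):
            if end:
                end -= 1         # "delete" by moving the end index back
            ind += len(token)
        else:
            buf[end] = s[ind]
            end += 1
            ind += 1
    return "".join(buf[:end])


if __name__ == "__main__":
    sample = 'it |BS||BS||BS|this is one|BS||BS||BS|an example'
    print(find_based(sample))
    print(preallocated(sample))
```

Both keep the left-to-right semantics of naive: a marker with nothing before it is simply dropped.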

Messages In This Thread
RE: Regex: Remove all match plus one char before all - by zivoni - Feb-23-2017, 05:20 PM
