Feb-23-2017, 07:53 PM
I spent some time with line profiler, s repeated 3000 times -> single string with 90000 chars.
Naive approach:
It seems that the growing of accumulator is not so bad. And as most of time is spent with simple operations, its hard to improve this while iterating over all characters. Perhaps processing it in chunks (find next |BS|, copy except last one, find next |BS|, copy/delete) could speed it significantly, but that would need some effort to keep parts of a string in list, eventually deleting them and going to previous ones if tons of consecutive |BS|'s is met.
First time I just skimmed through it, so I didnt notice that it removes one "level" of |BS| each pass . There are at most three consecutive |BS| in the example string, after fourth iteration string is same as after third iteration and it stops.
I checked it more carefully and noticed that it behaves differently than a naive approach. Naive traversing "uses" only |BS|'s that were in the initial string. There is no returning except deleting, so no new |BS|. All other functions check repeatedly entire string, so newly generated |BS| are processed. I am not sure what is right here - it depends on exact definition of problem.
Example of string generating new |BS|:
buran's first:
buran's second:
Naive approach:
It seems that the growing of accumulator is not so bad. And as most of time is spent with simple operations, its hard to improve this while iterating over all characters. Perhaps processing it in chunks (find next |BS|, copy except last one, find next |BS|, copy/delete) could speed it significantly, but that would need some effort to keep parts of a string in list, eventually deleting them and going to previous ones if tons of consecutive |BS|'s is met.
Output:In [5]: %lprun -f naive naive(n=3000)
Timer unit: 1e-06 s
Total time: 0.344973 s
File: /tmp/bs.py
Function: naive at line 66
Line # Hits Time Per Hit % Time Line Contents
==============================================================
66 def naive(s=None, n=1000):
67 1 4 4.0 0.0 if not s:
68 1 46 46.0 0.0 s = 'it |BS||BS||BS|this is one|BS||BS||BS|an example' * n
69 else:
70 s = s * n
71 1 4 4.0 0.0 size, ind = len(s), 0
72 1 3 3.0 0.0 accumulator = []
73 90001 69329 0.8 20.1 while ind < size:
74 90000 75776 0.8 22.0 if ((s[ind] == '|')
75 18000 15236 0.8 4.4 and (ind+4 <= size)
76 18000 18577 1.0 5.4 and (s[ind:ind+4] == "|BS|")):
77 18000 13047 0.7 3.8 if accumulator:
78 18000 15157 0.8 4.4 accumulator.pop()
79 18000 14372 0.8 4.2 ind += 4
80 else:
81 72000 65994 0.9 19.1 accumulator.append(s[ind])
82 72000 56921 0.8 16.5 ind += 1
83 1 507 507.0 0.1 return "".join(accumulator)
Ofnuts' function:First time I just skimmed through it, so I didnt notice that it removes one "level" of |BS| each pass . There are at most three consecutive |BS| in the example string, after fourth iteration string is same as after third iteration and it stops.
I checked it more carefully and noticed that it behaves differently than a naive approach. Naive traversing "uses" only |BS|'s that were in the initial string. There is no returning except deleting, so no new |BS|. All other functions check repeatedly entire string, so newly generated |BS| are processed. I am not sure what is right here - it depends on exact definition of problem.
Example of string generating new |BS|:
Output:In [11]: s = "|Bu|Bu|BS|S|S|"
...: print('%-10s -> "%s"' %('naive', naive(s, n=1)))
...: print('%-10s -> "%s"' %('Ofnuts', noBS(s, n=1)))
...: print('%-10s -> "%s"' %('alfalfa', alfalfa(s, n=1)))
...: print('%-10s -> "%s"' %('buran1', buran1(s, n=1)))
...: print('%-10s -> "%s"' %('buran2', buran2(s, n=1)))
...:
naive -> "|Bu|BS|S|" # there was only one |BS| in initial string, so one deletion
Ofnuts -> "|BS|" # deletes repeatedly, but last one is not catched - doesn't match .\|BS\|
alfalfa -> "" # last three iterate till there is nothing left ...
buran1 -> ""
buran2 -> ""
Almost all time is spent in four re.sub passes:Output:In [6]: %lprun -f noBS noBS(n=3000)
Timer unit: 1e-06 s
Total time: 0.109533 s
File: /tmp/bs.py
Function: noBS at line 53
Line # Hits Time Per Hit % Time Line Contents
==============================================================
53 def noBS(s=None, n=1000):
54 1 3 3.0 0.0 if not s:
55 1 35 35.0 0.0 s = 'it |BS||BS||BS|this is one|BS||BS||BS|an example' * n
56 else:
57 s = s * n
58 1 15 15.0 0.0 pattern = re.compile(r'.\|BS\|((\|BS\|)*)')
59 1 2 2.0 0.0 previous = ''
60 5 13 2.6 0.0 while s != previous:
61 4 6 1.5 0.0 previous = s
62 4 109458 27364.5 99.9 s = re.sub(pattern, r'\1', s)
63 1 1 1.0 0.0 return s
alfalfa's function:buran's first:
buran's second: