Posts: 2,342
Threads: 62
Joined: Sep 2016
I interviewed a candidate recently for a software engineering position who had a master's degree from UCLA with a 3.8 GPA, and a 3.9 GPA in undergrad. They looked good on paper, but after writing the following code they misidentified the time complexity of their solution (the problem itself is relatively boring; I won't post it unless someone really wants to know):
def result(n):
    nums = [i+1 for i in xrange(n)]
    d = 1
    while len(nums) > 1:
        nums.pop(0)
        for j in xrange(d % len(nums)):
            nums.append(nums.pop(0))
        d += 1
    return nums[0]

I realize most of the regulars here don't have a formal computer science background, but I'm curious whether the tutorial I wrote is good enough for people to identify the running time of this short algorithm.
So, for those interested in identifying it, what do you think the big-O of this function is? What source(s) of learning did you use to make that conclusion?
(I posted this in the Bar since it's not so much a question I'm asking as a thing I thought would be fun, and it isn't news about Python or anything, even though the candidate happened to write it in (not especially good) Python.)
Posts: 591
Threads: 26
Joined: Sep 2016
So, I believe to begin with, list.pop(0) is an O(n) operation.
It is only constant time if you pop the last item, which means that:

while len(nums) > 1:
    nums.pop(0)

alone is already O(n^2).
The only thing I would struggle with is the complexity of the d%len(nums) loop.
My intuition is that that part is also linear in the length of nums, so it seems it could be as bad as O(n^3) with the additional pop(0) in that loop.
For what it is worth I am nearing the end of my online master's with Georgia Tech; though I did not excel in my algorithms course. I also did not do CS as an undergrad; I did linguistics and Japanese. I originally studied physics and EE before switching to linguistics though so I have more math than that would generally imply.
Posts: 687
Threads: 37
Joined: Sep 2016
(Dec-05-2016, 06:48 AM)Mekire Wrote: So, I believe to begin with, list.pop(0) is a O(n) operation.
I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length. But it of course depends on how Python implements plain lists. Which means the complexity is dependent on the Python implementation...
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Posts: 591
Threads: 26
Joined: Sep 2016
Dec-05-2016, 05:05 PM
(This post was last modified: Dec-05-2016, 05:06 PM by Mekire.)
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length.
I think you will find it is. collections.deque is a doubly-linked implementation that offers constant time popping from either end, but to my knowledge this isn't the case with list.
I did some quick tests and it certainly seems much slower to pop the front.
Some nasty timing code:

def result_f(n):
    nums = range(n)
    while nums:
        nums.pop(0)

def result_b(n):
    nums = range(n)
    while nums:
        nums.pop()

if __name__ == '__main__':
    from timeit import timeit
    for i in (2000, 4000, 8000, 16000):
        print("Front pop {}: {}".format(i, timeit("result_f({})".format(i), "from __main__ import result_f", number=100)))
        print("Back pop {}: {}".format(i, timeit("result_b({})".format(i), "from __main__ import result_b", number=100)))

And my results:
Front pop 2000: 0.149457311625
Back pop 2000: 0.0312793883941
Front pop 4000: 0.544667427803
Back pop 4000: 0.0609260950131
Front pop 8000: 1.80832892431
Back pop 8000: 0.117451472662
Front pop 16000: 7.36631624098
Back pop 16000: 0.318763773615

Specs:
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32
Posts: 2,342
Threads: 62
Joined: Sep 2016
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: (Dec-05-2016, 06:48 AM)Mekire Wrote: So, I believe to begin with, list.pop(0) is an O(n) operation. I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length.
To elaborate a bit more on Mek's comments... CPython's list uses an implementation similar to Java's ArrayList (and C++'s vector, I think, but I know less C++). Removing an element is linear, since everything after it is shifted over to keep the array contiguous; popping from the end is efficient since no shifting needs to be done, but popping from the beginning is costly.
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: But it of course depends on how Python implements plain lists. Which means the complexity is dependent on the Python implementation...
If a different Python implementation used a linked list, then popping from the beginning would be O(1), but accessing by index would be O(n), which would make a lot of Python code accidentally quadratic. The CPython time complexity wiki even says:
The Wiki Wrote: Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics. However, it is generally safe to assume that they are not slower by more than a factor of O(log n).
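As a quick illustration (my own sketch, not from this thread's code, written for Python 3): collections.deque offers O(1) pops from either end, so a head-to-head timing against list.pop(0) makes the array-shifting cost visible directly. The size and repetition counts here are arbitrary.

```python
from timeit import timeit

# Arbitrary size, chosen large enough that the list's O(n) front pops dominate.
setup = "from collections import deque\nn = 20000"
t_list = timeit("xs = list(range(n))\nwhile xs: xs.pop(0)", setup=setup, number=5)
t_deque = timeit("xs = deque(range(n))\nwhile xs: xs.popleft()", setup=setup, number=5)
print("list.pop(0): {:.4f}s  deque.popleft(): {:.4f}s".format(t_list, t_deque))
```

On any CPython build the list version should come out far slower, since each front pop shifts the whole remaining buffer while popleft() just detaches a block.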
Any other takers on the overall time complexity?
Posts: 687
Threads: 37
Joined: Sep 2016
Dec-06-2016, 12:50 AM
(This post was last modified: Dec-06-2016, 11:06 AM by Ofnuts.)
(Dec-05-2016, 05:05 PM)Mekire Wrote: I think you will find it is. collections.deque is a doubly-linked implementation that offers constant time popping from either end, but to my knowledge this isn't the case with list.
[timing code and results snipped; quoted in full in the previous post]
OK, I stand corrected... However, your numbers seem to indicate that pop() is worse than O(1) (it should be the same value for all lengths, but here it looks vaguely proportional) and pop(0) is much worse than linear (it looks roughly quadratic: 4x when the length doubles).
Posts: 2,342
Threads: 62
Joined: Sep 2016
You're seeing linear and quadratic instead of constant and linear because the timing functions include loops: each call performs n pops, so the per-pop cost gets multiplied by n. So that's the expected result.
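To make that concrete (a back-of-the-envelope sketch of my own, not from the thread): result_f performs n pops, and a pop(0) on a list of length k shifts the remaining k-1 elements, so the total work is 0 + 1 + ... + (n-1) = n(n-1)/2 element moves, which roughly quadruples each time n doubles, matching the measured front-pop times.

```python
def front_pop_shifts(n):
    # pop(0) on a list of length k moves the remaining k-1 elements;
    # summing over the n pops gives n*(n-1)/2 element moves in total.
    return sum(k - 1 for k in range(1, n + 1))

# Doubling n roughly quadruples the work, as in the measured timings.
print(front_pop_shifts(16000) / front_pop_shifts(8000))
```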
Posts: 591
Threads: 26
Joined: Sep 2016
(Dec-06-2016, 12:56 AM)micseydel Wrote: You're seeing linear and quadratic instead of constant and linear because the timing functions include loops. So that's the expected result. Yeah, it's hard to show they differ in complexity rather than just constant runtime by timing a single pop alone, so I threw them in loops; both to mirror your original snippet and to show the linear vs. quadratic growth.
Posts: 2,342
Threads: 62
Joined: Sep 2016
Answer:
It is, indeed, O(n**3). The outer and inner loops are both linear, and the inner loop contains the linear pop(0) operation. If a deque or similar structure were used instead of a list, it would be merely O(n**2), but the nested loops make it definitely worse than linear.
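For anyone curious what the O(n**2) version looks like, here's a sketch (my own rewrite in Python 3, not the candidate's code) using collections.deque. rotate(-k) moves k elements from the front to the back in O(k), replacing the inner append/pop(0) loop, and popleft() replaces the O(n) pop(0):

```python
from collections import deque

def result_deque(n):
    # Same elimination process as the original, but popleft() is O(1)
    # and rotate(-k) moves k items front-to-back in O(k), so the whole
    # thing drops from O(n**3) to O(n**2).
    nums = deque(range(1, n + 1))
    d = 1
    while len(nums) > 1:
        nums.popleft()
        nums.rotate(-(d % len(nums)))
        d += 1
    return nums[0]
```

The rotate call is the direct deque idiom for "append(pop(0)) k times", so the two versions return identical results for every n.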
If anyone is confused about why, feel free to ask questions and we can talk it through.
There's also a linear solution if anyone is curious, though the proof for it involves a recurrence relation and is beyond my non-stretch capabilities.
Posts: 591
Threads: 26
Joined: Sep 2016
Thanks for the follow up. You had me doubting myself there.
|