Posts: 2,342
Threads: 62
Joined: Sep 2016
I interviewed a candidate recently for a software engineering position who had a master's degree from UCLA with a 3.8 GPA, and a 3.9 GPA in undergrad. They looked good on paper, but after writing the following code they misidentified the time complexity of their solution (the problem itself is relatively boring; I won't post it unless someone really wants to know):
def result(n):
    nums = [i+1 for i in xrange(n)]
    d = 1
    while len(nums) > 1:
        nums.pop(0)
        for j in xrange(d % len(nums)):
            nums.append(nums.pop(0))
        d += 1
    return nums[0]

I realize most of the regulars here don't have a formal computer science background, but I'm curious whether the tutorial I wrote is good enough for people to identify the running time of this short algorithm.
So, for those interested in identifying it, what do you think the big-O of this function is? What source(s) of learning did you use to make that conclusion?
(I posted this in the Bar since it's not so much a question I'm asking as a thing I thought would be fun, and it isn't news about Python or anything, even though the candidate happened to write it in (not especially good) Python.)
Posts: 591
Threads: 26
Joined: Sep 2016
So, I believe to begin with, list.pop(0) is an O(n) operation.
It is only constant time if you pop the last item, which means that:

while len(nums) > 1:
    nums.pop(0)

alone is already O(n^2).
The only thing I would struggle with is the complexity of the d%len(nums) loop.
My intuition is that that part is also linear in the length of nums, so it seems it could be as bad as O(n^3) with the additional pop(0) in that loop.
For what it is worth I am nearing the end of my online master's with Georgia Tech; though I did not excel in my algorithms course. I also did not do CS as an undergrad; I did linguistics and Japanese. I originally studied physics and EE before switching to linguistics though so I have more math than that would generally imply.
Posts: 687
Threads: 37
Joined: Sep 2016
(Dec-05-2016, 06:48 AM)Mekire Wrote: So, I believe to begin with, list.pop(0) is a O(n) operation.
I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length. But it of course depends on how Python implements plain lists. Which means the complexity is dependent on the Python implementation...
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Posts: 591
Threads: 26
Joined: Sep 2016
Dec-05-2016, 05:05 PM
(This post was last modified: Dec-05-2016, 05:06 PM by Mekire.)
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length.
I think you will find it is. collections.deque is a doubly-linked implementation that offers constant time popping from either end, but to my knowledge this isn't the case with list.
I did some quick tests and it certainly seems much slower to pop the front.
Some nasty timing code:

def result_f(n):
    nums = range(n)
    while nums:
        nums.pop(0)

def result_b(n):
    nums = range(n)
    while nums:
        nums.pop()

if __name__ == '__main__':
    from timeit import timeit
    for i in (2000, 4000, 8000, 16000):
        print("Front pop {}: {}".format(i, timeit("result_f({})".format(i), "from __main__ import result_f", number=100)))
        print("Back pop {}: {}".format(i, timeit("result_b({})".format(i), "from __main__ import result_b", number=100)))

And my results:
Front pop 2000: 0.149457311625
Back pop 2000: 0.0312793883941
Front pop 4000: 0.544667427803
Back pop 4000: 0.0609260950131
Front pop 8000: 1.80832892431
Back pop 8000: 0.117451472662
Front pop 16000: 7.36631624098
Back pop 16000: 0.318763773615

Specs:
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32
Posts: 2,342
Threads: 62
Joined: Sep 2016
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: (Dec-05-2016, 06:48 AM)Mekire Wrote: So, I believe to begin with, list.pop(0) is an O(n) operation. I hope not... It is fairly straightforward to have lists with direct access to first and last elements, so pop(0) and pop() would not be dependent on list length.
To elaborate a bit more on Mek's comments... CPython's list uses an implementation similar to Java's ArrayList (and C++'s vector, I think, but I know less C++). Removing an element is linear, since everything after it is shifted over to keep the array contiguous; popping from the end is efficient since no shifting needs to be done, but popping from the beginning is costly.
(Dec-05-2016, 12:38 PM)Ofnuts Wrote: But it of course depends on how Python implements plain lists. Which means the complexity is dependent on the Python implementation...
If a different Python implementation used a linked list, then popping from the beginning would be O(1), but accessing by index would be O(n), which would make a lot of Python code accidentally quadratic. The CPython time complexity wiki even says:
The Wiki Wrote: Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics. However, it is generally safe to assume that they are not slower by more than a factor of O(log n).
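As a quick illustration (my own sketch, not from this thread's code, written for Python 3): collections.deque offers O(1) pops from either end, so a head-to-head timing against list.pop(0) makes the array-shifting cost visible directly. The size and repetition counts here are arbitrary.

```python
from timeit import timeit

# Arbitrary size, chosen large enough that the list's O(n) front pops dominate.
setup = "from collections import deque\nn = 20000"
t_list = timeit("xs = list(range(n))\nwhile xs: xs.pop(0)", setup=setup, number=5)
t_deque = timeit("xs = deque(range(n))\nwhile xs: xs.popleft()", setup=setup, number=5)
print("list.pop(0): {:.4f}s  deque.popleft(): {:.4f}s".format(t_list, t_deque))
```

On any CPython build the list version should come out far slower, since each front pop shifts the whole remaining buffer while popleft() just detaches a block.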
Any other takers on the overall time complexity?
Posts: 687
Threads: 37
Joined: Sep 2016
Dec-06-2016, 12:50 AM
(This post was last modified: Dec-06-2016, 11:06 AM by Ofnuts.)
(Dec-05-2016, 05:05 PM)Mekire Wrote: I think you will find it is. collections.deque is a doubly-linked implementation that offers constant time popping from either end, but to my knowledge this isn't the case with list.
[timing code and results snipped; quoted in full in the previous post]
OK, I stand corrected... However, your numbers seem to indicate that pop() is worse than O(1) (it should be the same value for all lengths, but here it looks vaguely proportional) and pop(0) is much worse than linear (it looks roughly quadratic: 4x when the length doubles).
Posts: 2,342
Threads: 62
Joined: Sep 2016
You're seeing linear and quadratic instead of constant and linear because the timing functions include loops: each call performs n pops, so the per-pop cost gets multiplied by n. So that's the expected result.
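To make that concrete (a back-of-the-envelope sketch of my own, not from the thread): result_f performs n pops, and a pop(0) on a list of length k shifts the remaining k-1 elements, so the total work is 0 + 1 + ... + (n-1) = n(n-1)/2 element moves, which roughly quadruples each time n doubles, matching the measured front-pop times.

```python
def front_pop_shifts(n):
    # pop(0) on a list of length k moves the remaining k-1 elements;
    # summing over the n pops gives n*(n-1)/2 element moves in total.
    return sum(k - 1 for k in range(1, n + 1))

# Doubling n roughly quadruples the work, as in the measured timings.
print(front_pop_shifts(16000) / front_pop_shifts(8000))
```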
Posts: 591
Threads: 26
Joined: Sep 2016
(Dec-06-2016, 12:56 AM)micseydel Wrote: You're seeing linear and quadratic instead of constant and linear because the timing functions include loops. So that's the expected result. Yeah, it's hard to show they differ in complexity rather than just constant runtime by timing a single pop alone, so I threw them in loops; both to mirror your original snippet and to show the linear vs. quadratic growth.
Posts: 2,342
Threads: 62
Joined: Sep 2016
Answer:
It is, indeed, O(n**3). The outer and inner loops are both linear, and the inner loop contains the linear pop(0) operation. If a deque or similar structure were used instead of a list, it would be merely O(n**2), but the nested loops make it definitely worse than linear.
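For anyone curious what the O(n**2) version looks like, here's a sketch (my own rewrite in Python 3, not the candidate's code) using collections.deque. rotate(-k) moves k elements from the front to the back in O(k), replacing the inner append/pop(0) loop, and popleft() replaces the O(n) pop(0):

```python
from collections import deque

def result_deque(n):
    # Same elimination process as the original, but popleft() is O(1)
    # and rotate(-k) moves k items front-to-back in O(k), so the whole
    # thing drops from O(n**3) to O(n**2).
    nums = deque(range(1, n + 1))
    d = 1
    while len(nums) > 1:
        nums.popleft()
        nums.rotate(-(d % len(nums)))
        d += 1
    return nums[0]
```

The rotate call is the direct deque idiom for "append(pop(0)) k times", so the two versions return identical results for every n.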
If anyone is confused about why, feel free to ask questions and we can talk it through.
There's also a linear solution if anyone is curious, though the proof for it involves a recurrence relation and is beyond my non-stretch capabilities.
Posts: 591
Threads: 26
Joined: Sep 2016
Thanks for the follow up. You had me doubting myself there.
|