Python Forum

Full Version: [split] Time Complexity of Counting
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
micseydel speaking here: this is a split from https://python-forum.io/Thread-List-show...-instances

(Jan-18-2017, 07:22 PM)wavic Wrote: [ -> ]I was thinking that this is homework   

Very generously  
Yes, except based on his previous post,
(Jan-18-2017, 11:00 AM)Pie_Fu n Wrote: [ -> ]ok i've got it sorted
if its of benefit to anybody else
he thinks he has already solved the problem.

Therefore he needs to be shown how to do it correctly so he doesn't think the previous attempt was the best way.
I'm not going to obfuscate a couple lines of code.

Also your suggestion,
wavic Wrote:You need to count an element in the incrementing slices of the first list and append the result for an element to the new list.
would be O(n^2) at best.

Edit: Adding my code to this post so people don't need to link back.
My suggested implementation from the previous thread:
sequence = 48,52,55,48,52,55,60,62,48
 
result = []
cum_count = {}
for num in sequence:
    cum_count[num] = cum_count.get(num, 0) + 1
    result.append(cum_count[num])
 
print(result)
Output:
[1, 1, 1, 2, 2, 2, 1, 1, 3]
I came with
for s, i in enumerate(l,1):               
    c = Counter(l[:s])                                         
    res.append(c[i])   
However, how do you get this O(n^2)? How can I do the same? 
What you wrote is O(n^2).  The version I wrote is O(n).
Counting all elements (even of a sublist) every time through the loop is inefficient and pushes the algorithm to quadratic time.

Also note you could just use list.count (or tuple.count depending) for your method:
sequence = 48,52,55,48,52,55,60,62,48
result = [sequence[:i].count(val) for i,val in enumerate(sequence, 1)] 
but this is still O(n^2) which should be avoided.
Is there a guidance how to to do it my self? How to determine the complication of the code
Micseydel wrote a guide for determining time complexity.  You should start there and then ask if you are confused on anything:
https://python-forum.io/Thread-Efficiency-Crash-Course
Thanks for the plug, Mek.

@wavic:
To dig into it a bit more, the code
for s, i in enumerate(l, 1):
    l[:s]
is quadratic (O(n**2)) because slicing is a linear operation. During the first iteration, n work is done, during the second is (n-1) etc. down to 1. On average, n/2 work is done within the loop, but the halving doesn't matter.

Given, f(x)=x*x/2, doubling x results in a more than doubling of f(x), as shown below
f(2) = 2
f(4) = 8
f(8) = 32

The reason Mek's solution is O(n) is that within the loop (which runs n times), only constant-time operations are taken. Operating on a dictionary is efficient (O(1)) so long as you're not doing other linear operations, like collection.count().
Thank you! I'll see it tomorrow. It's midnight here Coffee
Quote:My suggested implementation from the previous thread:
sequence = 48,52,55,48,52,55,60,62,48
 
result = []
cum_count = {}
for num in sequence:
    cum_count[num] = cum_count.get(num, 0) + 1
    result.append(cum_count[num])
 
print(result)
Output:
[1, 1, 1, 2, 2, 2, 1, 1, 3]
I think this program is not of complexity O(n); You have assumed that
dict.get(num,0)
is of constant complexity. However this is not the case; The get method has to go through the dictionary keys every time it is called. Your code has a worst case complexity of O(n^2). Correct me if I'm wrong
According to https://wiki.python.org/moin/TimeComplexity time complexity for get item from a dict is:
  • O(1) in Average Case
  • O(n) in Amortized Worst Case

look at the very bottom of the page
The following code has complexity O(n log(n)) like sorting
import itertools as itt
import operator as op

def solve(problem):
    es = sorted((x, i) for i, x in enumerate(problem))
    t = sorted((x[1], i)
         for k, g in itt.groupby(es, key = op.itemgetter(0))
         for i, x in enumerate(g, 1))
    return [x[1] for x in t]

if __name__ == '__main__':
    problem = [48,52,55,48,52,55,60,62,48]
    expected = [1,1,1,2,2,2,1,1,3]
    assert solve(problem) == expected
Of course, you can reach the same asymptotic complexity by using a sorted dictionary based on binary search instead of the dict class. This module probably provides the needed SortedDict class.

But here, we do it with the standard library alone! For most practical purposes, the dict based solution is probably the best one although it's not O(n log(n))