[split] Time Complexity of Counting

***Mekire*** · (This post was last modified: Jan-18-2017, 10:02 PM by Mekire.)

micseydel speaking here: this is a split from https://python-forum.io/Thread-List-show...-instances

(Jan-18-2017, 07:22 PM)wavic Wrote: I was thinking that this is homework

Very generously

Yes, except based on his previous post,

(Jan-18-2017, 11:00 AM)Pie_Fu n Wrote: ok i've got it sorted
if its of benefit to anybody else

he thinks he has already solved the problem.

Therefore he needs to be shown how to do it correctly so he doesn't think the previous attempt was the best way.
I'm not going to obfuscate a couple lines of code.

Also your suggestion,

wavic Wrote:You need to count an element in the incrementing slices of the first list and append the result for an element to the new list.

would be O(n^2) at best.

Edit: Adding my code to this post so people don't need to link back.
My suggested implementation from the previous thread:

sequence = 48,52,55,48,52,55,60,62,48
 
result = []
cum_count = {}
for num in sequence:
    cum_count[num] = cum_count.get(num, 0) + 1
    result.append(cum_count[num])
 
print(result)

Output:
[1, 1, 1, 2, 2, 2, 1, 1, 3]

wavic · Jan-18-2017, 07:59 PM

I came with

for s, i in enumerate(l,1):               
    c = Counter(l[:s])                                         
    res.append(c[i])

However, how do you get this O(n^2)? How can I do the same?

***Mekire*** · (This post was last modified: Jan-18-2017, 09:09 PM by Mekire.)

What you wrote is O(n^2). The version I wrote is O(n).
Counting all elements (even of a sublist) every time through the loop is inefficient and pushes the algorithm to quadratic time.

Also note you could just use list.count (or tuple.count depending) for your method:

sequence = 48,52,55,48,52,55,60,62,48
result = [sequence[:i].count(val) for i,val in enumerate(sequence, 1)]

but this is still O(n^2) which should be avoided.

wavic · Jan-18-2017, 09:09 PM

Is there a guidance how to to do it my self? How to determine the complication of the code

***Mekire*** · Jan-18-2017, 09:10 PM

Micseydel wrote a guide for determining time complexity. You should start there and then ask if you are confused on anything:
https://python-forum.io/Thread-Efficiency-Crash-Course

***micseydel*** · Jan-18-2017, 09:32 PM

Thanks for the plug, Mek.

@wavic:
To dig into it a bit more, the code

for s, i in enumerate(l, 1):
    l[:s]

is quadratic (O(n**2)) because slicing is a linear operation. During the first iteration, n work is done, during the second is (n-1) etc. down to 1. On average, n/2 work is done within the loop, but the halving doesn't matter.

Given, f(x)=x*x/2, doubling x results in a more than doubling of f(x), as shown below
f(2) = 2
f(4) = 8
f(8) = 32

The reason Mek's solution is O(n) is that within the loop (which runs n times), only constant-time operations are taken. Operating on a dictionary is efficient (O(1)) so long as you're not doing other linear operations, like collection.count().

wavic · Jan-18-2017, 10:48 PM

Thank you! I'll see it tomorrow. It's midnight here Coffee

layjain · Jan-10-2019, 07:33 AM

Quote:My suggested implementation from the previous thread:

sequence = 48,52,55,48,52,55,60,62,48
 
result = []
cum_count = {}
for num in sequence:
    cum_count[num] = cum_count.get(num, 0) + 1
    result.append(cum_count[num])
 
print(result)

Output:
[1, 1, 1, 2, 2, 2, 1, 1, 3]

I think this program is not of complexity O(n); You have assumed that

dict.get(num,0)

is of constant complexity. However this is not the case; The get method has to go through the dictionary keys every time it is called. Your code has a worst case complexity of O(n^2). Correct me if I'm wrong

**buran** · (This post was last modified: Jan-10-2019, 10:11 AM by buran.)

According to https://wiki.python.org/moin/TimeComplexity time complexity for get item from a dict is:

O(1) in Average Case
O(n) in Amortized Worst Case

look at the very bottom of the page

**Gribouillis** · (This post was last modified: Jan-10-2019, 11:15 AM by Gribouillis.)

The following code has complexity O(n log(n)) like sorting

import itertools as itt
import operator as op

def solve(problem):
    es = sorted((x, i) for i, x in enumerate(problem))
    t = sorted((x[1], i)
         for k, g in itt.groupby(es, key = op.itemgetter(0))
         for i, x in enumerate(g, 1))
    return [x[1] for x in t]

if __name__ == '__main__':
    problem = [48,52,55,48,52,55,60,62,48]
    expected = [1,1,1,2,2,2,1,1,3]
    assert solve(problem) == expected

Of course, you can reach the same asymptotic complexity by using a sorted dictionary based on binary search instead of the dict class. This module probably provides the needed SortedDict class.

But here, we do it with the standard library alone! For most practical purposes, the dict based solution is probably the best one although it's not O(n log(n))

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	counting lines in split data	Skaperen	6	3,183	Oct-07-2022, 07:09 PM Last Post: Skaperen
	Split gps files based on time (text splitting)	dervast	0	2,404	Nov-09-2020, 09:19 AM Last Post: dervast
	What is the run time complexity of this code and please explain?	samlee916	2	3,055	Nov-06-2020, 02:37 PM Last Post: deanhystad
	Basic Time Complexity Help	Adakip	1	2,430	Aug-25-2019, 09:12 AM Last Post: ThomasL
	Time complexity of dict vs attributes	heras	6	5,275	Aug-24-2018, 10:18 PM Last Post: Gribouillis
	How to calculate the mccabe complexity in module falke8?	fuqianz	1	3,786	Oct-16-2017, 02:08 AM Last Post: sparkz_alot

[split] Time Complexity of Counting

User Panel Messages

Announcements