Dec-12-2017, 11:56 PM
The issue is that b has a length equal to np.amax(a) + 1, which is between 1 and 5 billion. Scanning this list to find the 3's takes a lot of time. I suggest that you build a dictionary with the values of a as keys and the number of occurrences as values. You can then retrieve all the keys which have the value 3. Something like this (not tested!):
# Create a dictionary with elements of a
# and the number of times they appear
mydict = {}
for e in a:
    if e in mydict:
        mydict[e] += 1
    else:
        mydict[e] = 1
# Select keys which have the value 3
c = [k for k, v in mydict.items() if v == 3]

So instead of scanning 3 billion items (on average), you will build a dictionary of about 3 million entries. Even counting a generous 22 steps to access a single key (2**22 > 3,000,000), that is roughly 66 million operations, plus another pass over the dictionary to produce the final output, meaning fewer than 100 million operations. More importantly, the execution time now depends on the number of elements in a, not on the largest value in a.
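If it helps, here is a minimal sketch of the same counting idea using collections.Counter from the standard library. The function name values_occurring, the times parameter and the sample data are made up for illustration, and I have only tried it on a tiny input, not on your real array:

```python
from collections import Counter


def values_occurring(a, times=3):
    """Return the sorted values that appear exactly `times` times in `a`.

    `a` can be any iterable of hashable values (hypothetical helper,
    not part of the original code).
    """
    # Counter builds the same value -> occurrence-count dictionary in one pass
    counts = Counter(a)
    # Keep only the keys whose count matches, as in the list comprehension above
    return sorted(k for k, v in counts.items() if v == times)


# Small made-up sample: a few values, one of them very large
sample = [7, 4_000_000_000, 7, 12, 7, 4_000_000_000, 12, 12, 5]
result = values_occurring(sample)
print(result)
```

The work done here is proportional to len(sample), no matter how large the biggest value is. If a is a NumPy array, passing a.tolist() avoids hashing NumPy scalars, and np.unique(a, return_counts=True) is another way to get the counts without leaving NumPy.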
Let us know your final code and the results of your tests!