Counting Duplicates in large Data Set - Printable Version
Python Forum (https://python-forum.io), Forum: Data Science, Thread: Counting Duplicates in large Data Set (/thread-38882.html)
Counting Duplicates in large Data Set - jmair - Dec-06-2022

My students and I were discussing the difference between possibility and probability when it comes to lottery numbers. Is the sequence 1 2 3 4 5 6 just as likely to happen as 76 31 7 54 29 18? My question is: what would be a good format to record six random numbers, and what method should be used to count the duplicates?

RE: Counting Duplicates in large Data Set - perfringo - Dec-06-2022

A simple simulation would do: run random.sample n times on the desired range, convert each result to a tuple, and feed it to collections.Counter. Then inspect the results. If order doesn't matter, apply sorted before converting to a tuple. I tried with n=1000000 and there was no 1, 2, 3, 4, 5, 6 in the results (I used range(1, 49)).

RE: Counting Duplicates in large Data Set - deanhystad - Dec-06-2022

Use a set instead of a tuple. Should be faster. But I don't think this is a workable solution. A "simple solution" is not going to work because the numbers are staggering. I am not surprised that 1 million combinations did not produce a single 1, 2, 3, 4, 5, 6. One million is a pretty small sample size when there are almost 14 million combinations. For the kind of test you propose I would suggest 1 billion combinations. And how are you going to store the 320 MB Counter dictionary?

RE: Counting Duplicates in large Data Set - paul18fr - Dec-07-2022

Hi,

When it comes to duplicates among numbers, I always think of "np.unique"; see the example below. Note also that NumPy is fast even for huge arrays.

Paul

```python
import numpy as np

MyList = [0, 1, 10, 5, 2, 1, -1, 8, 2, 1, 5, 1, 1, 1, -1]
MyList = np.asarray(MyList)

# np.unique returns a tuple: (sorted unique values, first indices, counts)
UniqueList = np.unique(MyList, return_index=True, return_counts=True)
n = np.shape(UniqueList[0])[0]

for i in range(n):
    print(f"for {UniqueList[0][i]} => {UniqueList[2][i]} occurrence(s)")
```

This prints one line per distinct value, together with its number of occurrences.
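As a rough sketch of the simulation perfringo describes earlier in the thread, the snippet below draws six numbers with random.sample, stores each draw as a sorted tuple, and counts repeats with collections.Counter. The function name `simulate_draws` and its parameters (`n_draws`, `pool_size`, `picks`, `seed`) are hypothetical, and the 6-of-49 game is an assumption: the thread does not fix the rules, and range(1, 49) as used above actually covers 1 to 48. The draw count is kept small so the example runs quickly; it is far too small to say anything about a specific combination.

```python
import math
import random
from collections import Counter


def simulate_draws(n_draws, pool_size=49, picks=6, seed=None):
    """Count how often each unordered combination occurs in n_draws draws.

    Hypothetical helper for illustration; a 6-of-49 game is assumed.
    """
    rng = random.Random(seed)
    numbers = range(1, pool_size + 1)  # 1..pool_size inclusive
    counts = Counter()
    for _ in range(n_draws):
        # A sorted tuple is hashable and ignores the order within a draw.
        draw = tuple(sorted(rng.sample(numbers, picks)))
        counts[draw] += 1
    return counts


# Number of possible unordered 6-of-49 combinations: 13,983,816.
total = math.comb(49, 6)

counts = simulate_draws(100_000, seed=1)
duplicates = {draw: c for draw, c in counts.items() if c > 1}

print(f"possible combinations:     {total:,}")
print(f"distinct draws seen:       {len(counts):,}")
print(f"draws seen more than once: {len(duplicates):,}")
print(f"times 1-6 was drawn:       {counts[(1, 2, 3, 4, 5, 6)]}")
```

Every specific combination, including 1 2 3 4 5 6, has the same probability of 1 in `total` on each draw under these assumptions, which is why it rarely shows up in a small sample. Note that a plain set cannot be used as a Counter key because it is not hashable; a frozenset or a sorted tuple can. For very large runs, the memory concern raised above applies, since the Counter keeps one entry per distinct combination seen.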