Python Forum
Counting Duplicates in large Data Set
#4
Hi

When it comes to duplicates among numbers, I always think of "np.unique" => below is an example. Note that NumPy stays fast even for a huge array size.

Paul

import numpy as np

MyList = [0, 1, 10, 5, 2, 1, -1, 8, 2, 1, 5, 1, 1, 1, -1]
MyArray = np.asarray(MyList)

# unique values and how many times each one occurs
values, counts = np.unique(MyArray, return_counts=True)

for value, count in zip(values, counts):
    print(f"for {value} => {count} occurrence(s)")
Providing:
Output:
for -1 => 2 occurrence(s)
for 0 => 1 occurrence(s)
for 1 => 6 occurrence(s)
for 2 => 2 occurrence(s)
for 5 => 2 occurrence(s)
for 8 => 1 occurrence(s)
for 10 => 1 occurrence(s)
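To back up the claim that np.unique holds up on large inputs, here is a minimal sketch that cross-checks its counts against a pure-Python collections.Counter on a million-element array; the array size and random values are just illustrative, not from the original post:

import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
data = rng.integers(-5, 6, size=1_000_000)  # large array of small ints

# NumPy: vectorized count of each distinct value
values, counts = np.unique(data, return_counts=True)

# pure-Python baseline for comparison
counter = Counter(data.tolist())

# both approaches agree on every count
assert all(counter[int(v)] == int(c) for v, c in zip(values, counts))

On arrays of this size the vectorized np.unique pass is typically much faster than building the Counter from a Python list.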
Messages In This Thread
Counting Duplicates in large Data Set - by jmair - Dec-06-2022, 01:52 PM
RE: Counting Duplicates in large Data Set - by paul18fr - Dec-07-2022, 09:42 AM
