Python Forum
Add group number for duplicates
#1
Hi, I'm looking for a way to add a sequence number to duplicate rows and assign them a group number per duplicated value.

    index  raw count  group number  filename         Number of Pages  CRITERIA
0       1          1             1  file_753951.pdf                2  Starwars
3       4          2             1  file_654321.pdf                2  Starwars
4       5          3             1  file_456123.pdf                2  Starwars
5       6          4             1  file_548564.pdf                2  Starwars
11     12          5             2  file_351643.pdf                2  Trekky
13     14          6             2  file_789654.pdf                2  Trekky
2       3          7             3  file_321564.pdf                2  Guardians
15     16          8             3  file_963852.pdf                2  Guardians
12     13          9             3  file_741852.pdf                3  Guardians

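# Column used to detect duplicates
cfc = df["Criteria"]
# Keep rows whose Criteria value appears more than once, sorted by criteria and page count
df_getdupes = df[cfc.isin(cfc[cfc.duplicated()])].sort_values(['Criteria','Number of Pages'])
display(df_getdupes)
df_getdupes.to_csv('output_dupes1.csv')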
Updates:
Dec 8
1. The screenshot shows my required output.

Problem:
1. Create a group number field.
2. Write a sequential number into the group number field for each criteria value. All rows with the same criteria should get the same group number.

Thank you.

#2
Have you tried DataFrame.groupby(column)?

Not really sure what you want since the screenshot looks the same as the data in the post. Can you post before and after?
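If the goal is one number per distinct Criteria value, a minimal sketch along the groupby route might look like the following. The sample frame is made up for illustration (only the column names come from the post), and numbering groups in order of first appearance is an assumption, since the post does not say how the groups should be ordered.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "filename": ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf", "f.pdf", "g.pdf"],
    "Number of Pages": [2, 5, 2, 2, 2, 2, 3],
    "Criteria": ["Starwars", "Alien", "Trekky", "Starwars",
                 "Guardians", "Trekky", "Guardians"],
})

# Keep every row whose Criteria value occurs more than once.
dupes = df[df["Criteria"].duplicated(keep=False)].copy()

# ngroup() numbers the groups from 0; sort=False keeps order of first
# appearance (an assumption about the desired order), and +1 starts at 1.
dupes["group number"] = dupes.groupby("Criteria", sort=False).ngroup() + 1

# Put rows of the same group next to each other (mergesort is stable, so the
# original order inside each group is kept), then add a running row counter.
dupes = dupes.sort_values("group number", kind="mergesort")
dupes["raw count"] = range(1, len(dupes) + 1)

print(dupes)
```

Every row with the same Criteria gets the same group number, and the non-duplicated "Alien" row is dropped before numbering.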
#3
(Dec-06-2022, 12:32 AM)deanhystad Wrote: Have you tried DataFrame.groupby(column)?

Not really sure what you want since the screenshot looks the same as the data in the post. Can you post before and after?

Thanks for the reply.

I've updated my post. Sorry, I didn't include much info before.

I used this example code.
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# First grouping based on "Team"
# Within each team we are grouping based on "Position"
gkk = df.groupby(['Team', 'Position'])
  
# Print the first value in each group
gkk.first()
I did manage to use groupby, but I also wanted to populate the same team names as well.
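If "populate the same team names" means having the team name repeated on every row, rather than shown once per group in the index of the grouped result, one possible approach is to keep the group keys as ordinary columns. This is only a sketch of that reading: the small frame below is hypothetical and stands in for nba.csv, and the "Name" column is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; the "Name" column is assumed.
df = pd.DataFrame({
    "Team": ["Celtics", "Celtics", "Celtics", "Lakers", "Lakers"],
    "Position": ["PG", "PG", "SF", "C", "SF"],
    "Name": ["A", "B", "C", "D", "E"],
})

# as_index=False keeps "Team" and "Position" as regular columns, so the
# team name appears on every row instead of once per group in the index.
first_rows = df.groupby(["Team", "Position"], as_index=False).first()

print(first_rows)
```

Calling `.reset_index()` on the result of `gkk.first()` should give the same layout.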