Python Forum

Hi, I'm looking for a way to add a sequence number and a group number to duplicate rows.

index  raw  count  group number  filename         Number of Pages  CRITERIA
0      1    1      1             file_753951.pdf  2                Starwars
3      4    2      1             file_654321.pdf  2                Starwars
4      5    3      1             file_456123.pdf  2                Starwars
5      6    4      1             file_548564.pdf  2                Starwars
11     12   5      2             file_351643.pdf  2                Trekky
13     14   6      2             file_789654.pdf  2                Trekky
2      3    7      3             file_321564.pdf  2                Guardians
15     16   8      3             file_963852.pdf  2                Guardians
12     13   9      3             file_741852.pdf  3                Guardians

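# Column used to detect duplicates
cfc = df["Criteria"]
# Keep rows whose Criteria value appears more than once, then sort them
df_getdupes = df[cfc.isin(cfc[cfc.duplicated()])].sort_values(['Criteria', 'Number of Pages'])
display(df_getdupes)
df_getdupes.to_csv('output_dupes1.csv')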

Updates:
Dec 8
1. The screenshot shows my required output.

Problem:
1. Create a group number field.
2. Write a sequential number in the group number field for each criteria value. Rows with the same criteria should share the same group number.

Thank you.

Have you tried DataFrame.groupby(column)?

Not really sure what you want since the screenshot looks the same as the data in the post. Can you post before and after?
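
For what it's worth, here is a minimal sketch of how groupby could produce the group number described in the first post. The sample rows are a subset of the table above, the column names `group number` and `count` follow that table, and the use of `ngroup()` with `sort=False` (so groups are numbered in order of first appearance) is my assumption about what is wanted, not something confirmed in the thread.

```python
import pandas as pd

# Sample rows taken from the table in the first post (a subset).
df = pd.DataFrame(
    {
        "filename": [
            "file_753951.pdf", "file_654321.pdf",
            "file_351643.pdf", "file_789654.pdf",
            "file_321564.pdf", "file_963852.pdf",
        ],
        "Number of Pages": [2, 2, 2, 2, 2, 2],
        "Criteria": [
            "Starwars", "Starwars",
            "Trekky", "Trekky",
            "Guardians", "Guardians",
        ],
    }
)

# Keep only rows whose Criteria value occurs more than once.
dupes = df[df["Criteria"].duplicated(keep=False)].copy()

# ngroup() numbers groups starting at 0, so add 1.
# sort=False numbers the groups in order of first appearance
# instead of alphabetically.
dupes["group number"] = dupes.groupby("Criteria", sort=False).ngroup() + 1

# Running sequence over all remaining rows.
dupes["count"] = range(1, len(dupes) + 1)

print(dupes)
```

If alphabetical numbering is preferred, dropping `sort=False` should do it, since groupby sorts the group keys by default.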

(Dec-06-2022, 12:32 AM)deanhystad Wrote: Have you tried DataFrame.groupby(column)?

Not really sure what you want since the screenshot looks the same as the data in the post. Can you post before and after?

Thanks for the reply.

I've updated my post. Sorry I didn't include much info before.

I used this example code.

# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.read_csv("nba.csv")

# First grouping based on "Team"
# Within each team we are grouping based on "Position"
gkk = df.groupby(['Team', 'Position'])

# Print the first value in each group
print(gkk.first())

I did manage to use groupby, but I also want to populate the same team name on each row.
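
If "populate the same team names" means having the team name repeated on every row of the grouped result, one possible sketch is to call `reset_index()` after `first()`, which turns the group keys back into ordinary columns. This reading of the question is an assumption, and the small DataFrame below is a hypothetical stand-in for nba.csv (the team, position, and player values are made up for illustration).

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; values are illustrative only.
df = pd.DataFrame(
    {
        "Team": ["Team A", "Team A", "Team A", "Team B"],
        "Position": ["PG", "SG", "PG", "C"],
        "Name": ["Player 1", "Player 2", "Player 3", "Player 4"],
    }
)

# first() returns one row per (Team, Position) with the keys in the index.
# reset_index() moves Team and Position back into regular columns,
# so the team name is written out on every row.
first_rows = df.groupby(["Team", "Position"]).first().reset_index()

print(first_rows)
```

Passing `as_index=False` to `groupby` is another way to keep the keys as columns.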

atomxkai

deanhystad

atomxkai