Python Forum
Pandas + Groupby + Filter unique values
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas + Groupby + Filter unique values
#1
I have this dataset:

df=pd.DataFrame({'user':[1,1,2,2,2,3,3,3,3,3,4,4],
                  'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-20'],
                  'type':['a','b','a','x','b','a','x','a','x','b','a','b']})
Which gives me:

user    date    type
1   1995-09-01  a
1   1995-09-02  b
2   1995-10-03  a
2   1995-10-04  x
2   1995-10-05  b
3   1995-11-07  a
3   1995-11-08  x
3   1995-11-09  a
3   1995-11-10  x
3   1995-11-15  b
4   1995-12-18  a
4   1995-12-20  b
and I want to get a new column, with dates based on a values. When a user has more than an value type a value , the date of the oldest a value of this user should be selected to show on the new column.

For example, user 3 has several a values on the type column. Thus, on the a_type_date column, the eldest date for the a value is chosen.

Here is the desired output:

user    date      type   a_type_date
    1   1995-09-01  a     1995-09-01
    1   1995-09-02  b     1995-09-01
    2   1995-10-03  a     1995-10-03
    2   1995-10-04  x     1995-10-03
    2   1995-10-05  b     1995-10-03
    3   1995-11-07  a     1995-11-07
    3   1995-11-08  x     1995-11-07
    3   1995-11-09  a     1995-11-07
    3   1995-11-10  x     1995-11-07
    3   1995-11-15  b     1995-11-07
    4   1995-12-18  a     1995-12-18
    4   1995-12-20  b     1995-12-18
Any help will be highly appreciated.
Reply
#2
1. First, convert the date column into a datetime() index:

df.date = pd.to_datetime(df.date)
2. Next use min() and list comprehension to create a new column containing your earliest (use max() for latest) type a dates per user as follows:

df['a_type_date'] = [min(df[(df.user==i)&(df.type=='a')]['date']).date() for i in df.user]
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Separating unique, stable, samples using pandas keithpfio 1 412 Jun-20-2022, 07:06 PM
Last Post: keithpfio
  pandas: Compute the % of the unique values in a column JaneTan 1 1,206 Oct-25-2021, 07:55 PM
Last Post: jefsummers
  update values in one dataframe based on another dataframe - Pandas iliasb 2 5,949 Aug-14-2021, 12:38 PM
Last Post: jefsummers
  groupby on var with missing values error zenvega 0 1,122 May-07-2021, 07:40 PM
Last Post: zenvega
  Remove extra count columns created by pandas groupby spyf8 1 1,889 Feb-10-2021, 09:19 AM
Last Post: Naheed
  Combine groupby() and shift() in pandas rama27 0 2,959 Nov-17-2020, 09:49 PM
Last Post: rama27
  python pandas: diff between 2 dates in a groupby bluedragon 0 2,853 Mar-25-2020, 04:18 PM
Last Post: bluedragon
  Pandas copying wrong values vmarg 2 1,742 Jan-06-2020, 09:45 AM
Last Post: vmarg
  itertuples, datetime, pandas, groupby, in range karlito 0 1,968 Nov-29-2019, 11:35 AM
Last Post: karlito
  sort values of a column pandas karlito 2 1,845 Oct-22-2019, 06:11 AM
Last Post: karlito

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020