Pandas + Groupby + Filter unique values - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Pandas + Groupby + Filter unique values (/thread-27646.html) |
Pandas + Groupby + Filter unique values - JosepMaria - Jun-15-2020 I have this dataset: df=pd.DataFrame({'user':[1,1,2,2,2,3,3,3,3,3,4,4], 'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-20'], 'type':['a','b','a','x','b','a','x','a','x','b','a','b']})Which gives me: user date type 1 1995-09-01 a 1 1995-09-02 b 2 1995-10-03 a 2 1995-10-04 x 2 1995-10-05 b 3 1995-11-07 a 3 1995-11-08 x 3 1995-11-09 a 3 1995-11-10 x 3 1995-11-15 b 4 1995-12-18 a 4 1995-12-20 band I want to get a new column, with dates based on a values. When a user has more than an value type a value , the date of the oldest a value of this user should be selected to show on the new column. For example, user 3 has several a values on the type column. Thus, on the a_type_date column, the eldest date for the a value is chosen. Here is the desired output: user date type a_type_date 1 1995-09-01 a 1995-09-01 1 1995-09-02 b 1995-09-01 2 1995-10-03 a 1995-10-03 2 1995-10-04 x 1995-10-03 2 1995-10-05 b 1995-10-03 3 1995-11-07 a 1995-11-07 3 1995-11-08 x 1995-11-07 3 1995-11-09 a 1995-11-07 3 1995-11-10 x 1995-11-07 3 1995-11-15 b 1995-11-07 4 1995-12-18 a 1995-12-18 4 1995-12-20 b 1995-12-18Any help will be highly appreciated. RE: Pandas + Groupby + Filter unique values - JosepMaria - Jun-15-2020 1. First, convert the date column into a datetime() index: df.date = pd.to_datetime(df.date)2. Next use min() and list comprehension to create a new column containing your earliest (use max() for latest) type a dates per user as follows: df['a_type_date'] = [min(df[(df.user==i)&(df.type=='a')]['date']).date() for i in df.user] |