DataFrame .xlsx max() - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: DataFrame .xlsx max() (/thread-36162.html) |
DataFrame .xlsx max() - bnadir55 - Jan-22-2022 Friends! I have a data frame from .xslx that return 4 columns, order id, order date, order requester, order urgency, I want to return the max() order date by order id, BUT to retrieve back the entire columns in it also e.g.: order id | max(order date) | order requester | order urgency tried this but I retrieve only 2 columns : df = df.groupby(['order id'],as_index=False)[['order date']].max() appreciate your support :-) RE: DataFrame .xlsx max() - snippsat - Jan-22-2022 Try: df = df[df.groupby(['order id'],as_index=False)[['order date']].max()] RE: DataFrame .xlsx max() - bnadir55 - Jan-22-2022 Thx, getting this error : ValueError: Boolean array expected for the condition, not object RE: DataFrame .xlsx max() - snippsat - Jan-22-2022 You should post a sample of the DataFrame so can run it,then it easier to test stuff out. If i make example,so can run your line of code. import pandas as pd mydataset = { 'order id': [3, 7, 2], 'order date': ['2020-01-01', '2020-01-02', '2020-01-03'], 'order requester': ['a', 'b', 'c'], 'order urgency': ['slow', 'fast', 'now'] } df_1 = pd.DataFrame(mydataset) df_1["order date"] = pd.to_datetime(df_1["order date"]) df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max() print(df_1) print('-' * 40) print(df_2) So this is just guess of your data,can try combine back group date(may not be what you want).>>> df_2.combine_first(df_1) order date order id order requester order urgency 0 2020-01-03 2 a slow 1 2020-01-01 3 b fast 2 2020-01-02 7 c now RE: DataFrame .xlsx max() - bnadir55 - Jan-22-2022 (Jan-22-2022, 11:47 AM)snippsat Wrote: Try: (Jan-22-2022, 05:46 PM)snippsat Wrote: You should post a sample of the DataFrame so can run it,then it easier to test stuff out. Thx, here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table) see in attachment : RE: DataFrame .xlsx max() - snippsat - Jan-22-2022 (Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result. Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out. So i have to make the DataFrame to test stuff out. import pandas as pd from io import StringIO data = StringIO('''\ order id,order date,order requester,order urgency 1,10/1/2022,James,A 1,10/2/2022,Don,A 1,10/3/2022,Mike,B 2,10/4/2022,Mike,B 2,10/5/2022,Don,B 2,10/6/2022,James,B 3,10/7/2022,James,A 3,10/8/2022,Don,A 3,10/9/2022,Don,C 4,10/10/2022,Mike,C''') df_1 = pd.read_csv(data, sep=',') df_1["order date"] = pd.to_datetime(df_1["order date"]) df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()So now can merge DataFrames based on df_2 groupby result. This match your wanted result in image. >>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner') >>> df_3 order id order date order requester order urgency 0 1 2022-10-03 Mike B 1 2 2022-10-06 James B 2 3 2022-10-09 Don C 3 4 2022-10-10 Mike C RE: DataFrame .xlsx max() - bnadir55 - Jan-23-2022 (Jan-22-2022, 10:52 PM)snippsat Wrote:(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result. Got you, thank you! |