Python Forum
DataFrame .xlsx max() - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: DataFrame .xlsx max() (/thread-36162.html)



DataFrame .xlsx max() - bnadir55 - Jan-22-2022

Friends!
I have a data frame from .xslx that return 4 columns, order id, order date, order requester, order urgency,
I want to return the max() order date by order id, BUT to retrieve back the entire columns in it also e.g.:
order id | max(order date) | order requester | order urgency
tried this but I retrieve only 2 columns :

df = df.groupby(['order id'],as_index=False)[['order date']].max()

appreciate your support :-)


RE: DataFrame .xlsx max() - snippsat - Jan-22-2022

Try:
df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]



RE: DataFrame .xlsx max() - bnadir55 - Jan-22-2022

Thx, getting this error : ValueError: Boolean array expected for the condition, not object


RE: DataFrame .xlsx max() - snippsat - Jan-22-2022

You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.
import pandas as pd

mydataset = {
  'order id': [3, 7, 2],
  'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
  'order requester': ['a', 'b', 'c'],
  'order urgency': ['slow', 'fast', 'now']
}

df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()

print(df_1)
print('-' * 40)
print(df_2)
Output:
order id order date order requester order urgency 0 3 2020-01-01 a slow 1 7 2020-01-02 b fast 2 2 2020-01-03 c now ---------------------------------------- order id order date 0 2 2020-01-03 1 3 2020-01-01 2 7 2020-01-02
So this is just guess of your data,can try combine back group date(may not be what you want).
>>> df_2.combine_first(df_1)
  order date  order id order requester order urgency
0 2020-01-03         2               a          slow
1 2020-01-01         3               b          fast
2 2020-01-02         7               c           now



RE: DataFrame .xlsx max() - bnadir55 - Jan-22-2022

(Jan-22-2022, 11:47 AM)snippsat Wrote: Try:
df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]

(Jan-22-2022, 05:46 PM)snippsat Wrote: You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.
import pandas as pd

mydataset = {
  'order id': [3, 7, 2],
  'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
  'order requester': ['a', 'b', 'c'],
  'order urgency': ['slow', 'fast', 'now']
}

df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()

print(df_1)
print('-' * 40)
print(df_2)
Output:
order id order date order requester order urgency 0 3 2020-01-01 a slow 1 7 2020-01-02 b fast 2 2 2020-01-03 c now ---------------------------------------- order id order date 0 2 2020-01-03 1 3 2020-01-01 2 7 2020-01-02
So this is just guess of your data,can try combine back group date(may not be what you want).
>>> df_2.combine_first(df_1)
  order date  order id order requester order urgency
0 2020-01-03         2               a          slow
1 2020-01-01         3               b          fast
2 2020-01-02         7               c           now

Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :


RE: DataFrame .xlsx max() - snippsat - Jan-22-2022

(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :
Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.
import pandas as pd
from io import StringIO

data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')

df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()
So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.
>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
   order id order date order requester order urgency
0         1 2022-10-03            Mike             B
1         2 2022-10-06           James             B
2         3 2022-10-09             Don             C
3         4 2022-10-10            Mike             C



RE: DataFrame .xlsx max() - bnadir55 - Jan-23-2022

(Jan-22-2022, 10:52 PM)snippsat Wrote:
(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :
Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.
import pandas as pd
from io import StringIO

data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')

df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()
So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.
>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
   order id order date order requester order urgency
0         1 2022-10-03            Mike             B
1         2 2022-10-06           James             B
2         3 2022-10-09             Don             C
3         4 2022-10-10            Mike             C

Got you, thank you!