DataFrame .xlsx max()

bnadir55 · Jan-22-2022, 10:55 AM

Friends!
I have a data frame from .xslx that return 4 columns, order id, order date, order requester, order urgency,
I want to return the max() order date by order id, BUT to retrieve back the entire columns in it also e.g.:
order id | max(order date) | order requester | order urgency
tried this but I retrieve only 2 columns :

df = df.groupby(['order id'],as_index=False)[['order date']].max()

appreciate your support :-)

***snippsat*** · Jan-22-2022, 11:47 AM

Try:

df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]

bnadir55 · Jan-22-2022, 11:53 AM

Thx, getting this error : ValueError: Boolean array expected for the condition, not object

***snippsat*** · Jan-22-2022, 05:46 PM

You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.

import pandas as pd

mydataset = {
  'order id': [3, 7, 2],
  'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
  'order requester': ['a', 'b', 'c'],
  'order urgency': ['slow', 'fast', 'now']
}

df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()

print(df_1)
print('-' * 40)
print(df_2)

Output:   order id order date order requester order urgency
0         3 2020-01-01               a           slow
1         7 2020-01-02               b           fast
2         2 2020-01-03               c            now
----------------------------------------
   order id order date
0         2 2020-01-03
1         3 2020-01-01
2         7 2020-01-02

So this is just guess of your data,can try combine back group date(may not be what you want).

>>> df_2.combine_first(df_1)
  order date  order id order requester order urgency
0 2020-01-03         2               a          slow
1 2020-01-01         3               b          fast
2 2020-01-02         7               c           now

bnadir55 · (This post was last modified: Jan-22-2022, 06:28 PM by bnadir55.)

(Jan-22-2022, 11:47 AM)snippsat Wrote: Try:
df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]

(Jan-22-2022, 05:46 PM)snippsat Wrote: You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.

import pandas as pd

mydataset = {
  'order id': [3, 7, 2],
  'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
  'order requester': ['a', 'b', 'c'],
  'order urgency': ['slow', 'fast', 'now']
}

df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()

print(df_1)
print('-' * 40)
print(df_2)

Output:   order id order date order requester order urgency
0         3 2020-01-01               a           slow
1         7 2020-01-02               b           fast
2         2 2020-01-03               c            now
----------------------------------------
   order id order date
0         2 2020-01-03
1         3 2020-01-01
2         7 2020-01-02

So this is just guess of your data,can try combine back group date(may not be what you want).

>>> df_2.combine_first(df_1)
  order date  order id order requester order urgency
0 2020-01-03         2               a          slow
1 2020-01-01         3               b          fast
2 2020-01-02         7               c           now

Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :

***snippsat*** · (This post was last modified: Jan-22-2022, 10:53 PM by snippsat.)

(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :

Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.

import pandas as pd
from io import StringIO

data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')

df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()

So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.

>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
   order id order date order requester order urgency
0         1 2022-10-03            Mike             B
1         2 2022-10-06           James             B
2         3 2022-10-09             Don             C
3         4 2022-10-10            Mike             C

bnadir55 · Jan-23-2022, 10:05 AM

(Jan-22-2022, 10:52 PM)snippsat Wrote:
(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :
Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.
import pandas as pd
from io import StringIO

data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')

df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()
So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.
>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
   order id order date order requester order urgency
0         1 2022-10-03            Mike             B
1         2 2022-10-06           James             B
2         3 2022-10-09             Don             C
3         4 2022-10-10            Mike             C

Got you, thank you!

DataFrame .xlsx max()

User Panel Messages

Announcements