Posts: 15
Threads: 7
Joined: Jan 2022
Friends!
I have a data frame from .xslx that return 4 columns, order id, order date, order requester, order urgency,
I want to return the max() order date by order id, BUT to retrieve back the entire columns in it also e.g.:
order id | max(order date) | order requester | order urgency
tried this but I retrieve only 2 columns :
df = df.groupby(['order id'],as_index=False)[['order date']].max()
appreciate your support :-)
Posts: 7,320
Threads: 123
Joined: Sep 2016
Try:
df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]
Posts: 15
Threads: 7
Joined: Jan 2022
Thx, getting this error : ValueError: Boolean array expected for the condition, not object
Posts: 7,320
Threads: 123
Joined: Sep 2016
You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.
import pandas as pd
mydataset = {
'order id': [3, 7, 2],
'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
'order requester': ['a', 'b', 'c'],
'order urgency': ['slow', 'fast', 'now']
}
df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()
print(df_1)
print('-' * 40)
print(df_2) Output: order id order date order requester order urgency
0 3 2020-01-01 a slow
1 7 2020-01-02 b fast
2 2 2020-01-03 c now
----------------------------------------
order id order date
0 2 2020-01-03
1 3 2020-01-01
2 7 2020-01-02
So this is just guess of your data,can try combine back group date(may not be what you want).
>>> df_2.combine_first(df_1)
order date order id order requester order urgency
0 2020-01-03 2 a slow
1 2020-01-01 3 b fast
2 2020-01-02 7 c now
Posts: 15
Threads: 7
Joined: Jan 2022
Jan-22-2022, 06:28 PM
(This post was last modified: Jan-22-2022, 06:28 PM by bnadir55.)
(Jan-22-2022, 11:47 AM)snippsat Wrote: Try:
df = df[df.groupby(['order id'],as_index=False)[['order date']].max()]
(Jan-22-2022, 05:46 PM)snippsat Wrote: You should post a sample of the DataFrame so can run it,then it easier to test stuff out.
If i make example,so can run your line of code.
import pandas as pd
mydataset = {
'order id': [3, 7, 2],
'order date': ['2020-01-01', '2020-01-02', '2020-01-03'],
'order requester': ['a', 'b', 'c'],
'order urgency': ['slow', 'fast', 'now']
}
df_1 = pd.DataFrame(mydataset)
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max()
print(df_1)
print('-' * 40)
print(df_2) Output: order id order date order requester order urgency
0 3 2020-01-01 a slow
1 7 2020-01-02 b fast
2 2 2020-01-03 c now
----------------------------------------
order id order date
0 2 2020-01-03
1 3 2020-01-01
2 7 2020-01-02
So this is just guess of your data,can try combine back group date(may not be what you want).
>>> df_2.combine_first(df_1)
order date order id order requester order urgency
0 2020-01-03 2 a slow
1 2020-01-01 3 b fast
2 2020-01-02 7 c now
Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment :
Attached Files
Thumbnail(s)
Posts: 7,320
Threads: 123
Joined: Sep 2016
Jan-22-2022, 10:52 PM
(This post was last modified: Jan-22-2022, 10:53 PM by snippsat.)
(Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment : Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.
import pandas as pd
from io import StringIO
data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')
df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max() So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.
>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
order id order date order requester order urgency
0 1 2022-10-03 Mike B
1 2 2022-10-06 James B
2 3 2022-10-09 Don C
3 4 2022-10-10 Mike C
Posts: 15
Threads: 7
Joined: Jan 2022
(Jan-22-2022, 10:52 PM)snippsat Wrote: (Jan-22-2022, 06:28 PM)bnadir55 Wrote: Thx,
here is what I need return the left table with all the rows and columns that carry the max(Order_date) grouped by order_ID (see results on the right table)
see in attachment : Yes,but what i mean that you should poste the a working Dataframe sample,the add image as additional info about wanted result.
Most of here work with Pandas sporadically or not all,then is difficult to answer if can not teste stuff out.
So i have to make the DataFrame to test stuff out.
import pandas as pd
from io import StringIO
data = StringIO('''\
order id,order date,order requester,order urgency
1,10/1/2022,James,A
1,10/2/2022,Don,A
1,10/3/2022,Mike,B
2,10/4/2022,Mike,B
2,10/5/2022,Don,B
2,10/6/2022,James,B
3,10/7/2022,James,A
3,10/8/2022,Don,A
3,10/9/2022,Don,C
4,10/10/2022,Mike,C''')
df_1 = pd.read_csv(data, sep=',')
df_1["order date"] = pd.to_datetime(df_1["order date"])
df_2 = df_1.groupby(['order id'],as_index=False)[['order date']].max() So now can merge DataFrames based on df_2 groupby result.
This match your wanted result in image.
>>> df_3 = df_1.merge(df_2, on=['order id', 'order date'], how='inner')
>>> df_3
order id order date order requester order urgency
0 1 2022-10-03 Mike B
1 2 2022-10-06 James B
2 3 2022-10-09 Don C
3 4 2022-10-10 Mike C
Got you, thank you!
|