Python Forum

Full Version: Pandas : How to create an algorithm that helps me improve results and creating new co
Link to question on StackOverflow
On Stack Overflow, Yaloa_21 understood what I want to do, but I always get errors.
https://stackoverflow.com/questions/7157...5_71593640

Question about pandas
It's a little complicated. I have this dataframe:

ID           TimeandDate      Date        Time
10   2020-08-07 07:40:09  2022-08-07   07:40:09
10   2020-08-07 08:50:00  2022-08-07   08:50:00
10   2020-08-07 12:40:09  2022-08-07   12:40:09
10   2020-08-08 07:40:09  2022-08-08   07:40:09
10   2020-08-08 17:40:09  2022-08-08   17:40:09
12   2020-08-07 08:03:09  2022-08-07   08:03:09
12   2020-08-07 10:40:09  2022-08-07   10:40:09
12   2020-08-07 14:40:09  2022-08-07   14:40:09
12   2020-08-07 16:40:09  2022-08-07   16:40:09
13   2020-08-07 09:22:45  2022-08-07   09:22:45
13   2020-08-07 17:57:06  2022-08-07   17:57:06
I want to create a new dataframe with two new columns. The first one is df["Check-in"]. As you can see, my data has no indicator of when an ID checked in, so I will assume that the first time for every ID is a check-in and the next row is a check-out, which goes into df["Check-out"]. Also, if a check-in doesn't have a check-out time, it has to be registered as the check-out in place of the previous check-out of the same day.

I tried the following, but I'm afraid it isn't reliable because it only uses the first and last record of the day. Imagine ID=13 checks in at 07:40:09 and checks out at 08:40:09, then returns later that day at 19:20:00 and leaves 10 minutes later at 19:30:00. With this function it would show that he worked for about 12 hours.

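import pandas as pd

group = df.groupby(['ID', 'Date'])

def TimeDifference(df):
    # "in" is a reserved word in Python, so descriptive names are used instead
    first = df['TimeandDate'].min()  # earliest record of the day
    last = df['TimeandDate'].max()   # latest record of the day
    # last - first gives a positive duration
    df2 = pd.DataFrame([last - first], columns=['TimeDiff'])
    return df2

group.apply(TimeDifference)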
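To make the concern concrete, here is a small sketch using the four times from the ID=13 example. The date 2020-08-07 is an assumption added only so the timestamps parse. It compares the first-to-last span with the sum of the two actual visits.

```python
import pandas as pd

# The four records from the ID=13 example (the date is assumed for illustration)
times = pd.to_datetime([
    "2020-08-07 07:40:09",  # check-in
    "2020-08-07 08:40:09",  # check-out
    "2020-08-07 19:20:00",  # check-in
    "2020-08-07 19:30:00",  # check-out
])

# What a min/max approach reports: the span from first to last record
span = times.max() - times.min()

# What was actually worked: the sum of each check-in/check-out pair
worked = (times[1] - times[0]) + (times[3] - times[2])

print("first-to-last span:", span)
print("sum of paired visits:", worked)
```

The span comes out just under 12 hours, while the paired visits add up to 1 hour 10 minutes, which is why the records need to be paired rather than reduced to a minimum and maximum.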
Desired Result

Output:
ID        Date   Check-in  Check-out
10  2020-08-07   07:40:09   12:40:09
10  2020-08-08   07:40:09   17:40:09
12  2020-08-07   08:03:09   10:40:09
12  2020-08-07   14:40:09   16:40:09
13  2020-08-07   09:22:45   17:57:06
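One possible way to get this result is sketched below. It is not the Stack Overflow answer, just a minimal illustration of the pairing rule described above. The helper name pair_events is hypothetical, the sample rows are rebuilt from the TimeandDate column (so the dates are 2020, as in the desired output), and the handling of a day with a single record (check-in kept, check-out left empty) is an assumption, since the question does not cover that case.

```python
import pandas as pd

def pair_events(times):
    """Pair sorted times into [check_in, check_out] lists (hypothetical helper)."""
    times = sorted(times)
    # Consecutive records form pairs: 1st/2nd, 3rd/4th, ...
    pairs = [[times[i], times[i + 1]] for i in range(0, len(times) - 1, 2)]
    if len(times) % 2 == 1:
        if pairs:
            # Odd count: the leftover record replaces the previous check-out
            pairs[-1][1] = times[-1]
        else:
            # Assumption: a lone record is a check-in with no check-out
            pairs.append([times[0], None])
    return pairs

# Sample data taken from the TimeandDate column of the question
ids = [10, 10, 10, 10, 10, 12, 12, 12, 12, 13, 13]
stamps = [
    "2020-08-07 07:40:09", "2020-08-07 08:50:00", "2020-08-07 12:40:09",
    "2020-08-08 07:40:09", "2020-08-08 17:40:09",
    "2020-08-07 08:03:09", "2020-08-07 10:40:09",
    "2020-08-07 14:40:09", "2020-08-07 16:40:09",
    "2020-08-07 09:22:45", "2020-08-07 17:57:06",
]
df = pd.DataFrame({"ID": ids, "TimeandDate": pd.to_datetime(stamps)})
df["Date"] = df["TimeandDate"].dt.strftime("%Y-%m-%d")
df["Time"] = df["TimeandDate"].dt.strftime("%H:%M:%S")

# Build one output row per check-in/check-out pair within each (ID, Date) group
rows = []
for (id_, date), grp in df.groupby(["ID", "Date"]):
    for check_in, check_out in pair_events(grp["Time"]):
        rows.append({"ID": id_, "Date": date,
                     "Check-in": check_in, "Check-out": check_out})

result = pd.DataFrame(rows)
print(result)
```

Because each group is handled on its own, the number of rows in a group never has to match the length of a separately built list, which avoids the kind of length mismatch that column assignment can raise.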
If you get errors, show the error traceback (within BBcode error tags), complete and unaltered.
(Apr-03-2022, 10:38 PM)Larz60+ Wrote: [ -> ]If you get errors, show the error traceback (within BBcode error tags), complete and unaltered.
Hello, when I tried this function from Stack Overflow:
new_col = []
for i in df.ID.unique():
    for d in df.Date.unique():
        p = df.loc[(df.ID==i)&(df.Date==d)]
        suffix = sorted(list(range(1,len(p)))*2)[:len(p)]
        if len(suffix)%2!=0 and len(suffix)>1:
            suffix[-2]=np.nan
            suffix[-1]-=1
        new_col.extend(suffix)

df['new'] = new_col
df.dropna().groupby(['ID','Date','new'], as_index=False).agg({'Time':[min,max]}).drop('new', axis=1, level=0)
I always get this error:
Error:
ValueError: Length of values (2623) does not match length of index (2667)
You're getting a bad value exception, but the error message does not look like a complete message.
(Apr-03-2022, 10:47 PM)Larz60+ Wrote: [ -> ]You're getting a bad value exception, but the error message does not look like a complete message.
This is the full error message:
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-338-9b9ac9bdf42b> in <module>
     10         new_col.extend(suffix)
     11 
---> 12 df['new'] = new_col

~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3038         else:
   3039             # set column
-> 3040             self._set_item(key, value)
   3041 
   3042     def _setitem_slice(self, key: slice, value):

~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3114         """
   3115         self._ensure_valid_index(value)
-> 3116         value = self._sanitize_column(key, value)
   3117         NDFrame._set_item(self, key, value)
   3118 

~\anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3762 
   3763             # turn me into an ndarray
-> 3764             value = sanitize_index(value, self.index)
   3765             if not isinstance(value, (np.ndarray, Index)):
   3766                 if isinstance(value, list) and len(value) > 0:

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
    745     """
    746     if len(data) != len(index):
--> 747         raise ValueError(
    748             "Length of values "
    749             f"({len(data)}) "

ValueError: Length of values (2623) does not match length of index (2667)
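One plausible cause of this mismatch, which cannot be confirmed without the full data, is that the expression building suffix returns an empty list for any (ID, Date) group that has exactly one row, so such rows add nothing to new_col. The sketch below isolates that expression in a hypothetical helper, suffix_for, and counts single-row groups on a small made-up frame.

```python
import pandas as pd

def suffix_for(n):
    """Labels the loop builds for a group of n rows, before the odd-length fix-up."""
    return sorted(list(range(1, n)) * 2)[:n]

# Compare group size with the number of labels produced for it
for n in range(5):
    print(n, "rows ->", len(suffix_for(n)), "labels")

# Made-up sample: ID 13 has a single record on 2020-08-08
sample = pd.DataFrame({
    "ID":   [10, 10, 13, 13, 13],
    "Date": ["2020-08-07", "2020-08-07", "2020-08-07", "2020-08-07", "2020-08-08"],
})

# Count the (ID, Date) groups that contain exactly one row
single_row_groups = int((sample.groupby(["ID", "Date"]).size() == 1).sum())
print("single-row groups:", single_row_groups)
```

A group of one row yields zero labels, so every such group makes new_col one element shorter than the dataframe. The difference in the error, 2667 - 2623 = 44, would be consistent with 44 single-row groups in the real data, and running the same count on the real df would check that. Separately, new_col is built in ID-then-Date order, so the labels only line up with the rows if df is sorted the same way.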
Not enough information provided. The error is from pandas, but you are not showing your pandas dataframe.
You are trying to use your dataframe prior to its being created (beginning with loop line 2).
You can do this if you move your loop into a function, so long as the function is called after the dataframe has been defined.
(Apr-04-2022, 04:05 PM)Larz60+ Wrote: [ -> ]You are trying to use your dataframe prior to its being created (beginning with loop line 2).
You can do this if you move your loop into a function, so long as the function is called after the dataframe has been defined.

Sorry for the late reply. I tried moving it into a function and that didn't work either. Based on what I said and what I have tried, can you tell me if it's possible to get the desired results?
Please show your code attempt.