Hello,
I'm trying to understand a piece of code written by someone with 10+ years of coding experience. He won a data science competition and published his code here:
https://github.com/drivendataorg/power-l...process.py
Can anybody explain what this function is intended to do, please?
(Sorry if the question is confusing, I'm a newbie.)
Here is the beginning of the function:

import time

# Note: get_aggregates.gb_cache and get_aggregates.times are attributes stored on the
# function object itself; they must be initialized before the first call
# (e.g. a dict and a dict/defaultdict of floats).
def get_aggregates(df, TestTimestamp, period, target_col, cols, func_list,
                   offset_name, col_values, group_cache, noval_name=''):
    # prtime('cols = ', cols)
    start = time.time()
    # print('gagc = ', get_aggregates.gb_cache)
    if (tuple(cols), target_col, col_values) in group_cache:
        subset = group_cache[(tuple(cols), target_col, col_values)]
    else:
        if tuple(cols) in get_aggregates.gb_cache:
            gb = get_aggregates.gb_cache[tuple(cols)]
        else:
            if len(cols):
                # Column selection must be a list; newer pandas rejects the tuple form ['Value','Temperature']
                gb = df.groupby(cols)[['Value', 'Temperature']]
                get_aggregates.gb_cache[tuple(cols)] = gb
        if len(cols):
            if col_values in gb.groups.keys():
                # Slice with the current values in the corresponding columns
                subset = gb.get_group(col_values)[target_col]  # .set_index('Timestamp')
            else:
                subset = df.iloc[0:0]  # empty slice
        else:
            subset = df  # No slicing, using all data
        group_cache[(tuple(cols), target_col, col_values)] = subset
    get_aggregates.times['get_group'] += time.time() - start
    start = time.time()
    # ... (the function continues beyond this excerpt)
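A rough reading of the part shown: it selects the rows of `df` whose values in the grouping columns `cols` equal `col_values`, keeps only `target_col`, and memoizes at two levels. `group_cache` stores finished subsets keyed by `(tuple(cols), target_col, col_values)`, and `get_aggregates.gb_cache` stores one groupby object per column combination so the expensive grouping happens only once. `get_aggregates.times` just accumulates how long this step takes. Parameters such as `TestTimestamp`, `period` and `func_list` are not used in this excerpt; judging by the name, the rest of the function probably applies the functions in `func_list` to this subset, but that is a guess. The sketch below re-creates only the selection-and-caching part in plain Python (a list of dicts instead of a pandas DataFrame, so it runs without pandas). The column `SiteId`, the toy rows and the names `get_subset` and `build_groups` are hypothetical. Two simplifications: `col_values` is always a tuple here, and with no grouping columns it returns all target values rather than the whole table.

```python
import time
from collections import defaultdict

# Hypothetical toy data: each dict is one row of the table.
ROWS = [
    {"SiteId": 1, "Value": 10.0, "Temperature": 5.0},
    {"SiteId": 1, "Value": 12.0, "Temperature": 6.0},
    {"SiteId": 2, "Value": 7.0, "Temperature": 4.0},
]


def build_groups(rows, cols):
    """Stand-in for df.groupby(cols): map each key tuple to its list of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in cols)].append(row)
    return dict(groups)


def get_subset(rows, cols, target_col, col_values, group_cache, gb_cache, times):
    """Hypothetical sketch of the selection + caching part of get_aggregates."""
    start = time.time()
    key = (tuple(cols), target_col, col_values)
    if key in group_cache:
        # Level 1: this exact (columns, target, values) subset was built before.
        subset = group_cache[key]
    else:
        if cols:
            # Level 2: group once per column combination, reuse afterwards.
            if tuple(cols) not in gb_cache:
                gb_cache[tuple(cols)] = build_groups(rows, cols)
            groups = gb_cache[tuple(cols)]
            # Unknown key -> empty result (the original returns an empty slice).
            subset = [r[target_col] for r in groups.get(col_values, [])]
        else:
            # No grouping columns: use all data.
            subset = [r[target_col] for r in rows]
        group_cache[key] = subset
    # Same idea as get_aggregates.times['get_group']: accumulate elapsed time.
    times["get_group"] += time.time() - start
    return subset


if __name__ == "__main__":
    group_cache, gb_cache, times = {}, {}, defaultdict(float)
    print(get_subset(ROWS, ["SiteId"], "Value", (1,), group_cache, gb_cache, times))
    # prints [10.0, 12.0]
    print(get_subset(ROWS, ["SiteId"], "Temperature", (2,), group_cache, gb_cache, times))
    # prints [4.0]; the grouping by SiteId is reused from gb_cache
```

The point of the two caches is that the same function is called many times with the same columns and values, so repeated calls become dictionary lookups instead of re-grouping the whole table.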