Python Forum
Can anybody explain what this function is doing?
#1
Hello,
I'm trying to understand a piece of code written by someone with 10+ years of coding experience.
He won a data science competition and published his code here:
https://github.com/drivendataorg/power-l...process.py

Can anybody explain what this function is intended to do, please?
(Sorry if the question is confusing; I'm a newbie.)
import time

# get_aggregates.gb_cache and get_aggregates.times are function attributes that
# must be initialized elsewhere before the first call (not shown in this post),
# e.g. get_aggregates.gb_cache = {} and get_aggregates.times = {'get_group': 0.0}
def get_aggregates(df, TestTimestamp, period, target_col, cols, func_list, offset_name, col_values, group_cache, noval_name = ''):

#  prtime('cols = ', cols)
  start = time.time()

#  print('gagc = ', get_aggregates.gb_cache)
  key = (tuple(cols), target_col, col_values)
  if key in group_cache:
    subset = group_cache[key]  # this slice was already extracted
  else:
    if tuple(cols) in get_aggregates.gb_cache:
      gb = get_aggregates.gb_cache[tuple(cols)]  # reuse the cached groupby
    else:
      if len(cols):
        # Selecting several columns needs a list (double brackets); pandas 2.x rejects a bare tuple of names
        gb = df.groupby(cols)[['Value','Temperature']]
        get_aggregates.gb_cache[tuple(cols)] = gb

    if len(cols):
      if col_values in gb.groups.keys():
        subset = gb.get_group(col_values)[target_col] #.set_index('Timestamp')  # Slice with the current values in the corresponding columns
      else:
        subset = df.iloc[0:0] # empty slice
    else:
      subset = df # No slicing, using all data
    group_cache[key] = subset
  get_aggregates.times['get_group'] += time.time()-start
  start = time.time()  # restart the timer; the rest of the function is not shown in this post
#2
First, read the comments in the script, since the author knows more about his code than we do:
Output:
# Calculating historical aggregates
# df - source dataframe (train set)
# TestTimestamp - start of test period, no data at this point or beyond is used
# period - amount of time before TestTimestamp used to calculate aggregetes
# target col - column to calculate averages (can be Value, Temperature, ...)
# cols - columns to group by (i.e. we are getting aggregate values for the same values in these columns in the past
# col_values - current values in this columns (for example, current time and day of week)
# Notice : in its current state this function relies on Timestamp values being sorted (ascending) within each group,
# so can't be used for aggregates over different SiteIds
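
To make those comments concrete, here is a minimal sketch of only the part of the function quoted in post #1: group the frame by `cols`, pull out the rows whose values in those columns equal `col_values`, and cache both the groupby object and the resulting slice so repeated lookups are cheap. It is not the author's code. The toy data, the column names `dow` and `hour`, and the helper name `get_subset` are made up for illustration, the caches are plain module-level dicts rather than function attributes, and unlike the original it returns `df[target_col]` when `cols` is empty.

```python
import pandas as pd

# Hypothetical toy data; only 'Value' and 'Temperature' come from the posted snippet.
df = pd.DataFrame({
    "dow":         [0, 0, 1, 1, 0],
    "hour":        [8, 9, 8, 9, 8],
    "Value":       [10.0, 20.0, 30.0, 40.0, 50.0],
    "Temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
})

gb_cache = {}     # groupby objects, keyed by the grouping columns
group_cache = {}  # extracted slices, keyed by (columns, target column, values)


def get_subset(df, cols, target_col, col_values):
    """Return target_col for the rows where `cols` equal `col_values`, with caching."""
    key = (tuple(cols), target_col, col_values)
    if key in group_cache:
        return group_cache[key]          # slice already extracted earlier

    if not cols:
        subset = df[target_col]          # no grouping columns: use all rows
    else:
        gb = gb_cache.get(tuple(cols))
        if gb is None:                   # build the groupby only once per column set
            gb = df.groupby(cols)[["Value", "Temperature"]]
            gb_cache[tuple(cols)] = gb
        if col_values in gb.groups:
            subset = gb.get_group(col_values)[target_col]
        else:
            subset = df[target_col].iloc[0:0]  # no matching rows: empty slice

    group_cache[key] = subset
    return subset


# All past 'Value' readings taken on day-of-week 0 at hour 8
same_slot = get_subset(df, ["dow", "hour"], "Value", (0, 8))
print(same_slot.tolist())  # [10.0, 50.0]
print(same_slot.mean())    # 30.0
```

The rest of the real function (not shown in the thread) presumably applies `func_list` to a slice like `same_slot`, restricted to the `period` before `TestTimestamp`, but that part would have to be checked against the linked source.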