Can anybody explain what this function is doing?
#1
Hello,
I'm trying to understand a piece of code written by someone with 10+ years of coding experience.
He won a data science competition and published his code here:
https://github.com/drivendataorg/power-l...process.py

Can anybody explain what this function is intended to do, please?
(Sorry if the question is confusing, I'm a newbie.)
def get_aggregates(df, TestTimestamp, period, target_col, cols, func_list, offset_name, col_values, group_cache, noval_name = ''):
  
#  prtime('cols = ', cols)
  start = time.time()

#  print('gagc = ', get_aggregates.gb_cache)
  if (tuple(cols), target_col, col_values) in group_cache:
    subset = group_cache[(tuple(cols), target_col, col_values)]
  else:
    if tuple(cols) in get_aggregates.gb_cache:
      gb = get_aggregates.gb_cache[tuple(cols)]
    else:
      if len(cols):
        gb = df.groupby(cols)[['Value','Temperature']]  # list of columns: tuple-style selection fails in pandas 2.x
        get_aggregates.gb_cache[tuple(cols)] = gb
    
    if len(cols):
      if col_values in gb.groups.keys():
        subset = gb.get_group(col_values)[target_col] #.set_index('Timestamp')  # Slice with the current values in the corresponding columns
      else:
        subset = df.iloc[0:0] # empty slice
    else:
      subset = df # No slicing, using all data
    group_cache[(tuple(cols), target_col, col_values)] = subset
  get_aggregates.times['get_group'] += time.time()-start
  start = time.time()
#2
First, read the comments in the script; the author knows his code better than we do:
# Calculating historical aggregates
# df - source dataframe (train set)
# TestTimestamp - start of test period, no data at this point or beyond is used
# period - amount of time before TestTimestamp used to calculate aggregates
# target col - column to calculate averages (can be Value, Temperature, ...)
# cols - columns to group by (i.e. we are getting aggregate values for the same values in these columns in the past)
# col_values - current values in these columns (for example, current time and day of week)
# Notice : in its current state this function relies on Timestamp values being sorted (ascending) within each group,
# so can't be used for aggregates over different SiteIds
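To make those comments more concrete, here is a minimal sketch of the idea as I read it. It is not the author's code. The helper name `historical_aggregates`, the `hour` and `dow` columns, and the sample data are all made up for illustration. It also swaps the cached `groupby` object and `get_group` call for a plain dict of groups, and it assumes the window includes its start and excludes `test_timestamp`.

```python
import pandas as pd

def historical_aggregates(df, test_timestamp, period, target_col,
                          cols, col_values, func_list, groups_cache):
    """Hypothetical helper: aggregate target_col over past rows that share
    the given values in `cols`, using only data before test_timestamp."""
    if cols:
        key = tuple(cols)
        if key not in groups_cache:
            # Split the frame once per column combination and reuse it later.
            groups = {}
            for k, g in df.groupby(list(cols)):
                # Normalize keys to tuples (single-column keys may be scalars).
                k = k if isinstance(k, tuple) else (k,)
                groups[k] = g
            groups_cache[key] = groups
        subset = groups_cache[key].get(tuple(col_values))
        if subset is None:
            # These values never occurred in the training data.
            return pd.Series(float("nan"), index=func_list)
    else:
        subset = df  # no grouping columns: use all rows

    # Keep rows inside [test_timestamp - period, test_timestamp).
    in_window = ((subset["Timestamp"] >= test_timestamp - period)
                 & (subset["Timestamp"] < test_timestamp))
    values = subset.loc[in_window, target_col]
    if values.empty:
        return pd.Series(float("nan"), index=func_list)
    return values.agg(func_list)

# Made-up sample data: hour of day and day of week as grouping columns.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019-01-01 10:00", "2019-01-02 10:00", "2019-01-08 10:00",
        "2019-01-15 10:00", "2019-01-08 11:00",
    ]),
    "hour": [10, 10, 10, 10, 11],
    "dow": [1, 2, 1, 1, 1],
    "Value": [1.0, 5.0, 3.0, 100.0, 7.0],
})

cache = {}
result = historical_aggregates(
    df, pd.Timestamp("2019-01-15 10:00"), pd.Timedelta(days=30),
    "Value", ["hour", "dow"], (10, 1), ["mean", "max"], cache,
)
print(result)
```

So for "hour 10 on day-of-week 1" the function looks only at earlier rows with the same hour and day of week, inside the chosen period, and summarizes them. The row at the test timestamp itself is left out. The cache only saves time when the function is called many times with the same grouping columns, which seems to be what `gb_cache` and `group_cache` are for in the original.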