Jun-30-2023, 02:45 PM
(This post was last modified: Jun-30-2023, 03:43 PM by Gribouillis.)
I created a program to aggregate a large dataframe over one variable, userid. The program runs a groupby that calculates the mean, min and max of 10 variables for each userid. I've enclosed a proxy for this code: first, it creates the dataframe; second, it aggregates over userid. The code took 20 minutes to run. I would like to speed it up with multithreading.
```python
from datetime import datetime
import numpy as np
import random
import pandas as pd

print('Initial time:', datetime.now().strftime("%H:%M:%S"))

def fun_user_id(start, end, step):
    num = np.linspace(start, end, (end - start) * int(1/step) + 1).tolist()
    return [round(i, 0) for i in num]

def fun_rand_num():
    return list(map(lambda x: random.randint(300, 800), range(1, 100000001)))

userid = fun_user_id(1, 100000001, .5)
var1 = fun_rand_num()
var2 = fun_rand_num()
var3 = fun_rand_num()
var4 = fun_rand_num()
var5 = fun_rand_num()
var6 = fun_rand_num()
var7 = fun_rand_num()
var8 = fun_rand_num()
var9 = fun_rand_num()
var10 = fun_rand_num()

df = pd.DataFrame(list(zip(userid, var1, var2, var3, var4, var5,
                           var6, var7, var8, var9, var10)),
                  columns=['userid', 'var1', 'var2', 'var3', 'var4', 'var5',
                           'var6', 'var7', 'var8', 'var9', 'var10'])

varlistdic = {"var1": ["mean", "max", "min"],
              "var2": ["mean", "max", "min"],
              "var3": ["mean", "max", "min"],
              "var4": ["mean", "max", "min"],
              "var5": ["mean", "max", "min"],
              "var6": ["mean", "max", "min"],
              "var7": ["mean", "max", "min"],
              "var8": ["mean", "max", "min"],
              "var9": ["mean", "max", "min"],
              "var10": ["mean", "max", "min"],
              }

gr = df.groupby(['userid'])
df_sum = gr.agg(varlistdic)
df_sum = df_sum.pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1))
df_sum.reset_index(inplace=True)

print('End Time:', datetime.now().strftime("%H:%M:%S"))
```
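For comparison, here is a minimal sketch of one way to restructure the proxy; it is not from the thread. It makes two assumptions. First, a good share of the runtime is likely spent building ten Python lists of 100 million ints and zipping them, so the hypothetical helper `make_data` generates the same kind of data with vectorized NumPy calls instead, and it approximates the original userid pattern as "each userid appears twice". Second, because CPython's GIL keeps most Python-level work on one core, plain threads may give little speedup, so the hypothetical `parallel_agg` defaults to worker processes. It splits the rows by userid range so that no group spans two chunks, aggregates each chunk concurrently, and concatenates the partial results. The names `make_data`, `agg_chunk` and `parallel_agg` are illustrative only.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

VARS = [f"var{i}" for i in range(1, 11)]
AGGS = ["mean", "max", "min"]  # same order as varlistdic in the post


def make_data(n_rows, seed=0):
    """Hypothetical vectorized stand-in for the list-based data generation.

    Assumption: each userid appears twice (1, 1, 2, 2, ...), roughly like the
    rounded linspace in the original; values are integers in [300, 800].
    """
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, so 801 matches randint(300, 800)
    data = rng.integers(300, 801, size=(n_rows, len(VARS)))
    df = pd.DataFrame(data, columns=VARS)
    df.insert(0, "userid", np.arange(n_rows) // 2 + 1)
    return df


def agg_chunk(chunk):
    """Aggregate one chunk and flatten column names to e.g. 'var1_mean'."""
    out = chunk.groupby("userid")[VARS].agg(AGGS)
    out.columns = ["_".join(col) for col in out.columns]
    return out


def parallel_agg(df, n_chunks=4, executor_cls=ProcessPoolExecutor):
    """Split rows by userid range, aggregate chunks concurrently, concatenate.

    Every userid lands in exactly one chunk, so concatenating the per-chunk
    results gives the same table as a single groupby over the whole frame.
    """
    # Sorted integer codes per userid -> contiguous chunk ids 0..n_chunks-1
    codes, uniques = pd.factorize(df["userid"], sort=True)
    chunk_id = codes * n_chunks // len(uniques)
    chunks = [part for _, part in df.groupby(chunk_id, sort=True)]
    with executor_cls(max_workers=n_chunks) as ex:
        parts = list(ex.map(agg_chunk, chunks))
    return pd.concat(parts).reset_index()
```

With worker processes, each chunk is pickled and copied to a worker, so the gain depends on how expensive the aggregation is relative to that transfer; on platforms that start processes with spawn (Windows, macOS), the call to `parallel_agg` should sit under `if __name__ == "__main__":`. Passing `executor_cls=ThreadPoolExecutor` tries the multithreaded variant instead, which is worth timing against the single groupby before committing to either.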
Gribouillis wrote Jun-30-2023, 03:43 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.