multiprocessing - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: multiprocessing (/thread-9416.html)
multiprocessing - srik - Apr-07-2018

I have been given a data set and need to compute statistics of one column for groups formed by the unique values of other columns. I can do it with pandas groupby, but I want to use multiprocessing Pool.map instead. An example is below.

    A  B  C
    2  3  4
    2  5  3
    2  3  5
    2  7  9
    2  3  10
    3  4  23
    2  7  4

For each unique combination of A and B values, I need the mean of column C.

    # df.groupby(['A', 'B'])['C'].mean()  # worked, but I want to use Pool.map()

I can't work out how to approach it. Please give some pointers on this.

RE: multiprocessing - woooee - Apr-07-2018

What exactly don't you understand, and what have you tried that didn't work?

RE: multiprocessing - srik - Apr-09-2018

What I tried is as follows:

1) Get the unique combinations of A and B:

    ls1 = df[['A', 'B']].drop_duplicates().values.tolist()

2) Get the C values for every combination, as a list of lists:

    s = [np.array([tuple(l) == tuple(t) for t in df[['A', 'B']].itertuples(index=False)]) for l in ls1]
    val_C = [list(df.loc[a1, 'C'].values) for a1 in s]

3) Apply Pool.map to the list of lists.

My questions are:

1) Is this a good approach to solving it?
2) My step 2 seems to take a huge amount of time. How can I optimize it?
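
The three steps in the last post can be sketched as below. Step 2 is probably slow because it walks every row with itertuples once per unique (A, B) combination, so the work grows as rows times groups in a Python-level loop. One possible alternative, shown here as a rough sketch and not as the thread's accepted answer, is to let a single groupby pass do the splitting and hand only the per-group C lists to Pool.map. The helper names split_by_group and parallel_group_means and the processes=2 default are made up for illustration; statistics.fmean (Python 3.8+) is used as the worker because it is an importable top-level function that Pool.map can pickle.

```python
import statistics
from multiprocessing import Pool

import pandas as pd


def split_by_group(df, keys, value):
    """Split df[value] into one list per unique combination of `keys`.

    A single groupby pass replaces the per-combination row scan in step 2.
    (Hypothetical helper, not part of pandas.)
    """
    group_keys = []
    value_lists = []
    # sort=True yields the groups in sorted key order.
    for key, series in df.groupby(keys, sort=True)[value]:
        group_keys.append(key)
        value_lists.append(series.tolist())
    return group_keys, value_lists


def parallel_group_means(df, keys, value, processes=2):
    """Compute the mean of `value` per group with Pool.map.

    (Hypothetical helper; `processes=2` is an arbitrary choice.)
    """
    group_keys, value_lists = split_by_group(df, keys, value)
    with Pool(processes=processes) as pool:
        # Each worker process receives one list of C values at a time.
        means = pool.map(statistics.fmean, value_lists)
    return dict(zip(group_keys, means))


if __name__ == "__main__":
    # The example data from the first post.
    df = pd.DataFrame(
        {
            "A": [2, 2, 2, 2, 2, 3, 2],
            "B": [3, 5, 3, 7, 3, 4, 7],
            "C": [4, 3, 5, 9, 10, 23, 4],
        }
    )
    for key, mean in parallel_group_means(df, ["A", "B"], "C").items():
        print(key, mean)
```

For a plain mean, the commented-out groupby call from the first post will very likely be faster than this, since starting processes and pickling the lists costs more than the arithmetic. Pool.map is more likely to pay off when the per-group function is expensive.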