Apr-01-2024, 04:41 AM
So here is the crux of my issue:
I have a dataframe with 1.8 million input rows. However, the overlapping subsets I build from it add up to 62.1 million rows, and I need a weighted average for each subset. So really I'm crunching 62.1 million rows, and that takes roughly 23 minutes.
As a simplification, my dataframe df has two columns, 'A' and 'B', filled with numbers, and I need to calculate the average of the numbers in column A weighted by the numbers in column B.
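Over the whole frame that calculation would be a one-liner, something like this (toy data just to make it concrete):

    import pandas as pd
    import numpy as np

    # toy stand-in for my real data: two numeric columns 'A' and 'B'
    df = pd.DataFrame({'A': np.random.rand(1_800_000),
                       'B': np.random.rand(1_800_000)})

    # average of A weighted by B - same formula I use per subset below
    weighted_avg = (df['A'] * df['B']).sum() / df['B'].sum()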
But... I don't need to find the weighted average of the original 1.8 million rows. I need to find the weighted averages of overlapping sets of rows, which in this example total 62.1 million rows.
For example, I would take rows like these (see the sketch just after this list):
0-2832
672-3293
1189-4102
1382-4204
2902-4680
and so on....
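In code terms, each of those subsets is just a contiguous slice of df. A minimal sketch, assuming the end row is inclusive and using a made-up list of (start, end) pairs in place of my algorithm's real output:

    # hypothetical (start, end) row-number pairs like the ones above
    ranges = [(0, 2832), (672, 3293), (1189, 4102), (1382, 4204), (2902, 4680)]

    start, end = ranges[0]
    tempdf = df.iloc[start:end + 1]   # one overlapping subset of rows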
I have an algorithm that determines the relevant row numbers for each subset. There is no consistent pattern to the row numbers - no fixed increment or anything like that.
So my code starts with a while loop (which is probably why the processing takes so long):
    while condition:
        # boolean indexing to create a new dataframe that contains
        # my relevant subset; the mask comes from my algorithm
        tempdf = df[mask]
        weighted_avg = (tempdf['A'] * tempdf['B']).sum() / tempdf['B'].sum()
        # append the weighted average as a new row of a results dataframe
        resultsdf.loc[len(resultsdf)] = weighted_avg

So, as you see, this is how I turn 1.8 million rows into 62.1 million rows to process: there is a lot of overlap between the subsets. Because of the math involved in calculating a weighted average, and because there is no simple incremental pattern in the indices used to build each subset, I don't know how to do this any way other than with a while loop that takes a very long time to cycle through.
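Put together, a self-contained toy version of what I'm doing looks roughly like this (small sizes and the made-up ranges from above, just so it runs; I've used .iloc slicing here for brevity, whereas my real code builds each subset with boolean indexing):

    import pandas as pd
    import numpy as np

    # toy data standing in for my 1.8-million-row frame
    df = pd.DataFrame({'A': np.random.rand(5_000), 'B': np.random.rand(5_000)})

    # stand-in for my algorithm's output: overlapping (start, end) row ranges
    ranges = [(0, 2832), (672, 3293), (1189, 4102), (1382, 4204), (2902, 4680)]

    rows = []
    for start, end in ranges:
        tempdf = df.iloc[start:end + 1]  # the relevant subset
        rows.append((tempdf['A'] * tempdf['B']).sum() / tempdf['B'].sum())
    resultsdf = pd.DataFrame({'weighted_avg': rows})

    # total rows touched across all subsets - the overlap is how
    # 1.8 million input rows turn into 62.1 million rows of work
    print(sum(end - start + 1 for start, end in ranges))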
So, any better way to do it?