Python Forum
Multiprocessing on python
#4
So here is the crux of my issue:

For example, I have a dataframe with 1.8 million input rows. From that, I expand it into 62.1 million rows, over which I need to find weighted averages. So really I'm crunching 62.1 million rows, and this takes 23-ish minutes.

As a simplification, my dataframe df has two columns, 'A' and 'B', filled with numbers, and I need to calculate the average of the numbers in column A weighted by the numbers in column B.
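To make the calculation concrete, here is a minimal sketch of that weighted average on a tiny made-up dataframe. The three rows of values are purely illustrative assumptions, not the real data; the `np.average` call is only there as a cross-check.

```python
import numpy as np
import pandas as pd

# Tiny illustrative dataframe (made-up values, not the real data)
df = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]})

# Average of column A weighted by column B: sum(A * B) / sum(B)
weighted_avg = (df["A"] * df["B"]).sum() / df["B"].sum()

# Cross-check with NumPy's built-in weighted average
check = np.average(df["A"], weights=df["B"])

print(weighted_avg, check)
```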

But... I don't need to find the weighted average of the original 1.8 million rows. I need to find the weighted averages of overlapping sets of rows, which in this example total 62.1 million rows.

For example - I would take rows

0-2832
672-3293
1189-4102
1382-4204
2902-4680
and so on....

I have an algorithm that determines the relevant row numbers for each subset. There is no consistent pattern of a fixed increment or anything like that with the row numbers.

So it starts with a while loop (which is probably why the processing takes so long):

while condition:
   boolean indexing to create a new dataframe that contains my relevant subset, call it tempdf
   weighted_avg = (tempdf['A'] * tempdf['B']).sum() / tempdf['B'].sum()
   append the weighted average as a new row into a results dataframe called resultsdf
So, as you can see, this is how 1.8 million rows turn into 62.1 million rows to process: there is a lot of overlap between the subset dataframes. Because of the math involved in calculating a weighted average, and because there is no simple incremental pattern in the indices used to create each subset, I don't know how to do this any way other than with a while loop that takes a very long time to cycle through.
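The loop described above can be sketched as runnable code roughly like this. Everything specific here is an assumption for illustration: the data is random, the hard-coded `ranges` list stands in for the algorithm that picks the row numbers, the ranges are treated as inclusive, and the `start`/`stop` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the real 1.8 million row dataframe
rng = np.random.default_rng(0)
n_rows = 5000
df = pd.DataFrame({"A": rng.random(n_rows), "B": rng.random(n_rows) + 0.1})

# Hypothetical stand-in for the algorithm that determines each subset;
# (start, stop) pairs are assumed to be inclusive row numbers
ranges = [(0, 2832), (672, 3293), (1189, 4102), (1382, 4204), (2902, 4680)]

results = []
i = 0
while i < len(ranges):
    start, stop = ranges[i]
    # Boolean indexing to build the subset dataframe
    mask = (df.index >= start) & (df.index <= stop)
    tempdf = df[mask]
    weighted_avg = (tempdf["A"] * tempdf["B"]).sum() / tempdf["B"].sum()
    # Collect rows in a list; the results dataframe is built once at the end
    results.append({"start": start, "stop": stop, "weighted_avg": weighted_avg})
    i += 1

resultsdf = pd.DataFrame(results)
print(resultsdf)
```

The sketch collects results in a plain list and builds resultsdf once after the loop, rather than appending a row to the dataframe on every pass, since `DataFrame.append` was removed in pandas 2.0. The per-iteration boolean mask over the whole dataframe is kept as described.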

So, any better way to do it?
Messages In This Thread
Multiprocessing on python - by sawtooth500 - Apr-01-2024, 01:31 AM
RE: Multiprocessing on python - by deanhystad - Apr-01-2024, 01:57 AM
RE: Multiprocessing on python - by sawtooth500 - Apr-01-2024, 02:24 AM
RE: Multiprocessing on python - by sawtooth500 - Apr-01-2024, 04:41 AM
RE: Multiprocessing on python - by deanhystad - Apr-01-2024, 09:51 AM
RE: Multiprocessing on python - by sawtooth500 - Apr-01-2024, 05:12 PM
RE: Multiprocessing on python - by deanhystad - Apr-01-2024, 05:51 PM
RE: Multiprocessing on python - by jefsummers - Apr-01-2024, 07:19 PM
RE: Multiprocessing on python - by sawtooth500 - Apr-02-2024, 01:53 AM
RE: Multiprocessing on python - by sawtooth500 - Apr-02-2024, 03:07 AM
RE: Multiprocessing on python - by sawtooth500 - Apr-02-2024, 03:05 PM
RE: Multiprocessing on python - by deanhystad - Apr-02-2024, 04:14 PM
RE: Multiprocessing on python - by sawtooth500 - Apr-02-2024, 06:03 PM
