Python Forum
Reduce four for loops or parallelizing code in Python
#1
I have been working on some code that builds a new data set from my actual data, using pandas and Python. Here is what the code looks like:
import numpy as np
import pandas as pd

rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
for i in df["dates"].unique():
    for j in df["Column_A"].unique():
        for k in df["Column_B"].unique():
            for m in df["Column_C"].unique():
                # rows that share this (Column_A, Column_B, Column_C) combination
                group = (df["Column_A"] == j) & (df["Column_B"] == k) & (df["Column_C"] == m)
                base = group & (df["dates"] == '2005-1-1')
                # n and x depend only on j, k, m (fixed base date), not on i
                n = df[base & (df["Column_D"] == 'orange')]['VALUE']
                x = df[base]['VALUE'].sum()
                tempVal = df[group & (df["dates"] == i)]['VALUE'].sum()
                finalVal = (n * tempVal) / (x - n)
                # np.isinf, not np.inf: np.inf is a float constant and cannot be called
                if finalVal.empty or finalVal.isna().any() or np.isinf(finalVal).any():
                    finalVal = 0
                else:
                    # assumes at most one 'orange' row per combination on the base date
                    finalVal = int(finalVal.iloc[0])
                rows.append({'dates': i, 'Column_D': 'orange', 'Column_A': j, 'VALUE': finalVal, 'Column_B': k, 'Column_C': m})
new_df = pd.DataFrame(rows, columns=['dates', 'Column_D', 'Column_A', 'VALUE', 'Column_B', 'Column_C'])
The code currently takes a long time to run, and I'm not sure how to reduce the run time. I suspect the problem is that everything runs sequentially. Could I get some help speeding it up? I would like to know how to run the code in parallel and how to reduce the number of for loops. I heard pyspark is good, but will it help me? Thanks!
#2
Print the contents of i, j, k and m at the beginning of each loop.

This will show you how many times each sub-loop has to spin through, and where all the time is spent.
Keep in mind that each sub-loop runs through all of its values once for every iteration of its parent, at each level, so by the time
you get to the innermost loop (Column_C) the number of iterations is massive.

This must be avoided.
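
If printing every value is too noisy, the same information can be had by counting. A minimal sketch, assuming a small made-up DataFrame in place of the real df (only the column names are taken from post #1; the data is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for df; only the column names come from post #1.
df = pd.DataFrame({
    "dates": ["2005-1-1", "2005-1-1", "2005-2-1", "2005-2-1"],
    "Column_A": ["a1", "a2", "a1", "a2"],
    "Column_B": ["b1", "b1", "b2", "b2"],
    "Column_C": ["c1", "c2", "c1", "c2"],
    "Column_D": ["orange", "apple", "orange", "apple"],
    "VALUE": [1, 2, 3, 4],
})

# The four columns the nested loops iterate over, outermost first.
loop_columns = ["dates", "Column_A", "Column_B", "Column_C"]
unique_counts = {col: df[col].nunique() for col in loop_columns}

# The innermost body runs once per combination of unique values,
# i.e. the product of the four counts.
total_iterations = 1
for count in unique_counts.values():
    total_iterations *= count

print(unique_counts)
print("innermost body runs", total_iterations, "times")
```

Every one of those iterations also filters the whole df several times, so the total work grows with both the product of the unique counts and the number of rows.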

To take a stab at correcting this, you will need to provide a copy of your df as it is at the start and explain the purpose of new_df.
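
Until then, here is only a rough sketch of what a loop-free version might look like using groupby and merge. It is a guess at the intent of the posted code, not a confirmed fix. The function name orange_projection, its parameters, and the sample data are all made up for illustration, and it assumes there is at most one 'orange' row per (Column_A, Column_B, Column_C) combination on the base date.

```python
import numpy as np
import pandas as pd


def orange_projection(df, base_date="2005-1-1", label="orange"):
    """Hypothetical helper: compute n * tempVal / (x - n) without Python loops."""
    keys = ["Column_A", "Column_B", "Column_C"]
    base = df[df["dates"] == base_date]

    # x: total VALUE per key combination on the base date
    x = base.groupby(keys)["VALUE"].sum().rename("x").reset_index()
    # n: VALUE of the label rows on the base date
    # (assumption: at most one such row per combination, so sum() returns it)
    n = (base[base["Column_D"] == label]
         .groupby(keys)["VALUE"].sum().rename("n").reset_index())
    # tempVal: total VALUE per (dates, key combination)
    out = (df.groupby(["dates"] + keys, as_index=False)["VALUE"].sum()
           .rename(columns={"VALUE": "tempVal"}))

    out = out.merge(x, on=keys, how="left").merge(n, on=keys, how="left")
    final = out["n"] * out["tempVal"] / (out["x"] - out["n"])
    # Missing, NaN and infinite results become 0, as in the posted loop.
    final = final.replace([np.inf, -np.inf], np.nan).fillna(0)
    out["VALUE"] = final.astype(int)  # truncates toward zero, like int()
    out["Column_D"] = label
    return out[["dates", "Column_D", "Column_A", "VALUE", "Column_B", "Column_C"]]


# Made-up sample data, only to show the call.
sample = pd.DataFrame({
    "dates": ["2005-1-1", "2005-1-1", "2005-1-1",
              "2005-2-1", "2005-2-1", "2005-2-1"],
    "Column_A": ["a1", "a1", "a2", "a1", "a1", "a2"],
    "Column_B": ["b1"] * 6,
    "Column_C": ["c1"] * 6,
    "Column_D": ["orange", "apple", "orange", "orange", "apple", "orange"],
    "VALUE": [2, 6, 5, 3, 9, 1],
})
result = orange_projection(sample)
print(result)
```

One difference to be aware of: this only returns combinations that actually occur in df, whereas the nested loops also emit a row with VALUE 0 for every combination of unique values that never occurs together.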