Python Forum
painfully slow runtime when handling data
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
painfully slow runtime when handling data
#1
Hi everyone,

I am new to Python, and just completed some codes as below that take forever to run. It seemed working if I only run the first 10 rows in the main DataFrame, I hope someone can assist me in improving the codes, so that it can run much faster. Many thanks

from datetime import timedelta
import numpy as np
import pandas as pd

# Get data from the given excel file
filepath = r'C:\Users\User\Desktop\Power BI\All.xlsx'

df_CoC = pd.read_excel(filepath, usecols=['ID, 'Is there time loss?', 'MC Capacity Date From',
                                          'MC Capacity Date To'], sheet_name='CoC') #18000 rows
df_WRS = pd.read_excel(filepath, usecols=['ID', 'Date From', 'Date To'], sheet_name='WRS') #22000 rows
df_Open = pd.read_excel(filepath, sheet_name='Open')  #2500 rows
Count_Open = df_Open['ID'].count()
df_Open['CoC_MissingWRS'] = None
df_Open['WRS_MissingCoC'] = None

####### Fill new 'CoC_MissingWRS' column in open query ########
for i in range(0, Count_Open):
    if i > 90: break
    imn = df_Open.loc[i, 'Injury Management: Injury Management Number']

    #Fill WRS date array
    arrWRSDates = np.array([])
    filtered_df = df_WRS[df_WRS['ID'] == imn]
    if not filtered_df.empty:
        Count_WRS = filtered_df['ID'].count()-1
        for w in range(0, Count_WRS):
                #Date_From is always available
                Date_From = filtered_df.iloc[w]['Date From'].date()
                if filtered_df.iloc[w]['Date To'] is None: Date_To = Date_From
                else: Date_To = filtered_df.iloc[w]['Date To'].date()
                if Date_To < Date_From: Date_To = Date_From
                adate = Date_From
                while adate <= Date_To:
                    if adate not in arrWRSDates: arrWRSDates = np.append(arrWRSDates, adate)
                    adate += timedelta(days=1)

    # Fill Timeloss & all CoC date array
    arrTLCoCDates = np.array([])
    arrAllCoCDates = np.array([])
    filtered_df = df_CoC[df_CoC['ID'] == imn]
    if not filtered_df.empty:
        Count_CoC= filtered_df['ID'].count() - 1
        for c in range(0, Count_CoC):
                Date_From = filtered_df.iloc[c]['MC Capacity Date From'].date()
                if filtered_df.iloc[c]['MC Capacity Date To'] is None: Date_To = Date_From
                else: Date_To = filtered_df.iloc[c]['MC Capacity Date To'].date()
                if Date_To < Date_From or Date_To is None: Date_To = Date_From
                adate = Date_From
                while adate <= Date_To:
                    if df_CoC.iloc[c]['Is there time loss?'] == 'Yes':
                        if adate not in arrTLCoCDates: arrTLCoCDates = np.append(arrTLCoCDates, adate)
                    else:
                        if adate not in arrAllCoCDates: arrAllCoCDatesCoCDates = np.append(arrAllCoCDates, adate)
                    adate += timedelta(days=1)


    ##### Check if CoC missing WRS #####
    bMissingWRS = 'No'
    if len(arrTLCoCDates) > 0:
        adate = min(arrTLCoCDates)
        while adate <= max(arrTLCoCDates) and bMissingWRS == 'No':
            if adate not in arrWRSDates and adate.weekday() < 5: bMissingWRS = 'Yes'
    df_Open.loc[i, 'CoC_MissingWRS'] = bMissingWRS

    ##### Check if WRS missing CoC #####
    bMissingCoC = 'No'
    if len(arrWRSDates) > 0:
        adate = min(arrWRSDates)
        while adate <= max(arrWRSDates) and bMissingCoC == 'No':
            if adate not in arrAllCoCDates and adate.weekday() < 5: bMissingCoC = 'Yes'
    df_Open.loc[i, 'WRS_MissingCoC'] = bMissingCoC
Reply
#2
Where is it slow?
Suggest you put sections into cells in a Jupyter notebook and execute the code segments individually. That should narrow it down. Then we can more likely help
Reply
#3
Hi jefsummers,

Thank you for your advice, I tried to run the code in Power BI and PyCharm Community, I never got to see the result, unless I run it for the first 10 rows or something. I will look Jupyter notebook up, and will get back to you. Thank you again.
Reply
#4
The way you do it will be very slow,every time write a loop in Pandas it often the wrong approch.
The way Pands is build should try to use Vectorization it will be a lot faster,look at this blog post.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  class and runtime akbarza 4 369 Mar-16-2024, 01:32 PM
Last Post: deanhystad
  Big O runtime nested for loop and append yarinsh 4 1,392 Dec-31-2022, 11:50 PM
Last Post: stevendaprano
Star python exception handling handling .... with traceback mg24 3 1,268 Nov-09-2022, 07:29 PM
Last Post: Gribouillis
  Reducing runtime memory usage in Cpython interpreter david_the_graower 2 2,223 Oct-18-2021, 09:56 PM
Last Post: david_the_graower
  Object reference in Dict - not resolved at runtime benthomson 2 1,847 Apr-02-2020, 08:50 AM
Last Post: benthomson
  PyCharm asking for VC++ runtime 14.0 whereas I have already installed VC++ 19.0 SarmadiRizvi 1 1,823 Apr-02-2020, 06:17 AM
Last Post: snippsat
  EOFError: EOF when reading a line - Runtime Error RavCOder 6 9,664 Sep-27-2019, 12:22 PM
Last Post: RavCOder
  Different runtime programs mhvozdesky 1 43,790 Oct-24-2018, 02:50 PM
Last Post: ichabod801
  PyDev package install/download at runtime user2103 0 2,478 Jan-30-2018, 08:39 AM
Last Post: user2103
  Question on runtime and improving nested for loops ackmondual 1 3,052 Jun-13-2017, 11:11 PM
Last Post: ichabod801

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020