Python Forum - Code taking too much time to process

Hi team,

I am a newbie using Python. I was trying to interpret several columns from an Excel file using openpyxl and write the result to an XML file. The code works, but it takes too much time (around 20 minutes). The code basically reads an Excel file, selects a tab, and reads the information using for and while loops.

My decisions are made using a lot of if/elif branches with a lot of "or" conditions. Is there any other way to make the processing faster or more efficient?

if (ParameterName == 'a1MeashoEnabled' or ParameterName == 'a5MeasEnabled' or ParameterName == 'a3MeasEnabled' or ParameterName == 'a2MeasRedirectEnabled' or ParameterName == 'a2MeashoEnabled') and ParameterValue is not None:
    ParameterValue = str(ParameterValue)
    if ParameterValue.isdigit() is True:
        if int(ParameterValue) == 0:
            ParameterValue = 'none'
        elif int(ParameterValue) == 1:
            ParameterValue = 'rsrp'
        elif int(ParameterValue) == 2:
            ParameterValue = 'rsrq'
        elif int(ParameterValue) == 3:
            ParameterValue = 'rsrpAndRsrq'
        elif int(ParameterValue) == 4:
            ParameterValue = 'rsrpCombined'
        elif int(ParameterValue) == 5:
            ParameterValue = 'rsrqCombined'
        else:
            pass
    else:
        pass

It would be better to see all the code; I doubt the comparisons themselves are causing the problem.

(Nov-05-2020, 06:30 PM)ndc85430 Wrote: It would be better to see all the code; I doubt the comparisons themselves are causing the problem.

Hi ndc, the code is very large; that is why I tried to explain it using the criteria I used to make the "parser".

I don't really know what you mean by that, but those comparisons all look to be trivial, so again, they likely aren't going to account for the slowness. You should run a profiler (e.g. cProfile, for which the docs are here) on the code to see where the most time is being spent. That will help people figure out what optimisations can be made.
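For anyone who has not used a profiler before, here is a minimal sketch of what that could look like. The function convert_rows and the sample rows are hypothetical stand-ins for the real Excel-to-XML code, which was not posted; only cProfile and pstats are real standard-library modules.

```python
import cProfile
import io
import pstats


def convert_rows(rows):
    """Hypothetical stand-in for the real Excel-to-XML conversion."""
    return [str(value).upper() for row in rows for value in row]


# Made-up sample data; the real script would read these from the workbook.
rows = [("a3MeasEnabled", 4)] * 10_000

profiler = cProfile.Profile()
profiler.enable()
result = convert_rows(rows)
profiler.disable()

# Collect the report in a string and show the 10 most expensive entries,
# sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

A whole script can also be profiled without editing it by running python -m cProfile -s cumulative your_script.py, where your_script.py is a placeholder for the actual file name.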

Agree with @ndc85430. It looks like you are reading through the xls file cell by cell, which could be sped up by reading it into a pandas DataFrame and then manipulating it from there. Pandas is pretty fast and allows vectorized operations.
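A rough sketch of what a vectorized version of the translation from the first post might look like. It assumes pandas is installed and that the sheet has columns named ParameterName and ParameterValue (both names are guesses, since the real layout was not posted). The frame is built in memory here so the sketch runs on its own; with a real file it would come from pd.read_excel instead.

```python
import pandas as pd

# Constants taken from the if/elif chain in the first post.
MEAS_PARAMETERS = [
    "a1MeashoEnabled", "a5MeasEnabled", "a3MeasEnabled",
    "a2MeasRedirectEnabled", "a2MeashoEnabled",
]
VALUE_NAMES = {
    "0": "none", "1": "rsrp", "2": "rsrq",
    "3": "rsrpAndRsrq", "4": "rsrpCombined", "5": "rsrqCombined",
}

# Made-up sample rows standing in for the Excel sheet.
df = pd.DataFrame(
    {
        "ParameterName": ["a3MeasEnabled", "a5MeasEnabled", "cellRange", "a1MeashoEnabled"],
        "ParameterValue": ["4", "1", "15", "9"],
    }
)

# Rows whose parameter is one of the measurement parameters.
is_meas = df["ParameterName"].isin(MEAS_PARAMETERS)

# Translate every value in one pass; values without a mapping become NaN.
translated = df["ParameterValue"].map(VALUE_NAMES)

# Keep the translation only where it applies, otherwise keep the original value.
df["ParameterValue"] = translated.where(is_meas & translated.notna(), df["ParameterValue"])

print(df)
```

The point is that the whole column is handled by a few calls instead of one if/elif chain per cell. Whether this makes a real difference depends on where the 20 minutes are actually being spent.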

(Nov-05-2020, 07:59 PM)ErPipex Wrote:
(Nov-05-2020, 06:30 PM)ndc85430 Wrote: It would be better to see all the code; I doubt the comparisons themselves are causing the problem.

Hi ndc, the code is very large; that is why I tried to explain it using the criteria I used to make the "parser".

Thank you!

(Nov-07-2020, 06:25 PM)ndc85430 Wrote: I don't really know what you mean by that, but those comparisons all look to be trivial, so again, they likely aren't going to account for the slowness. You should run a profiler (e.g. cProfile, for which the docs are here) on the code to see where the most time is being spent. That will help people figure out what optimisations can be made.

I will try to make a second version using pandas. In my first version I only used openpyxl. I was not expecting it to take so long, but almost all the information is correlated or indexed. I had even thought about using pandas, but I bet on "simplification".

I will try to make a second version with pandas over the next few weeks and see if there is any improvement.

You can write it shorter:

allowed_parameters = ('a1MeashoEnabled', 'a5MeasEnabled', 'a3MeasEnabled', 'a2MeasRedirectEnabled', 'a2MeashoEnabled')
# there is no None in allowed_parameters, so you don't have to check for None

parameter_values = ("none", "rsrp", "rsrq", 'rsrpAndRsrq', 'rsrpCombined', 'rsrqCombined')
# used to get the index


def get_value_from_parameter(parameter_name, parameter_value):
    if parameter_name in allowed_parameters:
        if parameter_value in parameter_values:
            return parameter_values.index(parameter_value)




ParameterName = "a3MeasEnabled"
ParameterValue = "rsrpCombined"

print(get_value_from_parameter(ParameterName, ParameterValue))

This won't speed up your code, but it makes it more readable.

You can even make the branching in one line like you did before with the check of NoneType.

def get_value_from_parameter(parameter_name, parameter_value):
    if parameter_name in allowed_parameters and parameter_value in parameter_values:
        return parameter_values.index(parameter_value)

The function can access objects at module level.
allowed_parameters and parameter_values are defined at module level.
These two objects could also be defined inside the function, but then no other function could access them.

Note to self: constants should be written in UPPER_CASE.
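Note that the helper above looks up a name and returns its index, while the code in the first post goes the other way (number to name). Here is a minimal sketch in that direction, with the constants in upper case. The function name translate_value is made up for illustration, and like the versions above it is about readability, not speed.

```python
# Parameter names and value names copied from the first post.
MEAS_PARAMETERS = frozenset({
    "a1MeashoEnabled", "a5MeasEnabled", "a3MeasEnabled",
    "a2MeasRedirectEnabled", "a2MeashoEnabled",
})
MEAS_VALUE_NAMES = ("none", "rsrp", "rsrq", "rsrpAndRsrq", "rsrpCombined", "rsrqCombined")


def translate_value(parameter_name, parameter_value):
    """Return the value name for 0..5, mirroring the original if/elif chain."""
    if parameter_name not in MEAS_PARAMETERS or parameter_value is None:
        # Not one of the measurement parameters: leave the value untouched.
        return parameter_value
    text = str(parameter_value)
    # isdecimal() guarantees that int() will succeed on the string.
    if text.isdecimal() and int(text) < len(MEAS_VALUE_NAMES):
        return MEAS_VALUE_NAMES[int(text)]
    # Anything else is kept as a string, as in the original code.
    return text


print(translate_value("a3MeasEnabled", 4))
print(translate_value("a3MeasEnabled", "rsrp"))
```

The tuple index replaces the six elif branches, and the frozenset replaces the chain of "or" comparisons.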

(Nov-09-2020, 02:03 PM)DeaD_EyE Wrote: You can write it shorter: [...]

Hi, thank you for this feedback. I am limited in what ParameterValue can be because the final interpreter is case-sensitive; for some internal reasons it uses, for example, values such as false/False/FALSE and true/True/TRUE.

I don't know if one of the problems is the way I am using openpyxl.

MY CODE:
ref_workbook=openpyxl.load_workbook(path_workbook)

I saw a case on Stack Overflow (https://stackoverflow.com/questions/3582...ed-to-xlrd) that talked about using:

wb = openpyxl.load_workbook(file_name, read_only=True)

I don't know if using that option makes a big difference or not.
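A small sketch of how the read_only=True variant could be tried, assuming openpyxl is installed. The file name, sheet contents and column layout are made up so the example is self-contained; in read-only mode the rows are best consumed in order with iter_rows, since that mode is meant for streaming through a sheet.

```python
import tempfile
from pathlib import Path

import openpyxl

with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "sample.xlsx")

    # Build a tiny workbook so the sketch runs on its own (made-up data).
    wb_out = openpyxl.Workbook()
    ws_out = wb_out.active
    ws_out.append(["ParameterName", "ParameterValue"])
    ws_out.append(["a3MeasEnabled", 4])
    ws_out.append(["a5MeasEnabled", 1])
    wb_out.save(path)

    # Open in read-only mode and stream the rows as plain tuples,
    # skipping the header row.
    wb = openpyxl.load_workbook(path, read_only=True)
    ws = wb.active
    rows = list(ws.iter_rows(min_row=2, values_only=True))

    # Read-only workbooks keep the file open until closed explicitly.
    wb.close()

print(rows)
```

Whether this helps depends on how the sheet is accessed: iterating row by row suits read-only mode, while jumping around with individual cell lookups may not. Timing both variants on the real file is the only way to know.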

The note to myself was about writing constants in upper case in code (PEP 8). It wasn't aimed at you.

It is clear that you can't change the content of your Excel file.
I guess it is data you get from somewhere, and you have no influence over the fields or the naming.
That is the case 99.999% of the time, plus missing and wrong data, plus bugs in the cell data type format.

If you work with openpyxl, it should convert cell values to Python types automatically.

But cells with an explicit text format that contain strings like "false", "NONE", "none", "true", etc. are not converted to Boolean. You get these "keywords" as str. I think this is what you want and need.