Python Forum

I'm using openpyxl to read 500,000 to 900,000 record excel files with 100 columns, give or take.
I have a function that I'm using to read a row:

def read_row(worksheet, row, cols):
    row_data = []
    for index in range(1, cols + 1): 
        row_data.append(worksheet.cell(row, column = index).value)
    return row_data

but it takes between .2 and .8 seconds to read and return each row.
This average of .5 seconds times 900,000 translates to 5 days processing time for a single pull.
Is there any way to speed up this function or to use a faster module for excel?

For this I'm not married to any concept except using python to read excel so I'm open to any constructive advice.

Thank you

Answered my own question:

def read_row(worksheet, row):
    return [cell.value for cell in worksheet[row]]

This reduced read_row time to a hair over 0.007 seconds and increased performance by a factor of 70.
5 days has been reduced to less than 2 hours but I hope there's a way to cut this down even more.
Still leaving this open for any further advice.

Huge help for me, makes my program usable now. Makes sense to read the whole row instead of a cell at a time as I had done. Smile

I guess you didn't get any further help?
Thanks mucho, -deac33

Clunk_Head

Clunk_Head

deac33