Python Forum
my openpyxl use is too slow, am I reading rows incorrectly? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: my openpyxl use is too slow, am I reading rows incorrectly? (/thread-19087.html)



my openpyxl use is too slow, am I reading rows incorrectly? - Clunk_Head - Jun-12-2019

I'm using openpyxl to read 500,000 to 900,000 record excel files with 100 columns, give or take.
I have a function that I'm using to read a row:
def read_row(worksheet, row, cols):
    row_data = []
    for index in range(1, cols + 1): 
        row_data.append(worksheet.cell(row, column = index).value)
    return row_data
but it takes between .2 and .8 seconds to read and return each row.
This average of .5 seconds times 900,000 translates to 5 days processing time for a single pull.
Is there any way to speed up this function or to use a faster module for excel?

For this I'm not married to any concept except using python to read excel so I'm open to any constructive advice.

Thank you


RE: my openpyxl use is too slow, am I reading rows incorrectly? - Clunk_Head - Jun-12-2019

Answered my own question:
def read_row(worksheet, row):
    return [cell.value for cell in worksheet[row]]
This reduced read_row time to a hair over 0.007 seconds and increased performance by a factor of 70.
5 days has been reduced to less than 2 hours but I hope there's a way to cut this down even more.
Still leaving this open for any further advice.


RE: my openpyxl use is too slow, am I reading rows incorrectly? - deac33 - Apr-30-2020

Huge help for me, makes my program usable now. Makes sense to read the whole row instead of a cell at a time as I had done. Smile

I guess you didn't get any further help?
Thanks mucho, -deac33