![]() |
my openpyxl use is too slow, am I reading rows incorrectly? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: my openpyxl use is too slow, am I reading rows incorrectly? (/thread-19087.html) |
my openpyxl use is too slow, am I reading rows incorrectly? - Clunk_Head - Jun-12-2019 I'm using openpyxl to read 500,000 to 900,000 record excel files with 100 columns, give or take. I have a function that I'm using to read a row: def read_row(worksheet, row, cols): row_data = [] for index in range(1, cols + 1): row_data.append(worksheet.cell(row, column = index).value) return row_databut it takes between .2 and .8 seconds to read and return each row. This average of .5 seconds times 900,000 translates to 5 days processing time for a single pull. Is there any way to speed up this function or to use a faster module for excel? For this I'm not married to any concept except using python to read excel so I'm open to any constructive advice. Thank you RE: my openpyxl use is too slow, am I reading rows incorrectly? - Clunk_Head - Jun-12-2019 Answered my own question: def read_row(worksheet, row): return [cell.value for cell in worksheet[row]]This reduced read_row time to a hair over 0.007 seconds and increased performance by a factor of 70. 5 days has been reduced to less than 2 hours but I hope there's a way to cut this down even more. Still leaving this open for any further advice. RE: my openpyxl use is too slow, am I reading rows incorrectly? - deac33 - Apr-30-2020 Huge help for me, makes my program usable now. Makes sense to read the whole row instead of a cell at a time as I had done. ![]() I guess you didn't get any further help? Thanks mucho, -deac33 |