How can I speed up my openpyxl program reading Excel .xlsx files? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: How can I speed up my openpyxl program reading Excel .xlsx files? (/thread-26521.html) |
How can I speed up my openpyxl program reading Excel .xlsx files? - deac33 - May-04-2020 Like many folks I need to read both .xls files (I call them S files, using xlrd) and .xlsx files (the X files, using openpyxl), in both cases files of about 30,000 rows. And in both cases I'm just copying all excel data read out to a .csv file, no other processing so just Input/Output. But the X file operations are over 200 times slower than for .xls, for example reading a 30,000 row .xlsx file now takes 2 minutes compared to 1/2 second for .xls with xlrd. We have thousands of files to process so the time per file matters. Is openpyxl that much slower or do I need to do something, like release some resource at the end of each row? BTW, I have made several great improvements by using read_only=True and reading a row at a time instead of cell by cell as shown in the following code segment. Thanks to blog.davep.org https://blog.davep.org/2018/06/02/a_little_speed_issue_with_openpyxl.html wb = openpyxl.load_workbook("excel_file.xlsx", data_only=True, read_only=True) sheet = wb.active for row in sheet.rows: for cell in row: cell_from_excel = cell.value |