How can I speed up my openpyxl program reading Excel .xlsx files?

How can I speed up my openpyxl program reading Excel .xlsx files? - deac33 - May-04-2020

Like many folks, I need to read both .xls files (I call them S files, read with xlrd) and .xlsx files (the X files, read with openpyxl), each about 30,000 rows. In both cases I'm just copying all the Excel data out to a .csv file with no other processing, so it's pure input/output.

But the X file operations are over 200 times slower than the S file ones: reading a 30,000-row .xlsx file now takes 2 minutes, compared to 1/2 second for an .xls file with xlrd. We have thousands of files to process, so the time per file matters.

Is openpyxl really that much slower, or do I need to do something, like release some resource at the end of each row?

BTW, I have already made several big improvements by using read_only=True and by reading a row at a time instead of cell by cell, as shown in the following code segment. Thanks to blog.davep.org:
https://blog.davep.org/2018/06/02/a_little_speed_issue_with_openpyxl.html

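	import openpyxl

	# read_only=True streams the sheet instead of loading it all into memory;
	# data_only=True returns cached formula results rather than the formulas.
	wb = openpyxl.load_workbook("excel_file.xlsx", data_only=True, read_only=True)
	sheet = wb.active
	for row in sheet.rows:  # one row at a time, not cell by cell
	    for cell in row:
	        cell_from_excel = cell.value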
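
For reference, here is a minimal sketch of what the whole copy-to-.csv step could look like when each row is streamed straight into the csv module. It is not my actual program: the function names write_rows_to_csv and xlsx_to_csv are made up for illustration, and it assumes an openpyxl version that supports iter_rows(values_only=True), which hands back plain tuples of values instead of cell objects. Whether this is faster on a given file would need to be measured.

```python
import csv


def write_rows_to_csv(rows, csv_path):
    """Write an iterable of row tuples to csv_path and return the row count."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)  # None values are written as empty fields
            count += 1
    return count


def xlsx_to_csv(xlsx_path, csv_path):
    """Stream the active sheet of xlsx_path into csv_path, one row at a time."""
    # Third-party dependency; imported here so the CSV helper above
    # can be used (and tested) without openpyxl installed.
    import openpyxl

    wb = openpyxl.load_workbook(xlsx_path, data_only=True, read_only=True)
    try:
        sheet = wb.active
        # values_only=True yields tuples of cell values, skipping cell objects.
        return write_rows_to_csv(sheet.iter_rows(values_only=True), csv_path)
    finally:
        # A read-only workbook keeps the file handle open until closed.
        wb.close()
```

Usage would be something like xlsx_to_csv("excel_file.xlsx", "excel_file.csv"), called once per file. Closing the workbook explicitly matters in read-only mode when thousands of files are processed in one run.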