Python to Excel date to number fix | performance issues (Python Forum, thread /thread-9772.html)
Python to Excel date to number fix | performance issues - Prashant - Apr-27-2018

I have to export database tables to Excel (xlsx) format, and I came to know that Python is very fast at file processing. I've installed Python 3.6.1, XlsxWriter-1.0.2 and cx_Oracle-5.3-12c.

However, I saw that date fields from the database were being converted to numbers. To resolve this, I added a check: if a list value (cell value) is a date/time, I format the value as a date, which solves my problem.

But I am now running into a performance issue with this extra check. Any suggestions on how to gain performance? The Python function below is used to generate the Excel xlsx file from the database cursor.

PS: Quick background on how I am processing: I open a loop which processes only 50,000 rows at a time and then calls this Python Excel writer function. I have close to 900 such tables, with row counts ranging from 100,000 to 3,200,000,000. I am a newbie to Python, and the only idea I have is to run the same Python code as 10 separate parallel runs.
Guidance would be much appreciated.

    import datetime
    import os

    import xlsxwriter

    # Function to write Excel from an Oracle cursor.
    # Note: sourceSYS (the worksheet name) is a global defined elsewhere in the script.
    def writeToExcel(cur_sor, targetDir, export_file_name):
        Actual_Path = os.path.join(targetDir, export_file_name)
        # Array to capture date type columns
        DateTimeColumns = []
        print('\t\t\t writing: ' + export_file_name + '\t\t\t\t' + str(datetime.datetime.now()))
        workbook = xlsxwriter.Workbook(Actual_Path)    # Create Excel object for new workbook
        worksheet = workbook.add_worksheet(sourceSYS)  # Add a new worksheet name - scott_schema
        row = 0
        col = 0
        for i in range(len(cur_sor.description)):
            desc = cur_sor.description[i]
            # Only date type columns will be captured
            if format(desc[1]) == "<class 'cx_Oracle.TIMESTAMP'>":
                DateTimeColumns.append(i)
            bold = workbook.add_format({'bold': True})
            date_format = workbook.add_format({'num_format': 'dd/mm/yy'})
            worksheet.write(row, (col + i), format(desc[0]), bold)  # Write the column headers
        date_format = workbook.add_format({'num_format': 'dd/mm/yy'})
        color_format = workbook.add_format()
        color_format.set_font_color('red')
        row = row + 1
        # Loop over each row returned by the database cursor
        for tupple_row in cur_sor:
            col = 0
            # Loop over each column of the current row
            for list_item in tupple_row:
                # If the column position matches a date type column position
                if col in DateTimeColumns:
                    # Check that the cell value is a date type; this also handles "None" (blank date cell)
                    if isinstance(list_item, (datetime.date, datetime.datetime,
                                              datetime.time, datetime.timedelta)):
                        # Format the date in order to save it as a date rather than a number
                        worksheet.write(row, col, list_item.strftime("%Y-%b-%d %H:%M:%S.%f"))
                else:
                    worksheet.write(row, col, list_item)
                col = col + 1
            row = row + 1
        workbook.close()

RE: Python to Excel date to number fix | performance issues - killerrex - Apr-28-2018

Hi,
In Excel, dates are also stored as float values. To be precise, as a day count in which 1/1/1900 is day 1.
So the number 36526 is equivalent to 01/01/2000. The trick I use when I have to perform this kind of conversion, and I do not care about leap seconds and the like, is to calculate the offset between the two representations and store the number, avoiding the conversion to a string. So if SQL is storing the dates as days since 1/1/1970 (I don't know, but many systems do), I store the value + 25569 in the Excel cell. If the Excel cell has a date format, that is enough to display it correctly.
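
The offset trick described above can be sketched roughly as follows. This is only a minimal illustration using the standard library, not code from the thread: the names `EXCEL_EPOCH`, `UNIX_TO_EXCEL_OFFSET`, `datetime_to_excel_serial` and `unix_days_to_excel_serial` are hypothetical, and the "days since 1/1/1970" input is the same assumption the reply makes about the database. The arithmetic is only valid for dates from 1 March 1900 onwards, because Excel's 1900 date system counts a 29 February 1900 that never existed.

```python
from datetime import datetime, timedelta

# Assumption: Excel's default 1900 date system. Using 1899-12-30 as the
# zero point yields Excel's serial numbers for dates on/after 1900-03-01.
EXCEL_EPOCH = datetime(1899, 12, 30)

# Offset between "days since 1970-01-01" and Excel serial numbers,
# i.e. the Excel serial number of 1970-01-01 (the 25569 from the reply).
UNIX_TO_EXCEL_OFFSET = 25569


def datetime_to_excel_serial(value):
    """Convert a datetime to an Excel serial number (days as a float)."""
    return (value - EXCEL_EPOCH) / timedelta(days=1)


def unix_days_to_excel_serial(days_since_1970):
    """Shift a 'days since 1970-01-01' value into Excel's numbering."""
    return days_since_1970 + UNIX_TO_EXCEL_OFFSET


if __name__ == "__main__":
    print(datetime_to_excel_serial(datetime(2000, 1, 1)))       # 36526.0
    print(datetime_to_excel_serial(datetime(2000, 1, 1, 12)))   # 36526.5
    print(unix_days_to_excel_serial(0))                         # 25569
```

In the function from the first post, a number produced this way could then be written with XlsxWriter's `worksheet.write_number(row, col, serial, date_format)`, reusing the `date_format` that is already created there, so the cell shows as a date without building a string for every value. XlsxWriter also provides `worksheet.write_datetime(row, col, value, date_format)`, which does the conversion itself; whether either is faster than the `strftime` approach for this workload would have to be measured.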