Python Forum

Full Version: Python to Excel date to number fix | performance issues
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I've to export the database table to Excel (xlsx) format.

I came to know that Python is very fast in file processing.

I've installed python 3.6.1, XlsxWriter-1.0.2 and cx_Oracle-5.3-12c

But I saw the date field from the database was getting converted to the number. To resolve this, I made a check if a list value \ (cell) value is date time, if yes then formatted the value to date type which resolves my problem

But, I am now running into performance issue with this extra check any suggestion how to gain performance

Below python function is used to generate excel xlsx from the database cursor

PS: Quick background of how I am proessing:
I am opening a loop which is processing only 50,000 rows and then calls this python excel writer function:
I've close to 900 such table and data counts in these ranges from 1,00,000 to 3,20,00,00,000
I am newbie to python and only think of executing the same python code for separate code parallel 10 times.
Guidance would be much appreciated

#Function to write excel from Oracle Cursor
def writeToExcel(cur_sor, targetDir, export_file_name):
    Actual_Path = os.path.join(targetDir, export_file_name)

    #Array to capture Date type columns
    DateTimeColumns = []

    print('\t\t\t  writing: '+export_file_name+'\t\t\t\t'+str(datetime.datetime.now()))
    workbook = xlsxwriter.Workbook(Actual_Path)  # Create Excel Object for new workbook
    worksheet = workbook.add_worksheet(sourceSYS)  # Add a New Worksheet Name - scott_schema
    row = 0
    col = 0

    for i in range(len(cur_sor.description)):
        desc = cur_sor.description[i]

        #Only Data Type column will be capture
        if format(desc[1])== "<class 'cx_Oracle.TIMESTAMP'>":
            DateTimeColumns.append(i)

        bold = workbook.add_format({'bold': True})
        date_format = workbook.add_format({'num_format': 'dd/mm/yy'})
        worksheet.write(row, (col + i), format(desc[0]), bold)  # Iterate for column headers

    date_format = workbook.add_format({'num_format': 'dd/mm/yy'})
    color_format = workbook.add_format()
    color_format.set_font_color('red')
    row = row + 1

    #Loop for each row, return by database cursor
    for tupple_row in cur_sor:
        col = 0
        #Loop for each column, for particular row open in level-up cursor
        for list_item in tupple_row:

            #If column position matches with datetype column position
            if col in DateTimeColumns:
                #Check if the cell value is date type, additional check to handle "None" (blank date cell)
                if isinstance(list_item, datetime.date) or isinstance(list_item, datetime.datetime) \
                    or isinstance(list_item, datetime.time) or isinstance(list_item, datetime.timedelta):
                    #Format the date, inorder to save as date rather number
                    worksheet.write(row, col, list_item.strftime("%Y-%b-%d %H:%M:%S.%f"))
            else:
                worksheet.write(row, col, list_item)

            col = col + 1
        row = row + 1

    workbook.close()
Hi,

In excel the epochs are also stored as a float value. To be precise as days since 1/1/1900. So the number 36526 is equivalent to 01/01/2000.

The trick I use when I have to perform this kind of conversions and I do not care about leap seconds and similar is to calculate the offset between both representations and store the number, avoiding converting to string.

So if sql is storing the dates as days since 1/1/1970 (I don't know, but many systems do so) in the excel cell I store the value + 25569. If the Excel cell has a date format is enough to see it correctly.