Posts: 8,168
Threads: 160
Joined: Sep 2016
pass index=False when calling df.to_excel()
Posts: 115
Threads: 10
Joined: Nov 2019
Apr-23-2020, 09:12 AM
(This post was last modified: Apr-23-2020, 09:12 AM by Makada.)
Yes, that's working.
Thanks!
Posts: 115
Threads: 10
Joined: Nov 2019
Apr-23-2020, 04:37 PM
(This post was last modified: Apr-23-2020, 04:37 PM by Makada.)
Hi,
It takes some time to load the data every time.
Is it possible to only add new data to the .xlsx output file every minute, so it won't have to reload data that is already in the output file?
And when the script hasn't run for some time, can the output file automatically be updated with the new data, starting from where the script stopped?
The source file is at: "C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat"
The output file is at: "C:\\Users\\Makada\\Desktop\\Map1.xlsx"
With kind regards.
Posts: 8,168
Threads: 160
Joined: Sep 2016
Apr-23-2020, 04:39 PM
(This post was last modified: Apr-23-2020, 04:39 PM by buran.)
Actually, we don't know what you are doing. Why do you read everything every time? Do you need all the data?
When writing to an Excel file you can specify the start row by supplying startrow. Of course, you should know (or find) the number of rows already in the file.
Please check the documentation for pandas.to_excel.
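For illustration, a minimal sketch of what startrow does (the file name and data here are made up, not from the thread; writing .xlsx requires openpyxl):

```python
import pandas as pd

# Hypothetical data; "archive.xlsx" is just an example file name.
df = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:11:00"], "Temp": [20.1]})

# startrow=3 leaves Excel rows 1-3 empty and writes the header on row 4,
# as if three rows were already occupied.
df.to_excel("archive.xlsx", startrow=3, index=False)
```
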
Posts: 8,168
Threads: 160
Joined: Sep 2016
Also, if the Excel file serves as some sort of database, consider storing the data in a real database. Even simple SQLite (a lightweight disk-based database) would be a good choice.
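As a sketch of the SQLite route (the table and column names are invented here, not from the thread), the timestamp can serve as a primary key so re-running the loader never stores duplicates:

```python
import sqlite3

# ":memory:" for the demo; use a file path like "archive.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS readings (tmstamp TEXT PRIMARY KEY, temp REAL)")

# INSERT OR IGNORE skips rows whose timestamp is already stored,
# so loading the same .dat lines twice adds nothing.
rows = [("2020-03-25 21:11:00", 20.1), ("2020-03-25 21:11:00", 20.1)]
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # → 1
```
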
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
Yes, I need all the data for an archive.
So it's not necessary to load all the data on every one-minute update; only new data should be added on top of all previous data. That way it will load much faster.
I know how to do that in the code below, where the output file has unlimited data while the source can only hold about 30 days of data.
But I don't know how to do it when the output file is .xlsx, with the changes you suggested (which worked).
import time
import schedule

def task1():
    existingLines = set(line.strip() for line in open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat"))
    outfile = open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat", "a+")
    for content in open("C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat", "r"):
        if content.strip() not in existingLines:  # to avoid duplicate lines
            outfile.write(content)
            existingLines.add(content.strip())
    outfile.close()

schedule.every().minute.at(":01").do(task1)

while True:
    schedule.run_pending()
    time.sleep(1)
Posts: 8,168
Threads: 160
Joined: Sep 2016
I would return from task1 the last date/time (or some other identifying field) in the records in the output file, along with the total number of records.
Then on the next call I would pass them in again as arguments (line increased by one) and write from there.
And if it holds 30 days of records, why run every minute (if it is just an archive)?
And again: a database would be a better choice than an Excel file.
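The pattern described above could be sketched like this (the record format and in-memory records are hypothetical, standing in for lines read from the .dat file):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

def task1(last_update, start_row):
    # Hypothetical records as read from the .dat file: (timestamp, value).
    records = [("2020-03-25 21:11:00", 20.1),
               ("2020-03-25 21:12:00", 20.3),
               ("2020-03-25 21:13:00", 20.4)]
    # Keep only records newer than the last one already written out.
    new = [r for r in records if datetime.strptime(r[0], FMT) > last_update]
    # ... write `new` to the output file starting at start_row ...
    if new:
        last_update = datetime.strptime(new[-1][0], FMT)
        start_row += len(new)
    return last_update, start_row  # pass these back in on the next call

state = (datetime.strptime("2020-03-25 21:11:00", FMT), 30)
state = task1(*state)  # only the two newer records advance the state
```
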
Posts: 115
Threads: 10
Joined: Nov 2019
Hi buran,
Sorry, I lost you here :(
Yes, I know about the SQL possibilities, but I'd like to have it in Excel.
And from there save daily, monthly, and yearly files.
With kind regards.
Posts: 115
Threads: 10
Joined: Nov 2019
Hi buran,
Do you have an example of what you last posted?
With kind regards.
Posts: 8,168
Threads: 160
Joined: Sep 2016
Apr-24-2020, 10:27 AM
(This post was last modified: Apr-24-2020, 10:27 AM by buran.)
A bit different from what I initially suggested, but in my opinion a more robust solution:
import pandas as pd
from datetime import datetime
import time

def read_data(dat_file, last_update):
    df = pd.read_table(dat_file, delimiter=',', parse_dates=[1,], low_memory=False, skiprows=1)
    df.drop("RECNBR", axis=1, inplace=True)
    df["TMSTAMP"] = pd.to_datetime(df["TMSTAMP"])
    cols = df.columns.drop(labels=["TMSTAMP"])
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    return df[df['TMSTAMP'] > last_update]

def write_xlsx(xlsx_file, start_row, df):
    df.to_excel(xlsx_file, startrow=start_row, index=False)

dat_file = 'foo.dat'
xlsx_file = 'foo.xlsx'
start_row = 0  # change this to the actual start_row when run for the first time,
               # e.g. if you have 30 records in the xlsx file this will be 30 (first row is at index 0)
last_update = datetime.strptime('2020-03-25 21:11:00', '%Y-%m-%d %H:%M:%S')  # change to last datetime already in xlsx file

while True:
    df = read_data(dat_file=dat_file, last_update=last_update)
    num_rows = df.shape[0]  # get number of rows in dataframe
    if num_rows:
        print(f'Writing {num_rows} records to {xlsx_file}')
        write_xlsx(xlsx_file=xlsx_file, start_row=start_row, df=df)
        start_row += num_rows
        last_update = df.TMSTAMP.iat[-1]
        print(f'Last update in xlsx file is {last_update}, start_row is {start_row}')
    else:
        print('No new data in .dat file.')
    time.sleep(60)  # wait 1 minute
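One thing worth noting (an editorial addition, not from the thread): df.to_excel(path, ...) rewrites the whole file on every call, so with a growing startrow the earlier rows end up empty in the new file. To genuinely append to an existing .xlsx, pandas' ExcelWriter can open the file in append mode (requires pandas >= 1.4 with openpyxl); a sketch with made-up data and file name:

```python
import pandas as pd

# First write creates the file with a header row and one data row.
first = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:11:00"], "Temp": [20.1]})
first.to_excel("archive.xlsx", index=False)

# Later runs append below the existing rows instead of rewriting the file.
new = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:12:00"], "Temp": [20.3]})
with pd.ExcelWriter("archive.xlsx", mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    # startrow=2: one header row plus one data row are already in the sheet.
    new.to_excel(writer, startrow=2, header=False, index=False)
```
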