.dat file to xlsxl

Makada · Apr-24-2020, 10:43 AM

Hi buran,

Thank you so much for the effort, much appreciated Smile

I will have a look if i understand the code and what it is doing.
I will let you know.

Thanks alot.

**buran** · Apr-24-2020, 10:56 AM

note that instead of hard-coded values for start_row and last_update in the beginning it can read the xlsx file once and read these values from there

Makada · (This post was last modified: Apr-24-2020, 12:31 PM by Makada.)

Hi buran,

Is it possible the "Startrow" and "Lastupdate" can be removed and replaced by code where python automatically see where to start and from where to update of it missed data sinds last update?

The script isnt running now and then so i have to search in the data from where to update.

With kind regards.

**buran** · Apr-24-2020, 12:42 PM

As I said - yes, that was my last comment. Would you try to code it yourself? You know how to read xslx file into dataframe, find how many rows, find last row, extract last update... Examples are in my code

Makada · Apr-24-2020, 03:43 PM

Hi buran,

I forgot i cant open the output excel file while its updating from python.
The python script gives error when the output file is open.
Do you know if its possible to have the Excel output file open while its updating?
So i can see every minutes update in the output file?

With kind regards.

**buran** · (This post was last modified: Apr-24-2020, 04:00 PM by buran.)

no. I don't think you can write to excel if it is open. python will error. So your excel is not only for archive, but for viewing.
you can write to a database (or even a csv file), link from excel, refresh excel query (e.g. Power Query) regularly. Also, noote that probably you will be able to link excel directly to dat file. What period of data do you really need to view? Probably you can separate viewing and archive part of the process.
Note that it looks like your workflow has too many intermediate steps, probably not all necessary.
Where does dat file comes from and being updated every minute? Can you access the source directly (from python/excel) or make it to write to db/ not dat file?

Makada · (This post was last modified: Apr-24-2020, 05:11 PM by Makada.)

By the way, i see when the code is running and in between an update i open the Excel output file, theres no data in it...while its showing in the screen below its writing records.
When i restart the script and open the outputfile right after the first update, then theres data in it.
After the second update its empty.

This is the first update:

Second update, outputfile is empty:

**buran** · (This post was last modified: Apr-24-2020, 05:16 PM by buran.)

Actually, it turns that it overwrite the existing data. Scroll down, there should be data around latest start row.

I didn't see this coming. You can read the existing data once, and then append new dataframes and write everything over again. (i.e. this will avoid reading from excel ever ytime, but you will need to write everything.

Maybe you really need to consider different storage.

Makada · Apr-24-2020, 05:25 PM

Mm, so i cant get it to work with Excel outputfile.
While it works ok with a .dat file:

import time
import schedule    
     
def task1():  
    existingLines = set(line.strip() for line in open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat"))
    outfile = open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat", "a+")
    for content in open("C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat", "r"):
        if content.strip() not in existingLines: # to void duplicate lines
            outfile.write(content)
            existingLines.add(content)
    outfile.close()
     
schedule.every().minute.at(":01").do(task1)  
 
while True:
    schedule.run_pending()
    time.sleep(1)
refresh()

I really like the possibilities of Excel for simple data correction and graphing.

**buran** · (This post was last modified: Apr-24-2020, 05:56 PM by buran.)

you are closing the dat file after you write to it. If you open it another program like Notepad while trying to write to it I expect it to error. If you open it in Notepad++, you can write to it, but it will not refresh unless you switch to different tab and then back again to original tab. As far as I know it depends how the program opens the file.

Otherwise this will resolve the overwriting

import pandas as pd
from datetime import datetime
import time


def read_data(dat_file, last_update):
    df = pd.read_table(dat_file, delimiter=',', parse_dates=[1,], low_memory=False, skiprows=1)
    df.drop("RECNBR", axis=1, inplace=True)
    df["TMSTAMP"] = pd.to_datetime(df["TMSTAMP"])
    cols = df.columns.drop(labels=["TMSTAMP"])
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    return df[df['TMSTAMP'] > last_update]

def write_xlsx(xlsx_file, df):
    df.to_excel(xlsx_file, index=False)


dat_file = 'foo.dat'
xlsx_file = 'foo.xlsx'
try:
    df = pd.read_excel(xlsx_file, engine='openpyxl', parse_dates=[1,])
    last_update = df.TMSTAMP.iat[-1]
except FileNotFoundError:
    df = pd.DataFrame(columns=["TMSTAMP", "WS_kph_S_WVT","WindDir_D1_WVT","WindDir_SD1_WVT","WS_kph_Max",
                               "AirTC_Avg","AirTC_Max","AirTC_Min","RH_Avg","RH_Max","RH_Min","BP_mbar_Avg","BP_mbar_Max",
                               "BP_mbar_Min","Rain_mm_Tot","Rain_mm_Intensity_1_min","Rain_mm_Max_Intensity_1_min",
                               "Rain_mm_Min_Intensity_1_min","Tot_Rain_mm_12_sec_Max","Tot_Rain_mm_12_sec_Avg",
                               "Lufft_R2S_Mode","Mode(1)","Mode(2)","Mode(3)","Mode(4)","Mode(5)","Mode(6)","SlrW_Avg",
                               "SlrW_Max","TdC_Avg","TdC_Max","TdC_Min","TwC_Avg","TwC_Max","TwC_Min","HI_C_Avg","HI_C_Max",
                               "SVPWPa_Avg","SVPWPa_Max","SVPWPa_Min","SunHrs_Tot","PrecipitationHrs_Tot","PotSlrW_Avg",
                               "PotSlrW_Max","PotSlrW_Min","WC_C_Avg","WC_C_Min"])
    last_update = datetime.strptime('2020-01-01 21:11:00', '%Y-%m-%d %H:%M:%S') # change to some date in the past

while True:
    new_data = read_data(dat_file=dat_file, last_update=last_update)
    num_rows = new_data.shape[0] # get number of rows in dataframe
    if num_rows:
        print(f'Appending {num_rows} records to {xlsx_file}')
        df = df.append(new_data, ignore_index=True)
        print(df)
        write_xlsx(xlsx_file=xlsx_file, df=df)
        last_update = new_data.TMSTAMP.iat[-1]
    else:
        print('No new data id dat file.')
    time.sleep(60) # wait 1 minute

Of course if it runs long enough it will hit Memory limit

.dat file to xlsxl

User Panel Messages

Announcements