Python Forum
Thread Rating:
  • 1 Vote(s) - 1 Average
  • 1
  • 2
  • 3
  • 4
  • 5
.dat file to xlsxl
#21
Hi buran,

Thank you so much for the effort, much appreciated Smile
I will have a look if i understand the code and what it is doing.
I will let you know.

Thanks alot.
Reply
#22
note that instead of hard-coded values for start_row and last_update in the beginning it can read the xlsx file once and read these values from there
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#23
Hi buran,

Is it possible the "Startrow" and "Lastupdate" can be removed and replaced by code where python automatically see where to start and from where to update of it missed data sinds last update?

The script isnt running now and then so i have to search in the data from where to update.


With kind regards.
Reply
#24
As I said - yes, that was my last comment. Would you try to code it yourself? You know how to read xslx file into dataframe, find how many rows, find last row, extract last update... Examples are in my code
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#25
Hi buran,

I forgot i cant open the output excel file while its updating from python.
The python script gives error when the output file is open.
Do you know if its possible to have the Excel output file open while its updating?
So i can see every minutes update in the output file?

With kind regards.
Reply
#26
no. I don't think you can write to excel if it is open. python will error. So your excel is not only for archive, but for viewing.
you can write to a database (or even a csv file), link from excel, refresh excel query (e.g. Power Query) regularly. Also, noote that probably you will be able to link excel directly to dat file. What period of data do you really need to view? Probably you can separate viewing and archive part of the process.
Note that it looks like your workflow has too many intermediate steps, probably not all necessary.
Where does dat file comes from and being updated every minute? Can you access the source directly (from python/excel) or make it to write to db/ not dat file?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#27
By the way, i see when the code is running and in between an update i open the Excel output file, theres no data in it...while its showing in the screen below its writing records.
When i restart the script and open the outputfile right after the first update, then theres data in it.
After the second update its empty.
[Image: Screenshot-20200424-184724.jpg]

This is the first update:
[Image: 20200424-190739.png]
[Image: Screenshot-20200424-190538.png]

Second update, outputfile is empty:
[Image: 20200424-190844.png]
[Image: Screenshot-20200424-190625.png]
Reply
#28
Actually, it turns that it overwrite the existing data. Scroll down, there should be data around latest start row.

I didn't see this coming. You can read the existing data once, and then append new dataframes and write everything over again. (i.e. this will avoid reading from excel ever ytime, but you will need to write everything.

Maybe you really need to consider different storage.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#29
Mm, so i cant get it to work with Excel outputfile.
While it works ok with a .dat file:
import time
import schedule    
     
def task1():  
    existingLines = set(line.strip() for line in open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat"))
    outfile = open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat", "a+")
    for content in open("C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat", "r"):
        if content.strip() not in existingLines: # to void duplicate lines
            outfile.write(content)
            existingLines.add(content)
    outfile.close()
     
schedule.every().minute.at(":01").do(task1)  
 
while True:
    schedule.run_pending()
    time.sleep(1)
refresh()
I really like the possibilities of Excel for simple data correction and graphing.
Reply
#30
you are closing the dat file after you write to it. If you open it another program like Notepad while trying to write to it I expect it to error. If you open it in Notepad++, you can write to it, but it will not refresh unless you switch to different tab and then back again to original tab. As far as I know it depends how the program opens the file.
Otherwise this will resolve the overwriting
import pandas as pd
from datetime import datetime
import time


def read_data(dat_file, last_update):
    df = pd.read_table(dat_file, delimiter=',', parse_dates=[1,], low_memory=False, skiprows=1)
    df.drop("RECNBR", axis=1, inplace=True)
    df["TMSTAMP"] = pd.to_datetime(df["TMSTAMP"])
    cols = df.columns.drop(labels=["TMSTAMP"])
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    return df[df['TMSTAMP'] > last_update]

def write_xlsx(xlsx_file, df):
    df.to_excel(xlsx_file, index=False)


dat_file = 'foo.dat'
xlsx_file = 'foo.xlsx'
try:
    df = pd.read_excel(xlsx_file, engine='openpyxl', parse_dates=[1,])
    last_update = df.TMSTAMP.iat[-1]
except FileNotFoundError:
    df = pd.DataFrame(columns=["TMSTAMP", "WS_kph_S_WVT","WindDir_D1_WVT","WindDir_SD1_WVT","WS_kph_Max",
                               "AirTC_Avg","AirTC_Max","AirTC_Min","RH_Avg","RH_Max","RH_Min","BP_mbar_Avg","BP_mbar_Max",
                               "BP_mbar_Min","Rain_mm_Tot","Rain_mm_Intensity_1_min","Rain_mm_Max_Intensity_1_min",
                               "Rain_mm_Min_Intensity_1_min","Tot_Rain_mm_12_sec_Max","Tot_Rain_mm_12_sec_Avg",
                               "Lufft_R2S_Mode","Mode(1)","Mode(2)","Mode(3)","Mode(4)","Mode(5)","Mode(6)","SlrW_Avg",
                               "SlrW_Max","TdC_Avg","TdC_Max","TdC_Min","TwC_Avg","TwC_Max","TwC_Min","HI_C_Avg","HI_C_Max",
                               "SVPWPa_Avg","SVPWPa_Max","SVPWPa_Min","SunHrs_Tot","PrecipitationHrs_Tot","PotSlrW_Avg",
                               "PotSlrW_Max","PotSlrW_Min","WC_C_Avg","WC_C_Min"])
    last_update = datetime.strptime('2020-01-01 21:11:00', '%Y-%m-%d %H:%M:%S') # change to some date in the past

while True:
    new_data = read_data(dat_file=dat_file, last_update=last_update)
    num_rows = new_data.shape[0] # get number of rows in dataframe
    if num_rows:
        print(f'Appending {num_rows} records to {xlsx_file}')
        df = df.append(new_data, ignore_index=True)
        print(df)
        write_xlsx(xlsx_file=xlsx_file, df=df)
        last_update = new_data.TMSTAMP.iat[-1]
    else:
        print('No new data id dat file.')
    time.sleep(60) # wait 1 minute
Of course if it runs long enough it will hit Memory limit
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020