Posts: 8,168
Threads: 160
Joined: Sep 2016
pass index=False when calling df.to_excel()
Posts: 115
Threads: 10
Joined: Nov 2019
Apr-23-2020, 09:12 AM
(This post was last modified: Apr-23-2020, 09:12 AM by Makada.)
Yes, that's working.
Thanks!
Posts: 115
Threads: 10
Joined: Nov 2019
Apr-23-2020, 04:37 PM
(This post was last modified: Apr-23-2020, 04:37 PM by Makada.)
Hi,
It takes some time to load the data every time.
Is it possible to only add new data to the .xlsx output file every minute, so it won't have to reload data that is already in the output file?
And when the script hasn't run for some time, can the output file automatically be updated with the new data, starting from where the script stopped?
The source file is at: "C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat"
The output file is at: "C:\\Users\\Makada\\Desktop\\Map1.xlsx"
With kind regards.
Posts: 8,168
Threads: 160
Joined: Sep 2016
Apr-23-2020, 04:39 PM
(This post was last modified: Apr-23-2020, 04:39 PM by buran.)
Actually, we don't know what you are doing. Why do you read everything every time? Do you need all the data?
When writing to an Excel file you can specify the start row by supplying startrow. Of course, you should know (or find) the number of rows already in the file.
Please check the documentation for pandas.to_excel.
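For illustration, a minimal sketch of what startrow does (the file name and data here are made up, not from the thread; writing .xlsx requires openpyxl):

```python
import pandas as pd

# Hypothetical data; "archive.xlsx" is just an example file name.
df = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:11:00"], "Temp": [20.1]})

# startrow=3 leaves Excel rows 1-3 empty and writes the header on row 4,
# as if three rows were already occupied.
df.to_excel("archive.xlsx", startrow=3, index=False)
```
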
Posts: 8,168
Threads: 160
Joined: Sep 2016
Also, if the Excel file serves as some sort of database, consider storing the data in a real database. Even simple SQLite (a lightweight disk-based database) would be a good choice.
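As a sketch of the SQLite route (the table and column names are invented here, not from the thread), the timestamp can serve as a primary key so re-running the loader never stores duplicates:

```python
import sqlite3

# ":memory:" for the demo; use a file path like "archive.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS readings (tmstamp TEXT PRIMARY KEY, temp REAL)")

# INSERT OR IGNORE skips rows whose timestamp is already stored,
# so loading the same .dat lines twice adds nothing.
rows = [("2020-03-25 21:11:00", 20.1), ("2020-03-25 21:11:00", 20.1)]
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # → 1
```
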
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
Yes, I need all the data for an archive.
So it's not necessary to load all the data on every one-minute update; only new data should be added on top of all previous data. That way it will load much faster.
I know how to do that in the code below, where the output file has unlimited data while the source can only hold about 30 days of data.
But I don't know how to do it when the output file is .xlsx, with the changes you suggested (which worked).
import time
import schedule

def task1():
    existingLines = set(line.strip() for line in open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat"))
    outfile = open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat", "a+")
    for content in open("C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat", "r"):
        if content.strip() not in existingLines:  # to avoid duplicate lines
            outfile.write(content)
            existingLines.add(content.strip())
    outfile.close()

schedule.every().minute.at(":01").do(task1)

while True:
    schedule.run_pending()
    time.sleep(1)
Posts: 8,168
Threads: 160
Joined: Sep 2016
I would return from task1 the last date/time (or some other identifying field) in the records in the output file, along with the total number of records.
Then on the next call I would pass them in again as arguments (line increased by one) and write from there.
And if it holds 30 days of records, why run every minute (if it is just an archive)?
And again: a database would be a better choice than an Excel file.
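The pattern described above could be sketched like this (the record format and in-memory records are hypothetical, standing in for lines read from the .dat file):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

def task1(last_update, start_row):
    # Hypothetical records as read from the .dat file: (timestamp, value).
    records = [("2020-03-25 21:11:00", 20.1),
               ("2020-03-25 21:12:00", 20.3),
               ("2020-03-25 21:13:00", 20.4)]
    # Keep only records newer than the last one already written out.
    new = [r for r in records if datetime.strptime(r[0], FMT) > last_update]
    # ... write `new` to the output file starting at start_row ...
    if new:
        last_update = datetime.strptime(new[-1][0], FMT)
        start_row += len(new)
    return last_update, start_row  # pass these back in on the next call

state = (datetime.strptime("2020-03-25 21:11:00", FMT), 30)
state = task1(*state)  # only the two newer records advance the state
```
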
Posts: 115
Threads: 10
Joined: Nov 2019
Hi buran,
Sorry, I lost you here :(
Yes, I know about the SQL possibilities, but I'd like to have it in Excel.
And from there save daily, monthly, and yearly files.
With kind regards.
Posts: 115
Threads: 10
Joined: Nov 2019
Hi buran,
Do you have an example of what you last posted?
With kind regards.
Posts: 8,168
Threads: 160
Joined: Sep 2016
Apr-24-2020, 10:27 AM
(This post was last modified: Apr-24-2020, 10:27 AM by buran.)
A bit different from what I initially suggested, but in my opinion a more robust solution:
import pandas as pd
from datetime import datetime
import time

def read_data(dat_file, last_update):
    df = pd.read_table(dat_file, delimiter=',', parse_dates=[1,], low_memory=False, skiprows=1)
    df.drop("RECNBR", axis=1, inplace=True)
    df["TMSTAMP"] = pd.to_datetime(df["TMSTAMP"])
    cols = df.columns.drop(labels=["TMSTAMP"])
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    return df[df['TMSTAMP'] > last_update]

def write_xlsx(xlsx_file, start_row, df):
    df.to_excel(xlsx_file, startrow=start_row, index=False)

dat_file = 'foo.dat'
xlsx_file = 'foo.xlsx'
start_row = 0  # change this to the actual start_row when run for the first time,
               # e.g. if you have 30 records in the xlsx file this will be 30 (first row is at index 0)
last_update = datetime.strptime('2020-03-25 21:11:00', '%Y-%m-%d %H:%M:%S')  # change to last datetime already in xlsx file

while True:
    df = read_data(dat_file=dat_file, last_update=last_update)
    num_rows = df.shape[0]  # get number of rows in dataframe
    if num_rows:
        print(f'Writing {num_rows} records to {xlsx_file}')
        write_xlsx(xlsx_file=xlsx_file, start_row=start_row, df=df)
        start_row += num_rows
        last_update = df.TMSTAMP.iat[-1]
        print(f'Last update in xlsx file is {last_update}, start_row is {start_row}')
    else:
        print('No new data in .dat file.')
    time.sleep(60)  # wait 1 minute
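One thing worth noting (an editorial addition, not from the thread): df.to_excel(path, ...) rewrites the whole file on every call, so with a growing startrow the earlier rows end up empty in the new file. To genuinely append to an existing .xlsx, pandas' ExcelWriter can open the file in append mode (requires pandas >= 1.4 with openpyxl); a sketch with made-up data and file name:

```python
import pandas as pd

# First write creates the file with a header row and one data row.
first = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:11:00"], "Temp": [20.1]})
first.to_excel("archive.xlsx", index=False)

# Later runs append below the existing rows instead of rewriting the file.
new = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:12:00"], "Temp": [20.3]})
with pd.ExcelWriter("archive.xlsx", mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    # startrow=2: one header row plus one data row are already in the sheet.
    new.to_excel(writer, startrow=2, header=False, index=False)
```
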