Posts: 115
Threads: 10
Joined: Nov 2019
Jan-29-2020, 08:58 PM
(This post was last modified: Jan-29-2020, 08:58 PM by Makada.)
Hi,
I am currently reading data from a .dat file as CSV.
With the script below i manage to get data from the source, but i cant get data when i select to get data between 2 timestamps.
For example data between 2020-01-20 12:00:00 and 2020-01-25 12:15:00
With the script below i get the whole datarange from the source, see second field showing the output from the script.
Maybe someone have a hint on how to accomplish that?
Thanks.
import pandas as pd
import numpy as np
import datetime as dt
import csv
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",
skiprows=1,
index_col=['TIMESTAMP'],
parse_dates=True,
header=0,
usecols=['TIMESTAMP','WS_kph_Max'])
df['2020-01-20 20:59:00':'2020-01-22 21:01:00']
print(df) Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
=========== RESTART: C:\Users\Makada\Desktop\pppooooooooooooooooo.py ===========
WS_kph_Max
TIMESTAMP
TS kilometers/hour
NaN Max
2020-01-19 20:59:00 4.609
2020-01-19 21:00:00 4.294
2020-01-19 21:01:00 4.24
... ...
2020-01-29 16:30:00 20.09
2020-01-29 16:31:00 30.92
2020-01-29 16:32:00 24.65
2020-01-29 16:33:00 21.34
2020-01-29 16:34:00 25.74
[14138 rows x 1 columns]
Posts: 12,039
Threads: 487
Joined: Sep 2016
here's an example to see if date lies between two others:
example: https://stackoverflow.com/a/5464467
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-30-2020, 08:35 AM
(This post was last modified: Jan-30-2020, 09:27 AM by Makada.)
Hi,
Thanks for your reply.
But with the examples given in the link i dont know how to get it to work with my code.
My knowledge of python is limited
Thanks.
I managed with the code below to get the desired dates, but the values given for that dates (WS_kph_Max) doesnt match from the source.
Print(df['2020-01-20 20:59:00':'2020-01-22 21:01:00']) Output:
WS_kph_Max
TIMESTAMP
2020-01-20 20:59:00 0.078
2020-01-20 21:00:00 0.078
2020-01-20 21:01:00 0.078
Yes its working.
Now lets see if i can get it to plot.
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-30-2020, 08:15 PM
(This post was last modified: Jan-30-2020, 08:15 PM by Makada.)
I see the data type of timestamp and WS_kph_Max are "object".
Do i have to convert the timestamp to "datetime64" and WS_kph_Max to "float64" before i start trying to plot them ?
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-31-2020, 09:48 AM
(This post was last modified: Jan-31-2020, 10:46 AM by Makada.)
Hi,
I have changed my code a bit, i can plot the data, but when i add code to have data plot in between 2 dates, i get an error.(for error, see second view)
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie - kopie.dat",skiprows=1, parse_dates=[0])
df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')
df=df.dropna()#to remove Nan values
df.info()
print(df)
print(df.dtypes)
start_date = pd.to_datetime('2020-01-19 20:59:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 23:00:00', dayfirst=True)
(df[(df.index > start_date) & (df.index > end_date)])
df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show() <class 'pandas.core.frame.DataFrame'>
Int64Index: 181 entries, 2 to 182
Data columns (total 2 columns):
TIMESTAMP 181 non-null datetime64[ns]
RH_Avg 181 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.2 KB
TIMESTAMP RH_Avg
2 2020-01-19 20:59:00 87.16161
3 2020-01-19 21:00:00 87.18287
4 2020-01-19 21:01:00 87.18132
5 2020-01-19 21:02:00 87.20342
6 2020-01-19 21:03:00 87.21060
.. ... ...
178 2020-01-19 23:55:00 89.14359
179 2020-01-19 23:56:00 89.15077
180 2020-01-19 23:57:00 89.15542
181 2020-01-19 23:58:00 89.17583
182 2020-01-19 23:59:00 89.18188
[181 rows x 2 columns]
TIMESTAMP datetime64[ns]
RH_Avg float64
dtype: object
Traceback (most recent call last):
File "C:\Users\Makada\Desktop\pppooooooooooooooooolllllllllllllllllllllllllllllllllllllll.py", line 32, in <module>
(df[(df.index > start_date) & (df.index > end_date)])
File "C:\Python\lib\site-packages\pandas\core\indexes\base.py", line 111, in cmp_method
result = op(self.values, np.asarray(other))
TypeError: '>' not supported between instances of 'int' and 'Timestamp'
>>>
Yes, got it working with code below.
Although the code is working ok, i get a warning showing text in second view below.
import pandas as pd
import numpy as np
import datetime as dt
i
mport csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0])
df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')
df=df.dropna()#to remove Nan values
df.info()
print(df)
print(df.dtypes)
start_date = pd.to_datetime('2020-01-19 21:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 22:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]
df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show() Warning (from warnings module):
File "<string>", line 1
DtypeWarning: Columns (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47) have mixed types. Specify dtype option on import or set low_memory=False.
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I solved the warning by adding "dtype='unicode'" to the import.
I think "low_memory=False" is possible also .
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I have made some adjustments to the script so the plot is updating in the same view every minute or so.
But now the plot is updating in a new screen every minute.
Whats wrong with the code to have the plot update in its own screen?
Thanks.
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')
df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')
df=df.dropna()#to remove Nan values
df.info()
plt.ion()
fig, ax= plt.subplots()
while(True):
start_date = pd.to_datetime('2020-01-29 12:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-02-01 12:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]
df.plot('TIMESTAMP','WS_kph_S_WVT',kind='line', linewidth=0.5)
plt.ylim(0,150)
plt.pause(10)
plt.draw()
plt.clf()
print(df)
print(df.dtypes)
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I got it working with code below.
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
import time
starttime=time.time()
plt.ion()
fig, ax=plt.subplots()
while(True):
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')
df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')
df=df.dropna()#to remove Nan values
df.info()
start_date = pd.to_datetime('2020-01-31 12:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-02-28 12:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]
ax.clear()
df.plot(x='TIMESTAMP',y='WS_kph_S_WVT',kind='line', linewidth=0.5, ax=ax)
plt.ylim(0,150)
plt.draw()
plt.pause(10 - time.time() % 10)
plt.show()
print(df)
print(df.dtypes)
Posts: 12,039
Threads: 487
Joined: Sep 2016
Thank you for sharing the answer
Posts: 115
Threads: 10
Joined: Nov 2019
Hi, maybe not the cleanest/best/fastest script, but its working.
Thanks.
|