Posts: 115
Threads: 10
Joined: Nov 2019
Jan-29-2020, 08:58 PM
(This post was last modified: Jan-29-2020, 08:58 PM by Makada.)
Hi,
I am currently reading data from a .dat file as CSV.
With the script below i manage to get data from the source, but i cant get data when i select to get data between 2 timestamps.
For example data between 2020-01-20 12:00:00 and 2020-01-25 12:15:00
With the script below i get the whole datarange from the source, see second field showing the output from the script.
Maybe someone have a hint on how to accomplish that?
Thanks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
import pandas as pd
import numpy as np
import datetime as dt
import csv
df = pd.read_csv( "C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat" ,
skiprows = 1 ,
index_col = [ 'TIMESTAMP' ],
parse_dates = True ,
header = 0 ,
usecols = [ 'TIMESTAMP' , 'WS_kph_Max' ])
df[ '2020-01-20 20:59:00' : '2020-01-22 21:01:00' ]
print (df)
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
Python 3.8 . 1 (tags / v3. 8.1 : 1b293b6 , Dec 18 2019 , 22 : 39 : 24 ) [MSC v. 1916 32 bit (Intel)] on win32
Type "help" , "copyright" , "credits" or "license()" for more information.
>>>
= = = = = = = = = = = RESTART: C:\Users\Makada\Desktop\pppooooooooooooooooo.py = = = = = = = = = = =
WS_kph_Max
TIMESTAMP
TS kilometers / hour
NaN Max
2020 - 01 - 19 20 : 59 : 00 4.609
2020 - 01 - 19 21 : 00 : 00 4.294
2020 - 01 - 19 21 : 01 : 00 4.24
... ...
2020 - 01 - 29 16 : 30 : 00 20.09
2020 - 01 - 29 16 : 31 : 00 30.92
2020 - 01 - 29 16 : 32 : 00 24.65
2020 - 01 - 29 16 : 33 : 00 21.34
2020 - 01 - 29 16 : 34 : 00 25.74
[ 14138 rows x 1 columns]
|
Posts: 12,038
Threads: 487
Joined: Sep 2016
here's an example to see if date lies between two others:
example: https://stackoverflow.com/a/5464467
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-30-2020, 08:35 AM
(This post was last modified: Jan-30-2020, 09:27 AM by Makada.)
Hi,
Thanks for your reply.
But with the examples given in the link i dont know how to get it to work with my code.
My knowledge of python is limited
Thanks.
I managed with the code below to get the desired dates, but the values given for that dates (WS_kph_Max) doesnt match from the source.
1 |
Print (df[ '2020-01-20 20:59:00' : '2020-01-22 21:01:00' ])
|
Output:
1 2 3 4 5 |
WS_kph_Max
TIMESTAMP
2020 - 01 - 20 20 : 59 : 00 0.078
2020 - 01 - 20 21 : 00 : 00 0.078
2020 - 01 - 20 21 : 01 : 00 0.078
|
Yes its working.
Now lets see if i can get it to plot.
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-30-2020, 08:15 PM
(This post was last modified: Jan-30-2020, 08:15 PM by Makada.)
I see the data type of timestamp and WS_kph_Max are "object".
Do i have to convert the timestamp to "datetime64" and WS_kph_Max to "float64" before i start trying to plot them ?
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Jan-31-2020, 09:48 AM
(This post was last modified: Jan-31-2020, 10:46 AM by Makada.)
Hi,
I have changed my code a bit, i can plot the data, but when i add code to have data plot in between 2 dates, i get an error.(for error, see second view)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv( "C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie - kopie.dat" ,skiprows = 1 , parse_dates = [ 0 ])
df = pd.DataFrame(df, columns = [ 'TIMESTAMP' , 'RH_Avg' ])
df[ 'TIMESTAMP' ] = pd.to_datetime(df.TIMESTAMP, errors = 'coerce' )
df[ 'RH_Avg' ] = pd.to_numeric(df.RH_Avg, errors = 'coerce' )
df = df.dropna()
df.info()
print (df)
print (df.dtypes)
start_date = pd.to_datetime( '2020-01-19 20:59:00' , dayfirst = True )
end_date = pd.to_datetime( '2020-01-19 23:00:00' , dayfirst = True )
(df[(df.index > start_date) & (df.index > end_date)])
df.plot( 'TIMESTAMP' , 'RH_Avg' ,kind = 'line' , linewidth = 0.5 )
plt.show()
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
< class 'pandas.core.frame.DataFrame' >
Int64Index: 181 entries, 2 to 182
Data columns (total 2 columns):
TIMESTAMP 181 non - null datetime64[ns]
RH_Avg 181 non - null float64
dtypes: datetime64[ns]( 1 ), float64( 1 )
memory usage: 4.2 KB
TIMESTAMP RH_Avg
2 2020 - 01 - 19 20 : 59 : 00 87.16161
3 2020 - 01 - 19 21 : 00 : 00 87.18287
4 2020 - 01 - 19 21 : 01 : 00 87.18132
5 2020 - 01 - 19 21 : 02 : 00 87.20342
6 2020 - 01 - 19 21 : 03 : 00 87.21060
.. ... ...
178 2020 - 01 - 19 23 : 55 : 00 89.14359
179 2020 - 01 - 19 23 : 56 : 00 89.15077
180 2020 - 01 - 19 23 : 57 : 00 89.15542
181 2020 - 01 - 19 23 : 58 : 00 89.17583
182 2020 - 01 - 19 23 : 59 : 00 89.18188
[ 181 rows x 2 columns]
TIMESTAMP datetime64[ns]
RH_Avg float64
dtype: object
Traceback (most recent call last):
File "C:\Users\Makada\Desktop\pppooooooooooooooooolllllllllllllllllllllllllllllllllllllll.py" , line 32 , in <module>
(df[(df.index > start_date) & (df.index > end_date)])
File "C:\Python\lib\site-packages\pandas\core\indexes\base.py" , line 111 , in cmp_method
result = op( self .values, np.asarray(other))
TypeError: '>' not supported between instances of 'int' and 'Timestamp'
>>>
|
Yes, got it working with code below.
Although the code is working ok, i get a warning showing text in second view below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
import pandas as pd
import numpy as np
import datetime as dt
i
mport csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv( "C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat" ,skiprows = 1 , parse_dates = [ 0 ])
df = pd.DataFrame(df, columns = [ 'TIMESTAMP' , 'RH_Avg' ])
df[ 'TIMESTAMP' ] = pd.to_datetime(df.TIMESTAMP, errors = 'coerce' )
df[ 'RH_Avg' ] = pd.to_numeric(df.RH_Avg, errors = 'coerce' )
df = df.dropna()
df.info()
print (df)
print (df.dtypes)
start_date = pd.to_datetime( '2020-01-19 21:00:00' , dayfirst = True )
end_date = pd.to_datetime( '2020-01-19 22:00:00' , dayfirst = True )
mask = (df[ 'TIMESTAMP' ] > start_date) & (df[ 'TIMESTAMP' ] < = end_date)
df.loc[mask]
df = df.loc[mask]
df.plot( 'TIMESTAMP' , 'RH_Avg' ,kind = 'line' , linewidth = 0.5 )
plt.show()
|
1 2 3 |
Warning ( from warnings module):
File "<string>" , line 1
DtypeWarning: Columns ( 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 ) have mixed types. Specify dtype option on import or set low_memory = False .
|
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I solved the warning by adding "dtype='unicode'" to the import.
I think "low_memory=False" is possible also .
Thanks.
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I have made some adjustments to the script so the plot is updating in the same view every minute or so.
But now the plot is updating in a new screen every minute.
Whats wrong with the code to have the plot update in its own screen?
Thanks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
df = pd.read_csv( "C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat" ,skiprows = 1 , parse_dates = [ 0 ], dtype = 'unicode' )
df = pd.DataFrame(df, columns = [ 'TIMESTAMP' , 'WS_kph_S_WVT' ])
df[ 'TIMESTAMP' ] = pd.to_datetime(df.TIMESTAMP, errors = 'coerce' )
df[ 'WS_kph_S_WVT' ] = pd.to_numeric(df.WS_kph_S_WVT, errors = 'coerce' )
df = df.dropna()
df.info()
plt.ion()
fig, ax = plt.subplots()
while ( True ):
start_date = pd.to_datetime( '2020-01-29 12:00:00' , dayfirst = True )
end_date = pd.to_datetime( '2020-02-01 12:00:00' , dayfirst = True )
mask = (df[ 'TIMESTAMP' ] > start_date) & (df[ 'TIMESTAMP' ] < = end_date)
df.loc[mask]
df = df.loc[mask]
df.plot( 'TIMESTAMP' , 'WS_kph_S_WVT' ,kind = 'line' , linewidth = 0.5 )
plt.ylim( 0 , 150 )
plt.pause( 10 )
plt.draw()
plt.clf()
print (df)
print (df.dtypes)
|
Posts: 115
Threads: 10
Joined: Nov 2019
Hi,
I got it working with code below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt
import time
starttime = time.time()
plt.ion()
fig, ax = plt.subplots()
while ( True ):
df = pd.read_csv( "C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat" ,skiprows = 1 , parse_dates = [ 0 ], dtype = 'unicode' )
df = pd.DataFrame(df, columns = [ 'TIMESTAMP' , 'WS_kph_S_WVT' ])
df[ 'TIMESTAMP' ] = pd.to_datetime(df.TIMESTAMP, errors = 'coerce' )
df[ 'WS_kph_S_WVT' ] = pd.to_numeric(df.WS_kph_S_WVT, errors = 'coerce' )
df = df.dropna()
df.info()
start_date = pd.to_datetime( '2020-01-31 12:00:00' , dayfirst = True )
end_date = pd.to_datetime( '2020-02-28 12:00:00' , dayfirst = True )
mask = (df[ 'TIMESTAMP' ] > start_date) & (df[ 'TIMESTAMP' ] < = end_date)
df.loc[mask]
df = df.loc[mask]
ax.clear()
df.plot(x = 'TIMESTAMP' ,y = 'WS_kph_S_WVT' ,kind = 'line' , linewidth = 0.5 , ax = ax)
plt.ylim( 0 , 150 )
plt.draw()
plt.pause( 10 - time.time() % 10 )
plt.show()
print (df)
print (df.dtypes)
|
Posts: 12,038
Threads: 487
Joined: Sep 2016
Thank you for sharing the answer
Posts: 115
Threads: 10
Joined: Nov 2019
Hi, maybe not the cleanest/best/fastest script, but its working.
Thanks.
|