Python Forum
Select data from between two timestamps
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Select data from between two timestamps
#1
Hi,

I am currently reading data from a .dat file as CSV.
With the script below i manage to get data from the source, but i cant get data when i select to get data between 2 timestamps.
For example data between 2020-01-20 12:00:00 and 2020-01-25 12:15:00
With the script below i get the whole datarange from the source, see second field showing the output from the script.
Maybe someone have a hint on how to accomplish that?

Thanks.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import pandas as pd
import numpy as np
import datetime as dt
import csv
 
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",
                 skiprows=1,
                 index_col=['TIMESTAMP'],
                 parse_dates=True,
                 header=0,
                 usecols=['TIMESTAMP','WS_kph_Max'])
 
df['2020-01-20 20:59:00':'2020-01-22 21:01:00']
 
print(df)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
=========== RESTART: C:\Users\Makada\Desktop\pppooooooooooooooooo.py ===========
                          WS_kph_Max
TIMESTAMP                          
TS                   kilometers/hour
NaN                              Max
2020-01-19 20:59:00            4.609
2020-01-19 21:00:00            4.294
2020-01-19 21:01:00             4.24
...                              ...
2020-01-29 16:30:00            20.09
2020-01-29 16:31:00            30.92
2020-01-29 16:32:00            24.65
2020-01-29 16:33:00            21.34
2020-01-29 16:34:00            25.74
 
[14138 rows x 1 columns]
Reply
#2
here's an example to see if date lies between two others:
example: https://stackoverflow.com/a/5464467
Reply
#3
Hi,

Thanks for your reply.
But with the examples given in the link i dont know how to get it to work with my code.
My knowledge of python is limited Cry

Thanks.

I managed with the code below to get the desired dates, but the values given for that dates (WS_kph_Max) doesnt match from the source.

1
Print(df['2020-01-20 20:59:00':'2020-01-22 21:01:00'])
Output:

1
2
3
4
5
                    WS_kph_Max
TIMESTAMP                    
2020-01-20 20:59:00      0.078
2020-01-20 21:00:00      0.078
2020-01-20 21:01:00      0.078

Yes its working. Smile

Now lets see if i can get it to plot.

Thanks.
Reply
#4
I see the data type of timestamp and WS_kph_Max are "object".
Do i have to convert the timestamp to "datetime64" and WS_kph_Max to "float64" before i start trying to plot them ?

Thanks.
Reply
#5
Hi,

I have changed my code a bit, i can plot the data, but when i add code to have data plot in between 2 dates, i get an error.(for error, see second view)


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
 
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt 
 
 
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie  - kopie.dat",skiprows=1, parse_dates=[0])
 
df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')
 
df=df.dropna()#to remove Nan values
 
df.info()
print(df)
print(df.dtypes)
 
 
start_date = pd.to_datetime('2020-01-19 20:59:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 23:00:00', dayfirst=True)
 
(df[(df.index > start_date) & (df.index > end_date)])
 
 
 
 
df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
<class 'pandas.core.frame.DataFrame'>
Int64Index: 181 entries, 2 to 182
Data columns (total 2 columns):
TIMESTAMP    181 non-null datetime64[ns]
RH_Avg       181 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.2 KB
              TIMESTAMP    RH_Avg
2   2020-01-19 20:59:00  87.16161
3   2020-01-19 21:00:00  87.18287
4   2020-01-19 21:01:00  87.18132
5   2020-01-19 21:02:00  87.20342
6   2020-01-19 21:03:00  87.21060
..                  ...       ...
178 2020-01-19 23:55:00  89.14359
179 2020-01-19 23:56:00  89.15077
180 2020-01-19 23:57:00  89.15542
181 2020-01-19 23:58:00  89.17583
182 2020-01-19 23:59:00  89.18188
 
[181 rows x 2 columns]
TIMESTAMP    datetime64[ns]
RH_Avg              float64
dtype: object
Traceback (most recent call last):
  File "C:\Users\Makada\Desktop\pppooooooooooooooooolllllllllllllllllllllllllllllllllllllll.py", line 32, in <module>
    (df[(df.index > start_date) & (df.index > end_date)])
  File "C:\Python\lib\site-packages\pandas\core\indexes\base.py", line 111, in cmp_method
    result = op(self.values, np.asarray(other))
TypeError: '>' not supported between instances of 'int' and 'Timestamp'
>>>

Yes, got it working with code below.
Although the code is working ok, i get a warning showing text in second view below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import pandas as pd
import numpy as np
import datetime as dt
i
mport csv
from pandas import read_csv
 
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt 
 
 
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0])
 
df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')
 
df=df.dropna()#to remove Nan values
 
df.info()
print(df)
print(df.dtypes)
 
 
start_date = pd.to_datetime('2020-01-19 21:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 22:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]
 
 
df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()
1
2
3
Warning (from warnings module):
  File "<string>", line 1
DtypeWarning: Columns (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47) have mixed types. Specify dtype option on import or set low_memory=False.
Reply
#6
Hi,

I solved the warning by adding "dtype='unicode'" to the import.
I think "low_memory=False" is possible also .

Thanks.
Reply
#7
Hi,

I have made some adjustments to the script so the plot is updating in the same view every minute or so.
But now the plot is updating in a new screen every minute.
Whats wrong with the code to have the plot update in its own screen?

Thanks.



1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt 
 
 
 
df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')
 
df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')
 
df=df.dropna()#to remove Nan values
 
df.info()
plt.ion()
fig, ax= plt.subplots()
while(True):
 
 
 
    start_date = pd.to_datetime('2020-01-29 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-01 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]
 
    df.plot('TIMESTAMP','WS_kph_S_WVT',kind='line', linewidth=0.5)
    plt.ylim(0,150)
    plt.pause(10)
    plt.draw()
 
    plt.clf()
 
    print(df)
    print(df.dtypes)
Reply
#8
Hi,

I got it working with code below. Smile


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt 
import time
 
starttime=time.time()
plt.ion()
fig, ax=plt.subplots()
while(True):
 
    df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')
 
    df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
    df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
    df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')
 
    df=df.dropna()#to remove Nan values
 
    df.info()
 
    start_date = pd.to_datetime('2020-01-31 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-28 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]
    ax.clear()
    df.plot(x='TIMESTAMP',y='WS_kph_S_WVT',kind='line', linewidth=0.5, ax=ax)
 
 
    plt.ylim(0,150)
    plt.draw()
    plt.pause(10 - time.time() % 10)
plt.show()
 
print(df)
print(df.dtypes)
Reply
#9
Thank you for sharing the answer
Reply
#10
Hi, maybe not the cleanest/best/fastest script, but its working. Smile

Thanks.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Select column between to dates CSV data PythonJD 0 2,260 Apr-14-2020, 12:22 PM
Last Post: PythonJD
  select data based on indice Staph 4 3,468 Jul-15-2019, 02:05 AM
Last Post: scidam
  how to select particular rows data from a array raady07 3 5,419 Mar-06-2017, 02:21 AM
Last Post: raady07

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020