Python Forum
Select data from between two timestamps
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Select data from between two timestamps
#1
Hi,

I am currently reading data from a .dat file as CSV.
With the script below i manage to get data from the source, but i cant get data when i select to get data between 2 timestamps.
For example data between 2020-01-20 12:00:00 and 2020-01-25 12:15:00
With the script below i get the whole datarange from the source, see second field showing the output from the script.
Maybe someone have a hint on how to accomplish that?

Thanks.

import pandas as pd
import numpy as np
import datetime as dt
import csv

df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",
                 skiprows=1,
                 index_col=['TIMESTAMP'],
                 parse_dates=True,
                 header=0,
                 usecols=['TIMESTAMP','WS_kph_Max'])

df['2020-01-20 20:59:00':'2020-01-22 21:01:00']

print(df)
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
=========== RESTART: C:\Users\Makada\Desktop\pppooooooooooooooooo.py ===========
                          WS_kph_Max
TIMESTAMP                           
TS                   kilometers/hour
NaN                              Max
2020-01-19 20:59:00            4.609
2020-01-19 21:00:00            4.294
2020-01-19 21:01:00             4.24
...                              ...
2020-01-29 16:30:00            20.09
2020-01-29 16:31:00            30.92
2020-01-29 16:32:00            24.65
2020-01-29 16:33:00            21.34
2020-01-29 16:34:00            25.74

[14138 rows x 1 columns]
Reply
#2
here's an example to see if date lies between two others:
example: https://stackoverflow.com/a/5464467
Reply
#3
Hi,

Thanks for your reply.
But with the examples given in the link i dont know how to get it to work with my code.
My knowledge of python is limited Cry

Thanks.

I managed with the code below to get the desired dates, but the values given for that dates (WS_kph_Max) doesnt match from the source.

Print(df['2020-01-20 20:59:00':'2020-01-22 21:01:00'])
Output:

                    WS_kph_Max
TIMESTAMP                     
2020-01-20 20:59:00      0.078
2020-01-20 21:00:00      0.078
2020-01-20 21:01:00      0.078

Yes its working. Smile

Now lets see if i can get it to plot.

Thanks.
Reply
#4
I see the data type of timestamp and WS_kph_Max are "object".
Do i have to convert the timestamp to "datetime64" and WS_kph_Max to "float64" before i start trying to plot them ?

Thanks.
Reply
#5
Hi,

I have changed my code a bit, i can plot the data, but when i add code to have data plot in between 2 dates, i get an error.(for error, see second view)


import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv

from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  


df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie  - kopie.dat",skiprows=1, parse_dates=[0])

df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
print(df)
print(df.dtypes)


start_date = pd.to_datetime('2020-01-19 20:59:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 23:00:00', dayfirst=True)

(df[(df.index > start_date) & (df.index > end_date)])




df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 181 entries, 2 to 182
Data columns (total 2 columns):
TIMESTAMP    181 non-null datetime64[ns]
RH_Avg       181 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.2 KB
              TIMESTAMP    RH_Avg
2   2020-01-19 20:59:00  87.16161
3   2020-01-19 21:00:00  87.18287
4   2020-01-19 21:01:00  87.18132
5   2020-01-19 21:02:00  87.20342
6   2020-01-19 21:03:00  87.21060
..                  ...       ...
178 2020-01-19 23:55:00  89.14359
179 2020-01-19 23:56:00  89.15077
180 2020-01-19 23:57:00  89.15542
181 2020-01-19 23:58:00  89.17583
182 2020-01-19 23:59:00  89.18188

[181 rows x 2 columns]
TIMESTAMP    datetime64[ns]
RH_Avg              float64
dtype: object
Traceback (most recent call last):
  File "C:\Users\Makada\Desktop\pppooooooooooooooooolllllllllllllllllllllllllllllllllllllll.py", line 32, in <module>
    (df[(df.index > start_date) & (df.index > end_date)])
  File "C:\Python\lib\site-packages\pandas\core\indexes\base.py", line 111, in cmp_method
    result = op(self.values, np.asarray(other))
TypeError: '>' not supported between instances of 'int' and 'Timestamp'
>>> 

Yes, got it working with code below.
Although the code is working ok, i get a warning showing text in second view below.

import pandas as pd
import numpy as np
import datetime as dt
i
mport csv
from pandas import read_csv

from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  


df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0])

df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
print(df)
print(df.dtypes)


start_date = pd.to_datetime('2020-01-19 21:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 22:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]


df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()
Warning (from warnings module):
  File "<string>", line 1
DtypeWarning: Columns (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47) have mixed types. Specify dtype option on import or set low_memory=False.
Reply
#6
Hi,

I solved the warning by adding "dtype='unicode'" to the import.
I think "low_memory=False" is possible also .

Thanks.
Reply
#7
Hi,

I have made some adjustments to the script so the plot is updating in the same view every minute or so.
But now the plot is updating in a new screen every minute.
Whats wrong with the code to have the plot update in its own screen?

Thanks.



import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  



df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')

df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
plt.ion()
fig, ax= plt.subplots()
while(True):



    start_date = pd.to_datetime('2020-01-29 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-01 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]

    df.plot('TIMESTAMP','WS_kph_S_WVT',kind='line', linewidth=0.5)
    plt.ylim(0,150)
    plt.pause(10)
    plt.draw()

    plt.clf()

    print(df)
    print(df.dtypes)
Reply
#8
Hi,

I got it working with code below. Smile


import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  
import time

starttime=time.time()
plt.ion()
fig, ax=plt.subplots()
while(True):

    df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')

    df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
    df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
    df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')

    df=df.dropna()#to remove Nan values

    df.info()

    start_date = pd.to_datetime('2020-01-31 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-28 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]
    ax.clear()
    df.plot(x='TIMESTAMP',y='WS_kph_S_WVT',kind='line', linewidth=0.5, ax=ax)


    plt.ylim(0,150)
    plt.draw()
    plt.pause(10 - time.time() % 10)
plt.show()

print(df)
print(df.dtypes)
Reply
#9
Thank you for sharing the answer
Reply
#10
Hi, maybe not the cleanest/best/fastest script, but its working. Smile

Thanks.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Select column between to dates CSV data PythonJD 0 400 Apr-14-2020, 12:22 PM
Last Post: PythonJD
  select data based on indice Staph 4 631 Jul-15-2019, 02:05 AM
Last Post: scidam
  how to select particular rows data from a array raady07 3 2,057 Mar-06-2017, 02:21 AM
Last Post: raady07

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020