Select data from between two timestamps

Makada · (This post was last modified: Jan-29-2020, 08:58 PM by Makada.)

Hi,

I am currently reading data from a .dat file as CSV.
With the script below i manage to get data from the source, but i cant get data when i select to get data between 2 timestamps.
For example data between 2020-01-20 12:00:00 and 2020-01-25 12:15:00
With the script below i get the whole datarange from the source, see second field showing the output from the script.
Maybe someone have a hint on how to accomplish that?

Thanks.

import pandas as pd
import numpy as np
import datetime as dt
import csv

df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",
                 skiprows=1,
                 index_col=['TIMESTAMP'],
                 parse_dates=True,
                 header=0,
                 usecols=['TIMESTAMP','WS_kph_Max'])

df['2020-01-20 20:59:00':'2020-01-22 21:01:00']

print(df)

Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
=========== RESTART: C:\Users\Makada\Desktop\pppooooooooooooooooo.py ===========
                          WS_kph_Max
TIMESTAMP                           
TS                   kilometers/hour
NaN                              Max
2020-01-19 20:59:00            4.609
2020-01-19 21:00:00            4.294
2020-01-19 21:01:00             4.24
...                              ...
2020-01-29 16:30:00            20.09
2020-01-29 16:31:00            30.92
2020-01-29 16:32:00            24.65
2020-01-29 16:33:00            21.34
2020-01-29 16:34:00            25.74

[14138 rows x 1 columns]

**Larz60+** · Jan-30-2020, 01:51 AM

here's an example to see if date lies between two others:
example: https://stackoverflow.com/a/5464467

Makada · (This post was last modified: Jan-30-2020, 09:27 AM by Makada.)

Hi,

Thanks for your reply.
But with the examples given in the link i dont know how to get it to work with my code.
My knowledge of python is limited Cry

Thanks.

I managed with the code below to get the desired dates, but the values given for that dates (WS_kph_Max) doesnt match from the source.

Print(df['2020-01-20 20:59:00':'2020-01-22 21:01:00'])

Output:

                    WS_kph_Max
TIMESTAMP                     
2020-01-20 20:59:00      0.078
2020-01-20 21:00:00      0.078
2020-01-20 21:01:00      0.078

Yes its working. Smile

Now lets see if i can get it to plot.

Thanks.

Makada · (This post was last modified: Jan-30-2020, 08:15 PM by Makada.)

I see the data type of timestamp and WS_kph_Max are "object".
Do i have to convert the timestamp to "datetime64" and WS_kph_Max to "float64" before i start trying to plot them ?

Thanks.

Makada · (This post was last modified: Jan-31-2020, 10:46 AM by Makada.)

Hi,

I have changed my code a bit, i can plot the data, but when i add code to have data plot in between 2 dates, i get an error.(for error, see second view)

import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv

from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  


df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie  - kopie.dat",skiprows=1, parse_dates=[0])

df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
print(df)
print(df.dtypes)


start_date = pd.to_datetime('2020-01-19 20:59:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 23:00:00', dayfirst=True)

(df[(df.index > start_date) & (df.index > end_date)])




df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 181 entries, 2 to 182
Data columns (total 2 columns):
TIMESTAMP    181 non-null datetime64[ns]
RH_Avg       181 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.2 KB
              TIMESTAMP    RH_Avg
2   2020-01-19 20:59:00  87.16161
3   2020-01-19 21:00:00  87.18287
4   2020-01-19 21:01:00  87.18132
5   2020-01-19 21:02:00  87.20342
6   2020-01-19 21:03:00  87.21060
..                  ...       ...
178 2020-01-19 23:55:00  89.14359
179 2020-01-19 23:56:00  89.15077
180 2020-01-19 23:57:00  89.15542
181 2020-01-19 23:58:00  89.17583
182 2020-01-19 23:59:00  89.18188

[181 rows x 2 columns]
TIMESTAMP    datetime64[ns]
RH_Avg              float64
dtype: object
Traceback (most recent call last):
  File "C:\Users\Makada\Desktop\pppooooooooooooooooolllllllllllllllllllllllllllllllllllllll.py", line 32, in <module>
    (df[(df.index > start_date) & (df.index > end_date)])
  File "C:\Python\lib\site-packages\pandas\core\indexes\base.py", line 111, in cmp_method
    result = op(self.values, np.asarray(other))
TypeError: '>' not supported between instances of 'int' and 'Timestamp'
>>>

Yes, got it working with code below.
Although the code is working ok, i get a warning showing text in second view below.

import pandas as pd
import numpy as np
import datetime as dt
i
mport csv
from pandas import read_csv

from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  


df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0])

df=pd.DataFrame(df, columns=['TIMESTAMP','RH_Avg'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['RH_Avg']=pd.to_numeric(df.RH_Avg, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
print(df)
print(df.dtypes)


start_date = pd.to_datetime('2020-01-19 21:00:00', dayfirst=True)
end_date = pd.to_datetime('2020-01-19 22:00:00', dayfirst=True)
mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
df.loc[mask]
df=df.loc[mask]


df.plot('TIMESTAMP','RH_Avg',kind='line', linewidth=0.5)
plt.show()

Warning (from warnings module):
  File "<string>", line 1
DtypeWarning: Columns (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47) have mixed types. Specify dtype option on import or set low_memory=False.

Makada · Jan-31-2020, 10:53 AM

Hi,

I solved the warning by adding "dtype='unicode'" to the import.
I think "low_memory=False" is possible also .

Thanks.

Makada · Jan-31-2020, 07:05 PM

Hi,

I have made some adjustments to the script so the plot is updating in the same view every minute or so.
But now the plot is updating in a new screen every minute.
Whats wrong with the code to have the plot update in its own screen?

Thanks.

import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  



df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')

df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')

df=df.dropna()#to remove Nan values

df.info()
plt.ion()
fig, ax= plt.subplots()
while(True):



    start_date = pd.to_datetime('2020-01-29 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-01 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]

    df.plot('TIMESTAMP','WS_kph_S_WVT',kind='line', linewidth=0.5)
    plt.ylim(0,150)
    plt.pause(10)
    plt.draw()

    plt.clf()

    print(df)
    print(df.dtypes)

Makada · Jan-31-2020, 09:05 PM

Hi,

I got it working with code below. Smile

import pandas as pd
import numpy as np
import datetime as dt
import csv
from pandas import read_csv
from datetime import datetime
import matplotlib
import matplotlib.pyplot as plt  
import time

starttime=time.time()
plt.ion()
fig, ax=plt.subplots()
while(True):

    df = pd.read_csv("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie .dat",skiprows=1, parse_dates=[0], dtype='unicode')

    df=pd.DataFrame(df, columns=['TIMESTAMP','WS_kph_S_WVT'])
    df['TIMESTAMP']=pd.to_datetime(df.TIMESTAMP, errors='coerce')
    df['WS_kph_S_WVT']=pd.to_numeric(df.WS_kph_S_WVT, errors='coerce')

    df=df.dropna()#to remove Nan values

    df.info()

    start_date = pd.to_datetime('2020-01-31 12:00:00', dayfirst=True)
    end_date = pd.to_datetime('2020-02-28 12:00:00', dayfirst=True)
    mask=(df['TIMESTAMP'] > start_date) & (df['TIMESTAMP'] <= end_date)
    df.loc[mask]
    df=df.loc[mask]
    ax.clear()
    df.plot(x='TIMESTAMP',y='WS_kph_S_WVT',kind='line', linewidth=0.5, ax=ax)


    plt.ylim(0,150)
    plt.draw()
    plt.pause(10 - time.time() % 10)
plt.show()

print(df)
print(df.dtypes)

**Larz60+** · Jan-31-2020, 09:32 PM

Thank you for sharing the answer

Makada · Jan-31-2020, 09:44 PM

Hi, maybe not the cleanest/best/fastest script, but its working. Smile

Thanks.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Select column between to dates CSV data	PythonJD	0	2,328	Apr-14-2020, 12:22 PM Last Post: PythonJD
	select data based on indice	Staph	4	3,584	Jul-15-2019, 02:05 AM Last Post: scidam
	how to select particular rows data from a array	raady07	3	5,554	Mar-06-2017, 02:21 AM Last Post: raady07

Select data from between two timestamps

User Panel Messages

Announcements