Python Forum
Visualisation of gaps in time series data - Printable Version



Visualisation of gaps in time series data - ulrich48155 - Jun-22-2017

From my university project I got time series data. Unfortunately, my time series has some gaps due to technical issues. Is it possible to visualise these gaps?

I was thinking of something like this:
http://imgur.com/a/oe1dS

The beige gaps represent the missing data. Ideally, the axis would also be labelled so that one can see exactly at which timestamp a gap starts and at which it ends.

Here is an example of what these timestamps look like:
2017-02-11 22:05:33.982497
2017-02-12 10:17:28.660385
2017-02-12 13:20:23.416498
2017-02-12 16:23:13.309596
2017-02-13 01:32:35.242695

Thanks in advance!


RE: Visualisation of gaps in time series data - micseydel - Jun-22-2017

Could you give an example of what the gaps in that example list of timestamps would be?


RE: Visualisation of gaps in time series data - ulrich48155 - Jun-22-2017

Oh yeah, sorry! As this data was mined from Twitter, there are 'normal' gaps of around 3h between consecutive timestamps. These can be neglected. The gaps I mean are those spanning days or even weeks, like this:

2017-02-15 01:11:43.345424
2017-02-15 04:15:37.635750
2017-02-15 07:19:21.527454
2017-03-15 23:01:23.013933
2017-03-16 02:10:28.695685
2017-03-16 05:19:39.172491
2017-03-16 08:29:19.694782
2017-03-16 11:39:17.936486
2017-03-16 14:49:12.238304


RE: Visualisation of gaps in time series data - micseydel - Jun-22-2017

Let me put it this way - I see two parts to this. (1) Processing the raw data into some representation of "windows". (2) Visualizing that programmatic representation.

Which one (or both) are you having trouble with? I recommend you focus on (1) first, neglecting (2) entirely, and then tackle (2) once (1) is completed. If you are struggling with (1) then showing us what you have so far* will help us to help you.

* "showing us what you have so far" = a minimal code snippet (won't include anything for (2)) that runs, and reproduces whatever issue is blocking you, be it undesirable output or an error message.
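
A minimal sketch of part (1), assuming pandas is available and that any pause longer than 6 hours counts as a gap. The threshold is an assumption (about twice the 'normal' spacing of 3h), and the timestamps are a subset of the example list above:

```python
import pandas as pd

# Timestamps copied from the example above; the 6-hour threshold is an
# assumption (roughly twice the "normal" spacing of about 3 hours).
stamps = pd.Series(pd.to_datetime([
    "2017-02-15 01:11:43.345424",
    "2017-02-15 04:15:37.635750",
    "2017-02-15 07:19:21.527454",
    "2017-03-15 23:01:23.013933",
    "2017-03-16 02:10:28.695685",
    "2017-03-16 05:19:39.172491",
])).sort_values().reset_index(drop=True)

threshold = pd.Timedelta(hours=6)

# Time elapsed since the previous timestamp (NaT for the first row).
delta = stamps.diff()

# A gap "window" runs from the last timestamp before a long pause
# to the first timestamp after it.
is_gap = delta > threshold
gaps = pd.DataFrame({
    "start": stamps.shift(1)[is_gap],
    "end": stamps[is_gap],
}).reset_index(drop=True)
gaps["length"] = gaps["end"] - gaps["start"]

print(gaps)
```

Each row of gaps is one window of missing data, which is a convenient representation to hand over to part (2).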


RE: Visualisation of gaps in time series data - ulrich48155 - Jun-23-2017

To be quite honest, I don't have anything yet except for loading my .csv file, as I have no clue how and where to start. Apart from some easy plotting with matplotlib, I don't have any experience with data visualisation.


RE: Visualisation of gaps in time series data - zivoni - Jun-23-2017

Data preprocessing in this case could mean "filling" your time series so that it contains all "expected" timestamps.
A simple example with a daily time series with a single gap:
Output:
2017-05-01  10
2017-05-02  12
2017-05-04  11
You can create a new time series by "adding" the missing dates, while converting the values to flags indicating whether a value exists in the original time series.
Output:
2017-05-01  True
2017-05-02  True
2017-05-03  False
2017-05-04  True
You can get a plot similar to your example by plotting a bar chart based on this data.

With your data you would probably need to "truncate" your timestamps to 3-hour periods and then fill the series with 3-hour periods. Pandas provides lots of time-related functions; check Series.asfreq() and PeriodIndex().
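
The code behind the two outputs above is not shown, so the following is only one way to reproduce them, a sketch using Series.asfreq() on the same three daily values (the variable names ts and present are made up for illustration):

```python
import pandas as pd

# The daily example above: 2017-05-03 is missing.
ts = pd.Series(
    [10, 12, 11],
    index=pd.to_datetime(["2017-05-01", "2017-05-02", "2017-05-04"]),
)

# asfreq("D") inserts every missing day with a NaN value;
# notnull() then turns the values into present/missing flags.
present = ts.asfreq("D").notnull()
print(present)
```

asfreq() only works this directly when the existing timestamps already sit exactly on the target frequency, which is why irregular timestamps need to be truncated or binned first.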


RE: Visualisation of gaps in time series data - ulrich48155 - Jun-24-2017

Thanks for your help so far! After your comments I came up with this solution:

import pandas as pd
import missingno as mn

colnames=['Timestamp','Currency','Rate','Volume']
usecols=['Timestamp','Rate']
series=pd.read_csv('prices.csv', names=colnames, usecols=usecols)

series.Timestamp = pd.to_datetime(series.Timestamp)
series = series.set_index('Timestamp')

series.index=series.index.round('min')

series=series.reindex(pd.date_range(start=series.index[0], end=series.index[-1], freq='1min'))

series=series.isnull()

print(series)
Output:
...
2017-02-11 20:46:00  False
2017-02-11 20:47:00  False
2017-02-11 20:48:00  False
2017-02-11 20:49:00   True
2017-02-11 20:50:00  False
...
Because the differences between the timestamps are not completely regular, I get some small gaps in my output data.

To visualize my data I found missingno. Unfortunately, there are not many descriptions or tutorials on how to use the package. If I just use the matrix function like this:

mn.matrix(series)
My output looks like this:

[Image: 88l58]


Does anyone know how to adjust my code to get a result that has my timestamps on the y-axis, so that the gaps in my data are visible? If not, is there perhaps another approach to visualizing these gaps?

Thanks in advance!


RE: Visualisation of gaps in time series data - zivoni - Jun-24-2017

You can take care of the irregular differences between timestamps by some truncating or "binning". Example on your timestamps (with fake values):
Output:
In [12]: tserie
Out[12]:
2017-02-15 01:11:43.345424    0
2017-02-15 04:15:37.635750    1
2017-02-15 07:19:21.527454    2
2017-03-15 23:01:23.013933    3
2017-03-16 02:10:28.695685    4
2017-03-16 05:19:39.172491    5
2017-03-16 08:29:19.694782    6
2017-03-16 11:39:17.936486    7
2017-03-16 14:49:12.238304    8
dtype: int64
Converting to 3H periods and filling (this works quite well here, but reindex/asfreq/PeriodIndex could be more "universal").
Output:
In [13]: missing = tserie.resample('3H').min().notnull()

In [14]: missing
Out[14]:
2017-02-15 00:00:00     True
2017-02-15 03:00:00     True
2017-02-15 06:00:00     True
2017-02-15 09:00:00    False
2017-02-15 12:00:00    False
2017-02-15 15:00:00    False
                       ...
2017-03-15 15:00:00    False
2017-03-15 18:00:00    False
2017-03-15 21:00:00     True
2017-03-16 00:00:00     True
2017-03-16 03:00:00     True
2017-03-16 06:00:00     True
2017-03-16 09:00:00     True
2017-03-16 12:00:00     True
Freq: 3H, Length: 237, dtype: bool
Plotting as barchart with colors according to "missing flag"
In [15]: plt.bar(missing.index, np.ones(len(missing)), color = [['y','b'][idx] for idx in missing], width = 1)
gives
[Image: FcubsL6.png]
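
An alternative sketch that also labels where each gap starts and ends, which the bar chart does not do by itself. It is not the method used above: it shades each run of empty 3-hour bins with axvspan and puts the axis ticks at the run boundaries. The names step, absent, run_id and spans, the output file name gaps.png and the Agg backend are all assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

stamps = pd.to_datetime([
    "2017-02-15 01:11:43.345424", "2017-02-15 04:15:37.635750",
    "2017-02-15 07:19:21.527454", "2017-03-15 23:01:23.013933",
    "2017-03-16 02:10:28.695685", "2017-03-16 05:19:39.172491",
    "2017-03-16 08:29:19.694782", "2017-03-16 11:39:17.936486",
    "2017-03-16 14:49:12.238304",
])
tserie = pd.Series(range(len(stamps)), index=stamps)  # fake values, as above

# Bin into 3-hour periods; a bin without any observation is "absent".
step = pd.Timedelta(hours=3)
present = tserie.resample(step).min().notnull()
absent = ~present

# Give every run of consecutive equal flags its own id, then keep the empty runs.
run_id = absent.astype(int).diff().ne(0).cumsum()
spans = [
    (run.index[0], run.index[-1] + step)  # the span covers the whole last empty bin
    for _, run in absent[absent].groupby(run_id[absent])
]

fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(tserie.index, [1] * len(tserie), "b|", markersize=20)  # one mark per observation
for start, end in spans:
    ax.axvspan(start, end, color="wheat")  # beige band for each gap

# Put the x-axis ticks exactly at the gap boundaries.
edges = [edge.to_pydatetime() for span in spans for edge in span]
ax.set_xticks(mdates.date2num(edges))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
ax.set_yticks([])
fig.autofmt_xdate()
fig.savefig("gaps.png")
```

Note that the boundaries are bin edges (multiples of 3 hours), not the exact timestamps of the last and first observation around the gap.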


RE: Visualisation of gaps in time series data - ulrich48155 - Jun-25-2017

Thanks again. Now my code looks like this:

import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

colnames=['Timestamp','Currency','Rate','Volume']
usecols=['Timestamp','Currency']
series=pd.read_csv('prices.csv', names=colnames, usecols=usecols)

series.Timestamp = pd.to_datetime(series.Timestamp)

series = series.set_index('Timestamp')

#I actually got two different datasets with different time intervals (one with 3h and one with 1min)
missing = series.resample('1min').min().notnull() 

plt.bar(missing.index, np.ones(len(missing)), color = [['y','b'][idx] for idx in missing], width = 1)
But now I get this error message:

Error:
TypeError: list indices must be integers or slices, not str
What do I have to convert to integer now?


RE: Visualisation of gaps in time series data - zivoni - Jun-25-2017

The values of missing should be True/False (bool is a special case of int), so basically you should be indexing with 1 or 0.
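
A likely cause, sketched under the assumption that missing is a DataFrame here (read_csv returns one, and the 'Currency' column is still in it after resampling): iterating over a DataFrame yields its column labels rather than its values, so the list gets indexed with a string. The small frame below is a made-up stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the resampled frame: one column of boolean flags.
missing = pd.DataFrame(
    {"Currency": [True, False, True]},
    index=pd.date_range("2017-02-15", periods=3, freq="D"),
)

# Iterating over a DataFrame yields its column labels, not its values,
# so ['y', 'b'][idx] ends up being indexed with the string 'Currency'.
print(list(missing))  # ['Currency']

# Selecting the column first gives a Series, whose iteration yields the flags.
flags = missing["Currency"]
colors = [["y", "b"][int(flag)] for flag in flags]
print(colors)  # ['b', 'y', 'b']
```

In the earlier example tserie was a Series, which is why the same list comprehension worked there.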