Visualisation of gaps in time series data (Python Forum, https://python-forum.io, Forum: Python Coding > Homework, /thread-3770.html)
Visualisation of gaps in time series data - ulrich48155 - Jun-22-2017

My university project produced some time series data. Unfortunately, my time series has some gaps due to technical issues. Is it possible to visualise these gaps? I was thinking of something like this: http://imgur.com/a/oe1dS

The beige gaps represent the missing data. Ideally, the axis would also be labelled so that one can see exactly at which timestamp a gap starts and at which it ends. This is an example of what these timestamps look like:

2017-02-11 22:05:33.982497
2017-02-12 10:17:28.660385
2017-02-12 13:20:23.416498
2017-02-12 16:23:13.309596
2017-02-13 01:32:35.242695

Thanks in advance!

RE: Visualisation of gaps in time series data - micseydel - Jun-22-2017

Could you give an example of what the gaps in that example list of timestamps would be?

RE: Visualisation of gaps in time series data - ulrich48155 - Jun-22-2017

Oh yeah, sorry! As this is data mined from Twitter, there are 'normal' gaps of around 3h between consecutive timestamps. These can be neglected. The gaps I mean are those spanning days or even weeks, like this:

2017-02-15 01:11:43.345424
2017-02-15 04:15:37.635750
2017-02-15 07:19:21.527454
2017-03-15 23:01:23.013933
2017-03-16 02:10:28.695685
2017-03-16 05:19:39.172491
2017-03-16 08:29:19.694782
2017-03-16 11:39:17.936486
2017-03-16 14:49:12.238304

RE: Visualisation of gaps in time series data - micseydel - Jun-22-2017

Let me put it this way: I see two parts to this. (1) Processing the raw data into some representation of "windows". (2) Visualizing that programmatic representation. Which one (or both) are you having trouble with? I recommend you focus on (1) first, neglecting (2) entirely, and then tackle (2) once (1) is completed. If you are struggling with (1), then showing us what you have so far* will help us to help you.
* "showing us what you have so far" = a minimal code snippet (which won't include anything for (2)) that runs and reproduces whatever issue is blocking you, be it undesirable output or an error message.

RE: Visualisation of gaps in time series data - ulrich48155 - Jun-23-2017

To be quite honest, I don't have anything yet except for loading my .csv file, as I have no clue how and where to start. Apart from some easy plotting with matplotlib, I don't have any experience with data visualisation.

RE: Visualisation of gaps in time series data - zivoni - Jun-23-2017

Data preprocessing in this case could be "filling" your time series so that it contains all "expected" timestamps. Simple example with a daily time series with a single gap:

You can create a new time series by "adding" the missing dates, while converting the values to flags that indicate whether a value exists in the original time series. You can get a plot similar to your example plot by drawing a bar chart based on this data.

With your data you would probably need to "truncate" your timestamps to 3-hour periods and then fill the series again with 3-hour periods. Pandas provides lots of time-related functions; check Series.asfreq() and PeriodIndex().

RE: Visualisation of gaps in time series data - ulrich48155 - Jun-24-2017

Thanks for your help so far! After your comments I came up with this solution:

```python
import pandas as pd
import missingno as mn

colnames = ['Timestamp', 'Currency', 'Rate', 'Volume']
usecols = ['Timestamp', 'Rate']
series = pd.read_csv('prices.csv', names=colnames, usecols=usecols)
series.Timestamp = pd.to_datetime(series.Timestamp)
series = series.set_index('Timestamp')
# Round to whole minutes, then insert a row for every minute between the first and last timestamp
series.index = series.index.round('min')
series = series.reindex(pd.date_range(start=series.index[0], end=series.index[-1], freq='1min'))
series = series.isnull()  # True wherever no observation exists for that minute
print(series)
```

Because the differences between the timestamps are not completely regular, I got some small gaps in my output data. To visualise my data I found missingno.
Unfortunately, there are not many descriptions or tutorials on how to use the package. If I just use the matrix function like this:

```python
mn.matrix(series)
```

my output looks like this:

[Image: 88l58]

Does anyone know how to adjust my code to get a result that has my timestamps on the y-axis, so that the gaps in my data are visible? If not, is there maybe another approach to visualise these gaps? Thanks in advance!

RE: Visualisation of gaps in time series data - zivoni - Jun-24-2017

You can take care of the irregular differences between timestamps by some truncating or "binning". Example on your timestamps (with fake values):

Converting to 3H periods and filling (it works quite well here, but reindex/asfreq/PeriodIndex could be more "universal").

Plotting as a bar chart with colors according to the "missing flag":

```python
plt.bar(missing.index, np.ones(len(missing)), color=[['y', 'b'][idx] for idx in missing], width=1)
```

gives

[Image: FcubsL6.png]

RE: Visualisation of gaps in time series data - ulrich48155 - Jun-25-2017

Thanks again. Now my code looks like this:

```python
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

colnames = ['Timestamp', 'Currency', 'Rate', 'Volume']
usecols = ['Timestamp', 'Currency']
series = pd.read_csv('prices.csv', names=colnames, usecols=usecols)
series.Timestamp = pd.to_datetime(series.Timestamp)
series = series.set_index('Timestamp')
# I actually got two different datasets with different time intervals (one with 3h and one with 1min)
missing = series.resample('1min').min().notnull()
plt.bar(missing.index, np.ones(len(missing)), color=[['y', 'b'][idx] for idx in missing], width=1)
```

But now I get this error message:

What do I have to convert to integer now?
RE: Visualisation of gaps in time series data - zivoni - Jun-25-2017

Values of missing should be True/False (a special case of int), so basically you should be indexing with 1 or 0.
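micseydel's step (1), turning the raw timestamps into "windows", can also be done directly from the differences between consecutive timestamps, which gives each gap an exact start and end timestamp as asked for in the first post. The following is a minimal sketch assuming pandas; the function name `find_gaps` and its `max_step` threshold (6 hours here, roughly twice the normal ~3h spacing mentioned above) are hypothetical choices, not something from the thread:

```python
import pandas as pd


def find_gaps(timestamps, max_step="6h"):
    """Hypothetical helper: return one row per gap, i.e. per pair of
    consecutive timestamps that lie further apart than max_step."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values().reset_index(drop=True)
    step = ts.diff()                         # time since the previous timestamp (NaT for the first row)
    is_gap = step > pd.Timedelta(max_step)   # comparisons with NaT are False, so row 0 is never a gap
    return pd.DataFrame({
        "gap_start": ts.shift(1)[is_gap].to_numpy(),  # last timestamp before the gap
        "gap_end": ts[is_gap].to_numpy(),             # first timestamp after the gap
        "length": step[is_gap].to_numpy(),
    })


# Timestamps taken from the second post in the thread.
stamps = [
    "2017-02-15 01:11:43.345424",
    "2017-02-15 04:15:37.635750",
    "2017-02-15 07:19:21.527454",
    "2017-03-15 23:01:23.013933",
    "2017-03-16 02:10:28.695685",
]
gaps = find_gaps(stamps)
print(gaps)
```

Each `gap_start`/`gap_end` pair could then be shaded on an ordinary time axis with matplotlib's `Axes.axvspan`, which would give beige bands similar to the linked image while the axis keeps real timestamps.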
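The error message itself did not survive in this copy of the thread, so the following is an assumption about its cause. In the last code, `series` is still a DataFrame (it keeps the `Currency` column), so `missing` is a DataFrame as well. Iterating over a DataFrame yields its column labels rather than its values, so `idx` becomes the string `'Currency'`, and `['y', 'b'][idx]` fails because a list cannot be indexed with a string. Selecting the column first gives the boolean Series that zivoni's answer expects. The sketch below uses fake data and 3-hour bins; it counts rows per bin with `count()` (which works for string columns, unlike `min()` in some pandas versions) and chooses colours with a conditional expression instead of indexing a list with the flag. The data, the `'3h'` bin size and the output file name `gaps.png` are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake, roughly 3-hourly observations with a multi-day hole (illustrative data only).
index = pd.to_datetime([
    "2017-02-15 01:11:43", "2017-02-15 04:15:37", "2017-02-15 07:19:21",
    "2017-02-18 23:01:23", "2017-02-19 02:10:28", "2017-02-19 05:19:39",
])
series = pd.DataFrame({"Currency": ["fake"] * len(index)}, index=index)

# Select ONE column so the result is a Series rather than a DataFrame, then count
# the rows falling into each 3-hour bin; bins with zero rows are the gaps.
present = series["Currency"].resample("3h").count() > 0

# One colour per bin: blue where data exists, yellow where it is missing.
colors = ["b" if flag else "y" for flag in present]

# On a date axis matplotlib measures bar width in days, so a 3-hour bin is 3/24 of a day.
width = pd.Timedelta("3h") / pd.Timedelta("1D")

fig, ax = plt.subplots(figsize=(10, 2))
ax.bar(present.index, np.ones(len(present)), width=width, color=colors, align="edge")
ax.set_yticks([])     # the bar height carries no information
fig.autofmt_xdate()   # tilt the timestamp labels so they do not overlap
fig.savefig("gaps.png")
```

With the real CSV, `series` would come from the same `read_csv`/`set_index` steps as in the thread; for the minute-level dataset, `'1min'` could replace `'3h'`, with the bar width changed to match (`width=1`, as in the earlier snippet, would make every bar one day wide).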