Time series - Python Forum (https://python-forum.io) > Python Coding (https://python-forum.io/forum-7.html) > Data Science (https://python-forum.io/forum-44.html) > Thread: Time series (/thread-10528.html)
Time series - honda_933 - May-24-2018
Hi,

I'm trying to find a simple way to run through a large data set (2432094 lines). I'm currently writing a program that takes the first 480 lines (8 hours) and creates a new array. Then I skip 60 lines (one hour) and grab the next 480 lines to create another array. I'd like to continue this for the whole data set. Here's what I'm currently running:

    n_window = n[:480,:]
    n_window1 = n[540:1020,:]
    n_window2 = n[1080:1560,:]
    n_window3 = n[1620:2100,:]
    n_window4 = n[2160:2640,:]
    n_window5 = n[2700:3180,:]
    n_window6 = n[3240:3720,:]
    n_window7 = n[3780:4260,:]
    n_window8 = n[4320:4800,:]
    n_window9 = n[4860:5340,:]
    n_window11 = n[5400:5880,:]
    n_window12 = n[5940:6420,:]
    n_window13 = n[6480:6960,:]
    n_window14 = n[7020:7500,:]
    n_window15 = n[7560:8040,:]
    n_window16 = n[8100:8580,:]
    n_window17 = n[8640:9120,:]
    new_window = np.concatenate([n_window, n_window1, n_window2, n_window3,
                                 n_window4, n_window5, n_window6, n_window7,
                                 n_window8, n_window9, n_window11, n_window12,
                                 n_window13, n_window14, n_window15, n_window16,
                                 n_window17])

Can anyone help?

RE: Time series - scidam - May-24-2018

    from itertools import count
    import numpy as np

    def get_subarrays(x):
        # provide additional parameters, e.g. start, step etc.
        for s in (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))):
            if s.size:
                yield s
            else:
                return  # ends the generator; raising StopIteration here is a RuntimeError since Python 3.7 (PEP 479)

    x = np.random.rand(10000, 10)
    data = np.concatenate(list(get_subarrays(x)))
    print(data.shape)

To get the most efficient solution, play around with numpy.lib.stride_tricks.as_strided.
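Since every window in the hand-written version is 480 rows long and starts 540 rows after the previous one, the same slices can also be produced with a plain range step. The sketch below assumes n is a 2-D NumPy array with one row per minute; windows_by_range and windows_by_view are hypothetical helper names. The second helper follows the as_strided hint above through numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), a safer wrapper that builds strided views, and returns one 3-D view instead of a list of slices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

WINDOW = 480          # 8 hours of one-minute rows
GAP = 60              # 1 hour skipped between windows
STEP = WINDOW + GAP   # 540: distance between window starts, as in the original slices

def windows_by_range(n):
    """Hypothetical helper: list of n[start:start+WINDOW] slices; the last one may be short."""
    return [n[start:start + WINDOW, :] for start in range(0, len(n), STEP)]

def windows_by_view(n):
    """Hypothetical helper: 3-D view (n_windows, WINDOW, n_cols) holding only full windows."""
    # sliding_window_view appends the window axis last: shape (len(n) - WINDOW + 1, n_cols, WINDOW)
    all_windows = sliding_window_view(n, WINDOW, axis=0)
    # keep every STEP-th window and move the window axis back to position 1
    return np.moveaxis(all_windows[::STEP], -1, 1)

n = np.random.rand(10000, 10)
by_range = windows_by_range(n)
by_view = windows_by_view(n)
print(len(by_range), by_range[-1].shape)  # 19 (280, 10)
print(by_view.shape)                      # (18, 480, 10)
```

The two differ only at the end of the data: the range version (like get_subarrays) keeps a final, shorter window, while the view version keeps full 480-row windows only. To drop the short window in the range version too, use range(0, len(n) - WINDOW + 1, STEP). Either result can be flattened into one 2-D array with np.concatenate(by_range) or by_view.reshape(-1, n.shape[1]).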
RE: Time series - honda_933 - May-25-2018
Thank you, scidam, for your quick response, and I apologize for my slow reply. Can you give me a breakdown? I'm not familiar with this.

Thank you in advance,
Mark

RE: Time series - scidam - May-25-2018
I hope I understood you correctly. Here are some comments on the code above:

    from itertools import count
    # count returns an infinite iterator, e.g. count(10, 5) starts at 10 with step 5:
    # 10, 15, 20, ... and this sequence never ends.
    # So if you execute
    # for j in count(10, 5):  # this is an infinite loop
    #     print(j)
    # it will never stop until you interrupt it, e.g. with Ctrl+C.
    import numpy as np

    # (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))) is a generator expression:
    # it produces x[0:480, :], x[540:540+480, :], x[1080:1080+480, :], etc.
    # Once the start index goes past the end of x, it yields empty NumPy arrays,
    # so this generator never stops by itself. To stop it, the for-loop checks the size
    # of each returned subarray (s); when s is empty, get_subarrays returns, which ends it.
    # (Raising StopIteration inside a generator, as older code did, becomes a
    # RuntimeError since Python 3.7.)

    # get_subarrays is a generator that extracts subarrays from the source array x
    def get_subarrays(x):
        for s in (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))):
            if s.size:
                yield s
            else:
                return

    x = np.random.rand(10000, 10)  # test array

    # np.concatenate needs a list of arrays, so let's create one.
    # list(get_subarrays(x)) is equivalent to
    # res = []
    # for item in get_subarrays(x):  # the loop ends when the generator is exhausted
    #     res.append(item)
    # We could pass `res` to np.concatenate; list(get_subarrays(x)) is just shorter.
    data = np.concatenate(list(get_subarrays(x)))
    print(data.shape)

RE: Time series - honda_933 - May-25-2018
Thank you, that helps a lot. You definitely understand what I'm trying to do, and I'm now understanding more as well.
A question about the variable data: is the first subarray [0:480, :]? I guess what I should ask is whether the step generator skips all the information between 0 and 540.

RE: Time series - scidam - May-28-2018

    from itertools import count

    # print the first 10 (start, stop) pairs used for slicing, numbered by k
    for j, i, k in zip(count(0, 540), count(480, 540), range(10)):
        print(k, j, ':', i)
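Running that loop answers the question: the first subarray is x[0:480, :] (rows 0 to 479), the next one starts at row 540, and so on, so only the rows between the end of one window and the start of the next (480 to 539, 1020 to 1079, ...) are skipped. A small sketch that computes those gaps from the same two count() sequences and checks them on a test array whose row k holds the value k (the array x here is an assumption for illustration):

```python
from itertools import count, islice
import numpy as np

# First three (start, stop) pairs from the counters used in get_subarrays
bounds = list(islice(zip(count(0, 540), count(480, 540)), 3))
print(bounds)  # [(0, 480), (540, 1020), (1080, 1560)]

# The skipped rows run from one window's stop to the next window's start
gaps = [(stop, next_start) for (_, stop), (next_start, _) in zip(bounds, bounds[1:])]
print(gaps)  # [(480, 540), (1020, 1080)]

# Check on a test array where row k holds the value k
x = np.arange(1200).reshape(-1, 1)
used = np.concatenate([x[j:i, :] for j, i in bounds]).ravel()
print(479 in used, 480 in used, 540 in used)  # True False True
```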
RE: Time series - honda_933 - May-30-2018
Thank you, scidam, for all your help. My code is functioning exactly as I want it to.

Mark