Python Forum
Time series
#1
Hi, I'm trying to find a simple way to run through a large data set (2432094 lines). I'm currently writing a program that takes the first 480 lines (8 hours) and creates a new array. Then I skip 60 lines (one hour), grab another 480 lines, and create another array. I'd like to continue this for the whole data set.

Here's what I'm currently running:

n_window = n[:480,:]
n_window1 = n[540:1020,:]
n_window2 = n[1080:1560,:]
n_window3 = n[1620:2100,:]
n_window4 = n[2160:2640,:]
n_window5 = n[2700:3180,:]
n_window6 = n[3240:3720,:]
n_window7 = n[3780:4260,:]
n_window8 = n[4320:4800,:]
n_window9 = n[4860:5340,:]
n_window11 = n[5400:5880,:]
n_window12 = n[5940:6420,:]
n_window13 = n[6480:6960,:]
n_window14 = n[7020:7500,:]
n_window15 = n[7560:8040,:]
n_window16 = n[8100:8580,:]
n_window17 = n[8640:9120,:]

new_window = np.concatenate([n_window,n_window1, n_window2,n_window3,n_window4,n_window5,n_window6,n_window7,n_window8,n_window9,n_window11,n_window12,n_window13,n_window14,n_window15,n_window16,n_window17])


Can anyone help?
#2
from itertools import count
import numpy as np

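def get_subarrays(x):  # provide additional parameters, e.g. start, step etc.
    for s in (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))):
        if s.size:
            yield s
        else:
            # A bare return ends the generator; raising StopIteration here
            # becomes a RuntimeError on Python 3.7+ (PEP 479).
            return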

x = np.random.rand(10000, 10)
data = np.concatenate(list(get_subarrays(x)))
print(data.shape)
To get the most efficient solution, experiment with numpy.lib.stride_tricks.as_strided.
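As a rough sketch of the stride-based idea, here is a version built on numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), which is a safer wrapper around the same mechanism as as_strided. The function name strided_windows and its window and step parameters are made up for illustration. Unlike the generator above, this only returns full 480-row windows, so a shorter trailing window is dropped, and it assumes x has at least `window` rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def strided_windows(x, window=480, step=540):
    """Return full windows of `window` rows, starting every `step` rows."""
    # All windows along axis 0 as a view: shape (n_rows - window + 1, n_cols, window)
    views = sliding_window_view(x, window_shape=window, axis=0)
    # Keep every `step`-th window and move the window axis next to the
    # window index: shape (n_windows, window, n_cols)
    return views[::step].transpose(0, 2, 1)

x = np.random.rand(10000, 10)
windows = strided_windows(x)
# Flatten the windows back into one 2-D array (this step makes a copy)
data = windows.reshape(-1, x.shape[1])
print(windows.shape, data.shape)
```

For the 10000-row test array this gives 18 full windows; the generator version additionally keeps the partial window that starts at row 9720.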
#3
Thank you Scidam for your quick response, and I apologize for my slow reply. Can you give me a breakdown, as I'm not familiar with this?

Thank you in advance,

Mark
#4
I hope I understood you correctly. Here are some comments on the code above:

from itertools import count
# count returns an infinite iterator, e.g. count(10, 5) starts at 10 with step 5: 10, 15, 20, ...; this sequence never ends
# So, if you execute
# for j in count(10, 5): # this is an infinite loop
#     print(j)
# the execution never stops until you interrupt it, e.g. with Ctrl+C.

import numpy as np

# (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))) is a generator expression:
# it produces x[0:480, :], x[540:1020, :], x[1080:1560, :], etc.
# Once the start index passes the end of x, it yields empty numpy arrays
# and would never stop on its own. To stop it, we loop over it and check the size of each
# subarray (s). When the subarray is empty, we return, which ends the generator
# (raising StopIteration inside a generator is a RuntimeError on Python 3.7+).

# get_subarrays is a generator that extracts subarrays from the source array x
def get_subarrays(x):
    for s in (x[j:i, :] for j, i in zip(count(0, 540), count(480, 540))):
        if s.size:
            yield s
        else:
            return
 
x = np.random.rand(10000, 10)  # Test array

# We need to pass a list of arrays to np.concatenate, so let's create such a list.

# list(get_subarrays(x)) is equivalent to
# res = []
# for item in get_subarrays(x):  # This loop ends when the generator is exhausted (standard behavior for Python generators/iterators in for-loops)
#     res.append(item)

# Now we can pass `res` to np.concatenate or, for short, use list(get_subarrays(x)) instead of `res`.
data = np.concatenate(list(get_subarrays(x)))

print(data.shape)
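The comment in the original function suggests adding parameters such as start and step. One possible way to do that, sketched here with illustrative parameter names (window, step, start) that are not from the posts above, is to replace the two count iterators with a single range over the start indices. Because range stops before the end of the array, no explicit emptiness check is needed.

```python
import numpy as np

def get_subarrays(x, window=480, step=540, start=0):
    """Yield x[j:j + window, :] for j = start, start + step, ... while j is inside x."""
    for j in range(start, x.shape[0], step):
        # The last slice may be shorter than `window` if x ends mid-window
        yield x[j:j + window, :]

x = np.random.rand(10000, 10)  # Test array
data = np.concatenate(list(get_subarrays(x)))
print(data.shape)
```

With the defaults this selects the same rows as the count-based version, including the shorter final window.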
#5
Thank you, that helps a lot. You clearly understand what I'm trying to do, and I'm understanding more as well. A question about the variable data: is the first subarray [0:480, :]? I guess I should ask whether the step generator skips all information between 0 and 540?
#6
for j, i, k in zip(count(0, 540), count(480, 540), range(10)):
    print(k, j, ':', i)
Output:
0 0 : 480
1 540 : 1020
2 1080 : 1560
3 1620 : 2100
4 2160 : 2640
5 2700 : 3180
6 3240 : 3720
7 3780 : 4260
8 4320 : 4800
9 4860 : 5340
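To make the answer to the skipping question explicit: rows 0 to 479 are kept, and only rows 480 to 539 (then 1020 to 1079, and so on) are left out. A small check, assuming the same 10000-row test array size used earlier and using variable names chosen only for this sketch:

```python
import numpy as np

n_rows, window, step = 10000, 480, 540

# Mark every row index that falls inside some window
kept = np.zeros(n_rows, dtype=bool)
for j in range(0, n_rows, step):
    kept[j:j + window] = True

# Row indices that belong to no window
skipped = np.flatnonzero(~kept)
print(skipped[:3], skipped[57:63])
print(kept.sum(), skipped.size)
```

Each gap is exactly 60 rows long, which matches the one-hour skip between 8-hour windows.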
#7
Thank you Scidam for all your help. My code is functioning exactly as I want it to do.

Mark

