Python Forum
What data structure I need
#1
Hi all,
I am new to Python and have learned to use data frames. I have imported some data series and set the index to be time based, which makes picking time durations easier. I have a few more questions, though:

a. How can I pick rows based on timestamp durations? Assuming a start point in time X, I want to pick all rows that fall within 1 second of it.

b. I then want to pick measurements based on a rolling window (the previous start point plus a displacement).
I can do this easily in Python with a for loop and gradually increasing indexes. But is this the most efficient way to do it?

c. Since I am splitting my initial data set into several new data sets, what is the most efficient data structure for keeping these new vectors? That is, I will split one long, time-indexed vector into Y new vectors (they will overlap, as described in point b). How can I store them? In other words, can my function still return a data frame, which makes calculations much easier?
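For questions (a) and (b), a minimal sketch, assuming a DataFrame with a sorted DatetimeIndex. The sample data, the `value` column and the `pick_window` helper are hypothetical. It uses a boolean mask for a half-open window [start, start + duration), because `.loc[start:end]` label slicing on a DatetimeIndex includes both endpoints. The start points for the overlapping windows come from `pd.date_range`.

```python
import pandas as pd

def pick_window(df, start, duration):
    """Hypothetical helper: rows of df whose timestamps fall in [start, start + duration)."""
    end = start + duration
    mask = (df.index >= start) & (df.index < end)
    return df.loc[mask]

# Hypothetical sample data: one reading every 250 ms, 12 readings in total
idx = pd.date_range("2020-01-01 00:00:00", periods=12, freq="250ms")
data = pd.DataFrame({"value": list(range(12))}, index=idx)

# (a) all rows within 1 second of a start point X (here: the third timestamp)
x = data.index[2]
print(pick_window(data, x, pd.Timedelta(seconds=1)))

# (b) overlapping windows: each 1 s long, each starting 500 ms after the previous one
starts = pd.date_range(data.index[0], data.index[-1], freq="500ms")
windows = [pick_window(data, s, pd.Timedelta(seconds=1)) for s in starts]
print(len(windows))
```

Note that the mask scans the whole index once per window, which is fine for moderate data sizes.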


I would like to thank you in advance for your reply.
Regards
Alex
#2
I am not sure I completely understand all your questions, but since you are talking about data frames, I suspect you are using the pandas package.

Pandas supports rolling windows, and they can be applied to both DataFrame and Series instances. Why do you need to pick up the windowed data itself?
If I understood you correctly, you are trying to do something like this (but with data frames):

[1, 2, 3, 4, 5] => applying a rolling window of size 3 => [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
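In plain Python, the transformation above can be sketched as a list comprehension over every start position (stride 1). The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(values, size):
    """Return every run of `size` consecutive items, advancing one item at a time."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]

print(sliding_windows([1, 2, 3, 4, 5], 3))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```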

Note that you usually don't need to store the right-hand side of that example (the windows themselves). A rolling window is usually applied
to obtain, e.g., moving average values. In pandas this can easily be done with the .rolling method: df.rolling(3).mean() or
df.rolling(3).sum(), or, if df has a time-based index, df.rolling('3s').mean().
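A small sketch of these calls on the same numbers as above, assuming a one-reading-per-second index for the time-based case (the sample data is hypothetical). With a fixed window of 3, the first two results are NaN because the window is not yet full; with an offset window such as '3s', each window covers the 3 seconds ending at (and including) the current row, and partial windows at the start still produce a value.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.rolling(3).mean().tolist())  # [nan, nan, 2.0, 3.0, 4.0]
print(s.rolling(3).sum().tolist())   # [nan, nan, 6.0, 9.0, 12.0]

# Same values on a time-based index, one reading per second
ts = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2020-01-01", periods=5, freq="s"))
print(ts.rolling("3s").mean().tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```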
#3
Thanks for the reply. I googled it and found the rolling window, but taking .mean(), .sum() or .min() is not what I want.
I would like to get the data set back from each rolling window. Each one represents a sample of the data that starts at a random starting point.
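If the goal is to get the data of each window back rather than an aggregate, recent pandas versions (1.1 and later, as far as I know) allow iterating over a rolling object, yielding one sub-Series per row. A sketch, assuming a one-reading-per-second index (hypothetical sample data). This produces one window per row, so the shift between windows is fixed by the spacing of the data; a custom step still needs a loop like the one below.

```python
import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2020-01-01", periods=5, freq="s"))

# Each iteration yields the rows covered by one window: the 3 s ending at each row
windows = [w for w in ts.rolling("3s")]
print([w.tolist() for w in windows])
# [[1], [1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```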

What I did instead was write the code myself: it selects the indexes and gets the data. It was not so easy since I am quite new, but I managed it. It also works with data periods.

I am sharing the code as well, in the hope that it might help someone in the future.

What I am missing now is how to save each part of the data set that I pick as a column of a new data frame, and how to name that column with a sequence number.

In my code below, I marked with a comment the part where the data needs to become a new column in a data set. Can you help me with that?

Thanks
Alex

Code:
import pandas as pd

# data_test is a DataFrame with a time-based (DatetimeIndex) index, loaded earlier.
# Two variables are important: windowDuration (duration of the collected measurements)
# and step (how many milliseconds the next window shifts). A step smaller than
# windowDuration makes the picked datasets overlap.

# Duration of the window in milliseconds, e.g. 500 ms of measurements
windowDuration=500

# Shift between window starts in milliseconds:
# the next data collection starts 3000 ms after the previous one
step=3000

# Initial values:
# start is where the current window begins, end is where it ends;
# both values shift by step on every pass
start=data_test.index[0]
end=start+pd.Timedelta(milliseconds=windowDuration)
while start<data_test.index[-1]:
    # half-open window [start, end), so the first sample of each window is included
    date_mask=(data_test.index >= start) & (data_test.index < end)
    dates = data_test.index[date_mask]
    print(data_test.loc[dates]) # This line picks the data. I want to store it in a new, separate column and name that column.

    print(start)
    print(end)
    start=start+pd.Timedelta(milliseconds=step)
    end=end+pd.Timedelta(milliseconds=step)
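One way to get each picked window into its own numbered column is to collect the windows in a dict keyed by column name and combine them with `pd.concat(..., axis=1)`. A sketch, assuming hypothetical sample data in place of `data_test` and a single hypothetical column `value`. Each window's timestamps are dropped with `reset_index(drop=True)` so that row 0 is the first sample of every window; shorter windows are padded with NaN.

```python
import pandas as pd

# Hypothetical sample data standing in for data_test: one reading every 100 ms
idx = pd.date_range("2020-01-01", periods=20, freq="100ms")
data_test = pd.DataFrame({"value": list(range(20))}, index=idx)

window = pd.Timedelta(milliseconds=500)
step = pd.Timedelta(milliseconds=300)

columns = {}
start = data_test.index[0]
n = 1
while start <= data_test.index[-1]:
    end = start + window
    mask = (data_test.index >= start) & (data_test.index < end)
    # Drop the timestamps so all windows line up row by row
    columns[f"window_{n}"] = data_test.loc[mask, "value"].reset_index(drop=True)
    start += step
    n += 1

# One column per window, named by sequence number
windows_df = pd.concat(columns, axis=1)
print(windows_df)
```

With overlapping windows the same sample appears in several columns, so this table grows with the overlap; for long series, storing only the window boundaries may be preferable.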
#4
data_test is a data frame, and data_test.loc[dates] is a data frame too. So you would need to insert an array of data frames into a column (moreover, a column of the original data frame). At the very least, this will consume a lot of memory.
You can store the obtained data frames in a separate list instead. Another way is to save only the indices, e.g.
data_test.loc[dates].index.tolist(). Or maybe I misunderstood something... Try this:


# somewhere before the while loop:
windows = {}  # window end time -> index labels of the rows in that window

# in the while loop, instead of `print(data_test...)`:
windows[end] = data_test.loc[dates].index.tolist()
# or even keep the rows themselves:
# windows[end] = data_test.loc[dates]

# Finally, outside the while loop, windows holds one entry per window,
# keyed by the time at which that window ends.
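Taking the "save only indices" idea further, a sketch that stores just the (first, stop) row positions of each window, found by binary search with `Index.searchsorted` instead of scanning the whole index with a mask for every window. It assumes a sorted DatetimeIndex; the helper `window_bounds` and the sample data are hypothetical. The rows of a window are recovered with `.iloc[first:stop]` only when needed.

```python
import pandas as pd

def window_bounds(index, window, step):
    """Hypothetical helper: (first, stop) row positions of each window
    [start, start + window), with start times spaced `step` apart.
    Assumes `index` is a sorted DatetimeIndex; window/step are strings like "500ms"."""
    starts = pd.date_range(index[0], index[-1], freq=step)
    first = index.searchsorted(starts, side="left")
    stop = index.searchsorted(starts + pd.Timedelta(window), side="left")
    return [(int(a), int(b)) for a, b in zip(first, stop)]

# Hypothetical sample data standing in for data_test: one reading every 100 ms
idx = pd.date_range("2020-01-01", periods=20, freq="100ms")
data_test = pd.DataFrame({"value": list(range(20))}, index=idx)

bounds = window_bounds(data_test.index, "500ms", "300ms")
print(bounds)
# [(0, 5), (3, 8), (6, 11), (9, 14), (12, 17), (15, 20), (18, 20)]

# Recover the rows of the second window only when they are needed
first, stop = bounds[1]
print(data_test.iloc[first:stop])
```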

