Python Forum
Elegant way to apply each element of an array to a dataframe?
#1
So Dean was previously kind enough to tell me about vector operations on dataframes instead of iterating through each row using loops and has made my code so much better...

So now I have a situation where I have a dataframe with about 930,000 rows. I need to reduce that dataframe by kicking out select rows, which I plan to do via boolean indexing.

However, I also have an array with n values inside of it (my test case has 5 values in the array, but I need the code to work with n values in the array).

For each value in my array, I need to run a function that will output start and stop row numbers of rows that I want to keep in my dataframe.

So the way I know how to do this is to use a loop to run through each value in my array, and based on that get all my starting and ending rows for each time the loop runs through and then use boolean indexing to reduce the dataframe.

However, I was wondering, is there a more elegant way to do this? Or do I just need to loop it? Thanks!
Reply
#2
You should provide a short example of the dataframe.
Reply
#3
Here is a printout of the dataframe. This is stock tick data, 5 days worth of data in a single dataframe. This includes the full extended market hours, which are from 4 AM - 8 PM every day. I need to reduce this data to only the data during normal market hours each day, which is 9:30 AM - 4 PM. At the top of the screenshot you see my dates array - I already wrote the code that parses the entire dataset, determines all the unique dates (in this case it's 5 days worth of data but could be any n days worth of data), and outputs an array of all the dates. ts_event are nanosecond unix timestamps.

So my idea was to write the code that determines the relevant timestamps (9:30 AM and 4 PM for each day) and then use boolean indexing to extract the rows for each day and append them into a new dataframe. I was planning on using a loop to cycle through each day. But is there a more elegant way to do it?

[Image: dataframe-example.png]
Reply
#4
What result do you want?
Reply
#5
So as you see the original dataframe has 5 days worth of data, 3-11-2024 through 3-15-2024, and each day ranges from 04:00 to 20:00 - I need to be able to reduce this by extracting only certain time ranges for EACH day - for example - 9:30 through 16:00 for each day - and recombine into a single dataframe. So I know that I can do this by using a for loop for the start/end times for each day, then use boolean indexing to extract the rows for each day then reassemble.

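import pandas as pd
from datetime import datetime

day_format = '%m-%d-%Y'  # matches the MM-DD-YYYY strings in dates_array
timestarter = 34200000000000  # 9h 30m in nanoseconds
time_ender = 57600000000000  # 16h 0m in nanoseconds

kept = []  # one filtered slice per day
for value in dates_array:
    datetime_object = datetime.strptime(value, day_format)
    # Local midnight as integer nanoseconds (multiplying by the float 1e9 loses precision)
    day_unix = int(datetime_object.timestamp()) * 10**9
    daystarter = day_unix + timestarter
    dayender = day_unix + time_ender
    kept.append(df[(df['ts_event'] >= daystarter) & (df['ts_event'] < dayender)])
finaldf = pd.concat(kept, ignore_index=True)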
Example of dates_array:
['03-11-2024' '03-12-2024' '03-13-2024' '03-14-2024' '03-15-2024']

In this code daystarter and dayender are my unix timestamps of the start/end range during each day that I want to keep. Dates array is an array of my dates in MM-DD-YYYY format. df is my original dataframe. finaldf is my new, reduced dataframe. You see how I loop through each individual day here.

I'm just wondering is there a better way to do this than using a loop to go through each day. When I have 5 days of data like in this example, it's not a big deal. But when I'm doing 1000 days of data in one go... it may be onerous unless it's the only way.
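One way to avoid looping over the rows once per day is to build all the start/end timestamps up front and let `numpy.searchsorted` match every row to its candidate day in a single pass. This is only a sketch: the function name `keep_sessions`, the `tz` parameter and its default of `America/New_York` are assumptions made for illustration (the code above uses whatever the machine's local time zone is), and `ts_event` is assumed to hold integer nanoseconds.

```python
from datetime import datetime

import numpy as np
import pandas as pd

SESSION_START_NS = 34_200 * 10**9  # 9h 30m in nanoseconds
SESSION_END_NS = 57_600 * 10**9    # 16h 0m in nanoseconds


def keep_sessions(df, dates, day_format="%m-%d-%Y",
                  ts_col="ts_event", tz="America/New_York"):
    """Keep rows whose timestamp falls in [09:30, 16:00) of any listed date.

    `tz` is an assumed parameter: the zone in which the dates start at midnight.
    """
    # Midnight of each date as integer nanoseconds since the epoch.
    # This loops over the n dates only, never over the rows.
    midnights = np.array(
        [pd.Timestamp(datetime.strptime(d, day_format), tz=tz).value for d in dates],
        dtype=np.int64,
    )
    midnights.sort()
    starts = midnights + SESSION_START_NS
    ends = midnights + SESSION_END_NS

    ts = df[ts_col].to_numpy()
    # Position of the last session start that is <= each timestamp (-1 if none).
    idx = np.searchsorted(starts, ts, side="right") - 1
    # A row is kept if it has a candidate session and lies before that session's end.
    mask = (idx >= 0) & (ts < ends[np.clip(idx, 0, None)])
    return df[mask].reset_index(drop=True)
```

With the variables from the post above this would be called as `finaldf = keep_sessions(df, dates_array)`. The work per row is a binary search over the n start times, so 1000 days should not need 1000 passes over the dataframe. Adding a fixed offset to midnight ignores daylight-saving switches that happen during the day, which does not matter for weekday sessions in the sample week.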
Reply
#6
This gives you the first and last time for each day. Does it solve your problem?
import pandas as pd
from datetime import datetime, timedelta
from time import time

now = datetime.now()
df = pd.DataFrame({"time": [now + timedelta(seconds=x) for x in range(930000)]})
start = time()
df["day"] = df.time.dt.day
df2 = df[df.day.shift(1) != df.day.shift(-1)]
print(time() - start)
print(df2)
Output:
0.0659632682800293
                             time  day
0      2024-03-28 16:26:10.637194   28
27229  2024-03-28 23:59:59.637194   28
27230  2024-03-29 00:00:00.637194   29
113629 2024-03-29 23:59:59.637194   29
113630 2024-03-30 00:00:00.637194   30
200029 2024-03-30 23:59:59.637194   30
200030 2024-03-31 00:00:00.637194   31
286429 2024-03-31 23:59:59.637194   31
286430 2024-04-01 00:00:00.637194    1
372829 2024-04-01 23:59:59.637194    1
372830 2024-04-02 00:00:00.637194    2
459229 2024-04-02 23:59:59.637194    2
459230 2024-04-03 00:00:00.637194    3
545629 2024-04-03 23:59:59.637194    3
Another approach is to extract the day as above, then group the dataframe by day. You could compute the high, low, mean, open, close for each day.
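The group-by idea could look roughly like the sketch below. The `price` column, the sample values and the function name `daily_summary` are made up for illustration, since the thread never shows the price column's real name. It groups on the calendar date rather than the day of month, so that days from different months are not merged.

```python
import pandas as pd


def daily_summary(df, time_col="time", price_col="price"):
    """Open/high/low/close/mean per calendar date (column names are assumed)."""
    # Group on the full date; dt.day alone would mix e.g. March 1 with April 1.
    by_date = df.groupby(df[time_col].dt.date)[price_col]
    # "first"/"last" give open/close only if the rows are in time order.
    return by_date.agg(open="first", high="max", low="min", close="last", mean="mean")


# Tiny made-up sample: two days of ticks.
ticks = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-03-28 09:30:00", "2024-03-28 12:00:00", "2024-03-28 15:59:59",
        "2024-03-29 09:30:00", "2024-03-29 15:59:59",
    ]),
    "price": [10.0, 12.0, 11.0, 20.0, 18.0],
})
summary = daily_summary(ticks)
print(summary)
```

The result has one row per date and one column per statistic.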
Reply
#7
Yes thanks that did help, I didn't know about timedelta but that makes things easier for me.
Reply
#8
You wouldn't need to use timedelta. I used timedelta to make some fake data to demonstrate the filter. Your data already has a datetime object
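Following on from that, once the timestamps are datetimes the market-hours filter can be a single boolean mask over the whole dataframe, with no dates array at all. A minimal sketch, assuming `ts_event` holds integer nanoseconds since the epoch in UTC and that the exchange clock is `America/New_York`; neither is confirmed in the thread, and the function name `regular_hours` is made up.

```python
import pandas as pd


def regular_hours(df, ts_col="ts_event", tz="America/New_York"):
    """Keep rows between 09:30 (inclusive) and 16:00 (exclusive), every day.

    Assumes ts_col is integer nanoseconds since the epoch; tz is an assumption.
    """
    # Convert to timezone-aware datetimes in the exchange's local time.
    local = pd.to_datetime(df[ts_col], unit="ns", utc=True).dt.tz_convert(tz)
    # Wall-clock minutes since local midnight, computed for all rows at once.
    minutes = local.dt.hour * 60 + local.dt.minute
    mask = (minutes >= 9 * 60 + 30) & (minutes < 16 * 60)
    return df[mask].reset_index(drop=True)
```

Usage would be `finaldf = regular_hours(df)`. Because the mask depends only on the time of day, it covers 5 days or 1000 days in the same single pass.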
Reply

