Python Forum
Multiprocessing on python
#11
Well now I've discovered rolling on polars dataframes - could I use this to eliminate my while loop altogether?

import sys
import polars as pl

df = pl.read_csv(sys.argv[1])
print(df)
df = df.with_columns(pl.col("ts_event").cast(pl.Datetime))
print(df)

# Define the rolling window parameters
window_duration = "30s"  # 30 second window
every_duration = "1s"    # shift the window every 1 second

out = df.rolling(index_column='ts_event', period=window_duration, offset=every_duration).agg(pl.col('size').sum())

print(out)
As practice: ts_event in my CSV file is nanosecond Unix timestamps, and I first convert this column to datetime objects. The 'size' column is just integers, typically ranging from about 1 to 500 in value.

I want to take the first 30s worth of rows and find the sum of the size column over that 30-second window. Then I want to step forward by one second and find the next sum.

For example, let's say my data starts at 08:00:00 -

So first I want to find the sum of the size column between

8:00:00-8:00:30
then
8:00:01-8:00:31
8:00:02-8:00:32

and so on...

And for now we can just deposit these results into another dataframe.
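As an editorial aside: the fixed-step windowing described above (a 30 s window that starts every 1 s) maps naturally onto polars' group_by_dynamic. Below is a minimal sketch with made-up data, assuming a recent polars version; the column names mirror the thread, everything else is hypothetical.

import polars as pl

# Made-up data resembling the thread's CSV: nanosecond Unix epochs plus a size column.
df = pl.DataFrame({
    "ts_event": [1_709_300_000_000_000_000 + i * 500_000_000 for i in range(200)],  # one row every 0.5 s
    "size": [10] * 200,
})

# Interpret the integers as nanosecond epochs, then sort so polars knows the index is ordered.
df = df.with_columns(pl.from_epoch("ts_event", time_unit="ns")).sort("ts_event")

# 30 s windows that start every 1 s: 8:00:00-8:00:30, 8:00:01-8:00:31, ...
out = (
    df.group_by_dynamic("ts_event", every="1s", period="30s")
      .agg(pl.col("size").sum().alias("size_sum"))
)
print(out)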

So with the code posted above I get the error

Error:
- If your data is ALREADY sorted, set the sorted flag with: '.set_sorted()'.
My ts_event column is already sorted chronologically, but I can't figure out the syntax for setting that flag in my code - no matter where I try it, it throws another error.

How do I set the '.set_sorted()' flag?
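For reference, here is a minimal sketch of two places the flag can go, assuming a recent polars version. Note the flag is only a promise that the column really is sorted; polars does not verify it.

import sys
import polars as pl

df = pl.read_csv(sys.argv[1])

# Option 1: set the flag on the expression while casting the column.
df = df.with_columns(pl.col("ts_event").cast(pl.Datetime).set_sorted())

# Option 2: mark an existing column as sorted on the frame itself.
# df = df.set_sorted("ts_event")

out = df.rolling(index_column="ts_event", period="30s", offset="1s").agg(pl.col("size").sum())
print(out)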
#12
Post the entire error message, including the traceback.
#13
Error:
shape: (614_076, 5)
┌──────────────────────────────┬───────────────────────────────┬──────┬────────┬──────┐
│ ts_event                     ┆ eastern_time                  ┆ side ┆ price  ┆ size │
│ ---                          ┆ ---                           ┆ ---  ┆ ---    ┆ ---  │
│ datetime[μs]                 ┆ str                           ┆ str  ┆ f64    ┆ i64  │
╞══════════════════════════════╪═══════════════════════════════╪══════╪════════╪══════╡
│ +56135-10-13 20:00:02.663539 ┆ 03-01-2024 09:30:00.002663424 ┆ N    ┆ 198.05 ┆ 34   │
│ +56135-10-13 20:00:02.663539 ┆ 03-01-2024 09:30:00.002663424 ┆ N    ┆ 198.05 ┆ 4    │
│ +56135-10-13 20:00:03.314087 ┆ 03-01-2024 09:30:00.003314176 ┆ N    ┆ 198.05 ┆ 46   │
│ +56135-10-13 20:00:03.314087 ┆ 03-01-2024 09:30:00.003314176 ┆ N    ┆ 198.05 ┆ 34   │
│ +56135-10-13 20:00:03.314087 ┆ 03-01-2024 09:30:00.003314176 ┆ N    ┆ 198.06 ┆ 5    │
│ …                            ┆ …                             ┆ …    ┆ …      ┆ …    │
│ +56209-08-25 23:57:44.164197 ┆ 03-28-2024 09:59:59.864164096 ┆ N    ┆ 180.47 ┆ 5    │
│ +56209-08-25 23:57:44.164197 ┆ 03-28-2024 09:59:59.864164096 ┆ B    ┆ 180.48 ┆ 95   │
│ +56209-08-25 23:57:44.210570 ┆ 03-28-2024 09:59:59.864210688 ┆ B    ┆ 180.48 ┆ 5    │
│ +56209-08-25 23:57:44.341235 ┆ 03-28-2024 09:59:59.864341248 ┆ B    ┆ 180.49 ┆ 5    │
│ +56209-08-25 23:57:44.810835 ┆ 03-28-2024 09:59:59.864810752 ┆ N    ┆ 180.49 ┆ 7    │
└──────────────────────────────┴───────────────────────────────┴──────┴────────┴──────┘

Traceback (most recent call last):
  File "C:\Users\thpfs\Documents\Python\volwa.py", line 40, in <module>
    out = df.rolling(index_column = 'ts_event', period = '35s', offset = '1s').agg(pl.col("size").sum())
  File "C:\Users\thpfs\AppData\Local\Programs\Python\Python312\Lib\site-packages\polars\dataframe\group_by.py", line 894, in agg
    .collect(no_optimization=True)
  File "C:\Users\thpfs\AppData\Local\Programs\Python\Python312\Lib\site-packages\polars\lazyframe\frame.py", line 1943, in collect
    return wrap_df(ldf.collect())
polars.exceptions.InvalidOperationError: argument in operation 'rolling' is not explicitly sorted
- If your data is ALREADY sorted, set the sorted flag with: '.set_sorted()'.
- If your data is NOT sorted, sort the 'expr/series/column' first.
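An aside on the printed preview: ts_event values showing years like +56135 are what you get when nanosecond epoch integers are cast with polars' default microsecond unit, landing the datetimes roughly 54,000 years in the future. If that is the actual cause (an assumption from the preview alone), a sketch that converts the column explicitly as nanoseconds and sorts it, which in recent polars versions also sets the sorted flag the rolling call is asking for:

import sys
import polars as pl

df = pl.read_csv(sys.argv[1])

# Treat the integers as nanosecond Unix epochs instead of the default microseconds,
# then sort on the index column; in recent polars versions sorting also flags it as sorted.
df = df.with_columns(pl.from_epoch("ts_event", time_unit="ns")).sort("ts_event")

out = df.rolling(index_column="ts_event", period="30s", offset="1s").agg(pl.col("size").sum())
print(out)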