Python Forum
Pyspark Window: perform sum over a window with specific conditions
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pyspark Window: perform sum over a window with specific conditions
#1
Hello everyone,

I'm just learning about pyspark windows, and I was wondering how to achieve a particular operation.

I have a Pyspark table that looks like this:

ID | DATE_OPEN | DATE_CLOSE | IS_DATE_WITH_EVENT | explanation
1 | 2022-03-05 | 2022-03-10 | 1 | there was an event created on 03-05 for this ID, and closed on 03-10
1 | 2022-03-06 | null______ | 0 | there was no event created for this ID on 03-06
1 | 2022-03-07 | null______ | 1 | there was an event created on 03-07 for this ID, but it hasn't been closed yet
1 | 2022-03-08 | 2022-03-08 | 1 | ...
1 | 2022-03-09 | null______ | 0 | ...
1 | 2022-03-10 | null______ | 0 | ...
1 | 2022-03-11 | 2022-03-12 | 1 | ...
1 | 2022-03-12 | null______ | 1 | ...
1 | 2022-03-13 | 2022-03-13 | 1 | ...
1 | 2022-03-14 | null______ | 0 | ...
1 | 2022-03-15 | null______ | 0 | ...
2 | ..._______ | ..._______ |...| ...
..| ..._______ | ..._______ |...| ...
2 | ..._______ | ..._______ |...| ...
3 | ..._______ | ..._______ |...| ...
..| ..._______ | ..._______ |...| ...


What I want to do is, for each ID, and each date in DATE_OPEN, count the number of events created in the 3 days previous to that DATE_OPEN, that are still open at that DATE_OPEN (so those that have a DATE_CLOSE that is greater than the DATE_OPEN in question, or a DATE_CLOSE that is "null" while IS_DATE_EVENT==1).

To illustrate, here is the expected result:

ID | DATE_OPEN | DATE_CLOSE | IS_DATE_WITH_EVENT | EXPECTED_RESULT | explanation
1 | 2022-03-05 | 2022-03-10 | 1_________________ | 1 | event created on day D, not closed yet as of 03-05
1 | 2022-03-06 | null______ | 0_________________ | 1 | event created on D-1, not closed yet as of 03-06
1 | 2022-03-07 | null______ | 1_________________ | 2 | event created on D-2, and one on D, both not closed yet as of 03-07
1 | 2022-03-08 | 2022-03-08 | 1_________________ | 2 | event created on D-3, and one on D-1, both not closed yet as of 03-08
1 | 2022-03-09 | null______ | 0_________________ | 1 | event created on D-2, not closed yet as of 03-09
1 | 2022-03-10 | null______ | 0_________________ | 1 | event created on D-3, not closed yet as of 03-10
1 | 2022-03-11 | 2022-03-12 | 1_________________ | 1 | event created on D, not closed yet as of 03-11
1 | 2022-03-12 | null______ | 1_________________ | 1 | event created on D, not closed yet as of 03-12
1 | 2022-03-13 | 2022-03-13 | 1_________________ | 1 | event created on D-1, not closed yet as of 03-13
1 | 2022-03-14 | null______ | 0_________________ | 1 | event created on D-2, not closed yet as of 03-14
1 | 2022-03-15 | null______ | 0_________________ | 1 | event created on D-3, not closed yet as of 03-15
2 | ..._______ | ..._______ | ..._______________ |...| ...
..| ..._______ | ..._______ | ..._______________ |...| ...
2 | ..._______ | ..._______ | ..._______________ |...| ...
3 | ..._______ | ..._______ | ..._______________ |...| ...
..| ..._______ | ..._______ | ..._______________ |...| ...


I have started with defining a window this way:
days = lambda i: i * 86400
id_window = (
        Window()
        .partitionBy(F.col("ID"))
        .orderBy(F.col("DATE_OPEN").cast("timestamp").cast("long"))
        .rangeBetween(-days(3), days(0))
    )
But I don't know if it's the window I need to enrich to solve my problem (and how to do so), or if it's directly when creating the new feature. I'm a bit stuck here ^^

I would really appreciate your help! Thanks for having taken the time to read my post :)

Cheers
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to Minimize ADB window OomKoos 0 345 Dec-29-2023, 12:41 PM
Last Post: OomKoos
  add entries and labels to the window tkinter jacksfrustration 3 550 Oct-10-2023, 06:41 PM
Last Post: buran
  Is there a way to call and focus any popup window outside of the main window app? Valjean 6 1,613 Oct-02-2023, 04:11 PM
Last Post: deanhystad
  Can't stop keyboard listener to grab chars typed inside CTk window Valjean 9 1,254 Sep-25-2023, 08:07 PM
Last Post: deanhystad
  read active document name - other than from the window title ineuw 0 495 Sep-11-2023, 09:06 AM
Last Post: ineuw
  how to open a popup window in tkinter with entry,label and button lunacy90 1 811 Sep-01-2023, 12:07 AM
Last Post: lunacy90
Bug tkinter.TclError: bad window path name "!button" V1ber 2 725 Aug-14-2023, 02:46 PM
Last Post: V1ber
  Howto do motion event on solely window and not the widgets on it? janeik 3 792 Jul-11-2023, 12:10 AM
Last Post: deanhystad
  What is all the info in the info window in Idle? Pedroski55 3 647 Jul-08-2023, 11:26 AM
Last Post: DeaD_EyE
  PySpark Coding Challenge cpatte7372 4 5,966 Jun-25-2023, 12:56 PM
Last Post: prajwal_0078

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020