Python Forum
Unexpected Output - Python Dataframes: Filtering based on Overlapping Dates
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Unexpected Output - Python Dataframes: Filtering based on Overlapping Dates
#1
I have 2 dataframes, Requests and Services. Below is an example of the data they contain (apologies, not sure how to make tables look nice Cry ):

**Request Event Dataframe**

| Client_ID | Event_ID | Event Start Date | Event End Date |
| --------- | -------- | ---------------- | -------------- |
| 1 | 100 | 01/04/2023 | 04/04/2023 |
| 1 | 101 | 07/06/2023 | 07/06/2023 |
| 2 | 102 | 01/04/2023 | 04/04/2023 |
| 3 | 103 | 01/05/2023 | 10/05/2023 |

**Services Event Dataframe**

| Client_ID | Event_ID | Event Start Date | Event End Date |
| --------- | -------- | ---------------- | -------------- |
| 1 | 1000 | 01/02/2022 | 04/03/2023 |
| 1 | 1001 | 10/04/2023 | |
| 3 | 1002 | 01/04/2023 | 01/05/2023 |
| 3 | 1003 | 02/05/2023 | 10/07/2023 |

Please note that all requests will have end dates, but some services will not, i.e. they are ongoing. Not all clients in the Requests dataframe will have a service.

I need to determine the following - for each request event for a specific client, did they have any service which was active during the request period. If they did then that request needs to go into a "currently open" dataframe. If they did not then that request needs to go into a "new request" dataframe. Based on the above data i would want to see the following result dataframes:

**Currently Open Dataframe**

| Client_ID | Event_ID | Event Start Date | Event End Date |
| --------- | -------- | ---------------- | -------------- |
| 1 | 101 | 07/06/2023 | 07/06/2023 |
| 3 | 103 | 01/05/2023 | 10/05/2023 |

**New Request Dataframe**

| Client_ID | Event_ID | Event Start Date | Event End Date |
| --------- | -------- | ---------------- | -------------- |
| 1 | 100 | 01/04/2023 | 04/04/2023 |
| 2 | 102 | 01/04/2023 | 04/04/2023 |


I am pretty new to python coding so I have been using ChatGPT to give me a leg-up. It came up with some code that I've been tweaking to no avail so far (please note LTS is my name for the services dataframe):

# Intialise an empty dataframe to store results

new_requests = pd.DataFrame()

# Iterate over each row in the requests dataframe
for index, request_row in requests.iterrows():
    
    # Filter relevant rows in the LTS dataframe for the current client
    relevant_lts_rows = lts[lts['Client ID'] == request_row['Client ID']]
    
    # Check for overlapping events
    overlapping_events = relevant_lts_rows[
        (relevant_lts_rows['Event Start Date'] <= request_row['Event End Date']) & 
        ((relevant_lts_rows['Event End Date'] > request_row['Event Start Date']) | pd.isnull(relevant_lts_rows['Event End Date']))]
    
    # If no overlapping events, add current request to the new_requests dataframe
    if overlapping_events.empty:
        new_requests = pd.concat([new_requests, pd.DataFrame(request_row).transpose()])

# Reset index for new_requests dataframe
new_requests.reset_index(drop=True, inplace=True)

print(new_requests)
Unfortunately it does not seem to extract the correct requests into the new_requests dataframe. Any suggestions would be most appreciated!
Reply
#2
The event ID's in request don't match the event ID's in service. Should they? If they don't how do you know what service goes with which request?

Quote:apologies, not sure how to make tables look nice
Use the output tags. It maintains spacing, so you can print your dataframe, copy the output, and paste into your post.

Quote:I have been using ChatGPT to give me a leg-up
It is more likely slowing you down. There is a lot of crappy code on the internet that ChatGPT is using to learn python.
Reply
#3
Quote:The event ID's in request don't match the event ID's in service. Should they? If they don't how do you know what service goes with which request?
The event IDs are not meant to match. Each request for a client needs to be evaluated against all the services for that client to see if any service overlaps with the request. If any service for that client overlaps with the request, that request need to go into the "currently open" dataframe. If no services overlap, the request needs to go into the "new request" dataframe.

Thanks for the headsup about the output tags :) Due to the nature of the data i have modified the outputs to both simplify and pseudonymise.

Requests:
Output:
Client ID Event Reference Event Start Date Event End Date 0 100 100 01/04/2023 04/04/2023 1 100 101 07/06/2023 07/06/2023 2 101 102 01/04/2023 04/04/2023 3 102 103 01/05/2023 10/05/2023
Services:
Output:
Client ID Event Reference Event Start Date Event End Date 0 100 1000 01/02/2022 04/03/2023 1 100 1001 10/04/2023 NaN 2 102 1002 01/04/2023 01/05/2023 3 102 1003 02/05/2023 10/07/2023
Reply
#4
Your dates are probably all wrong. Assuming Month/Day/Year you cannot compare "07/06/2022" and "01/04/2023".
print("07/06/2022" > "01/04/2023")
Output:
True
You need to convert your date strings to something you can compare (Year/Month/Day) or convert the date strings to datetime objects.

This conveertes the date columns to datetime objects
from io import StringIO
import pandas as pd

requests = StringIO("""Client ID,Event Reference,Event Start Date,Event End Date
100,100,01/04/2023,04/04/2023
100,101,07/06/2023,07/06/2023
101,102,01/04/2023,04/04/2023
102,103,01/05/2023,10/05/2023""")

lts = StringIO("""Client ID,Event Reference,Event Start Date,Event End Date
100,1000,01/02/2022,04/03/2023
100,1001,10/04/2023,
102,1002,01/04/2023,01/05/2023
102,1003,02/05/2023,10/07/2023""")

date_columns = ["Event Start Date", "Event End Date"]
date_format = "%m/%d/%Y"
requests = pd.read_csv(requests, parse_dates=date_columns, date_format=date_format)
lts = pd.read_csv(lts, parse_dates=date_columns, date_format=date_format)
new_requests = pd.DataFrame()

# Iterate over each row in the requests dataframe
for index, request_row in requests.iterrows():
    # Filter relevant rows in the LTS dataframe for the current client
    relevant_lts_rows = lts[lts['Client ID'] == request_row['Client ID']]
    # Check for overlapping events
    overlapping_events = relevant_lts_rows[
        (relevant_lts_rows['Event Start Date'] <= request_row['Event End Date']) & 
        ((relevant_lts_rows['Event End Date'] > request_row['Event Start Date']) | pd.isnull(relevant_lts_rows['Event End Date']))]

    # If no overlapping events, add current request to the new_requests dataframe
    if overlapping_events.empty:
        new_requests = pd.concat([new_requests, pd.DataFrame(request_row).transpose()])
 
# Reset index for new_requests dataframe
new_requests.reset_index(drop=True, inplace=True)
print(new_requests)
Output:
Client ID Event Reference Event Start Date Event End Date 0 100 101 2023-07-06 00:00:00 2023-07-06 00:00:00 1 101 102 2023-01-04 00:00:00 2023-04-04 00:00:00
Reply
#5
Aha, thank you! Dance That makes sense based on what I was seeing, I had assumed that python would know automatically see it as a date - I wont make that assumption any more :)
Reply
#6
That is what happens when you try to use something without first learning about how it works. If you read the documentation for pandas.read_csv() you would wonder why there are so many parameters that deal with reading dates and time. The only time you want a datetime object is when you are doing math or comparisons, and that is pretty rare. Otherwise, you usually want dates and times to remain strings.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Unexpected termination of program when using JDBC drivers in python with jaydebeapi skarface19 2 341 Feb-17-2024, 12:01 PM
Last Post: skarface19
  Unexpected output Starter 2 493 Nov-22-2023, 12:08 AM
Last Post: Starter
  Unexpected output while using random.randint with def terickson2367 1 519 Oct-24-2023, 05:56 AM
Last Post: buran
  Organizing several similar classes with overlapping variables 6hearts 7 1,413 May-07-2023, 02:00 PM
Last Post: 6hearts
  Unexpected output from df.loc when indexing by label idratherbecoding 6 1,208 Apr-19-2023, 12:11 AM
Last Post: deanhystad
  Find overlapping date in database Hilal 2 1,714 Dec-18-2021, 08:15 PM
Last Post: Hilal
  unexpected output asyrafcc99 0 1,508 Oct-24-2020, 02:40 PM
Last Post: asyrafcc99
  How to merge three DataFrames based on specific column Mekala 0 1,808 Sep-08-2020, 02:01 PM
Last Post: Mekala
  Unexpected output: symbols for derivative not being displayed saucerdesigner 0 2,061 Jun-22-2020, 10:06 PM
Last Post: saucerdesigner
  Filtering Excel Document Data Based On Numerical Values eddywinch82 30 10,798 Feb-25-2020, 06:08 PM
Last Post: eddywinch82

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020