Python Forum
Efficient way to mark entries in df with overlap in time ranges
Hi, I have a df like:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 10, 20], [1, 5, 8], [1, 5, 15], [1, 13, 14], [1, 18, 21], [2, 2, 2], [1, 21, 100], [1, 1, 50]]),
                  columns=['id', 'start', 'stop'])
df['valid'] = True

print(f"{df} \n")
   id  start  stop  valid
0   1     10    20   True
1   1      5     8   True
2   1      5    15   True
3   1     13    14   True
4   1     18    21   True
5   2      2     2   True
6   1     21   100   True
7   1      1    50   True
To mark invalid entries, i.e. rows that have the same id as another row and overlap it in time, I came up with this approach:
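id_col = 'id'
start_col = 'start'
stop_col = 'stop'

counter = 0

for index, row in df.iterrows():
    row_id = row[id_col]
    row_start = row[start_col]
    row_stop = row[stop_col]

    # Another row with the same id overlaps this one in 3 cases:
    # case 1: its stop lies within [row_start, row_stop]
    # case 2: its start lies within [row_start, row_stop]
    # case 3: it starts at or before row_start and ends at or after row_stop
    # The @ prefix makes query() use the local Python variables directly, so the id
    # needs no quoting (comparing an integer id column to a quoted string matches nothing).
    df_temp = df.query(f"{id_col} == @row_id and ((@row_start <= {stop_col} <= @row_stop) or (@row_start <= {start_col} <= @row_stop) or ({start_col} <= @row_start and {stop_col} >= @row_stop))")
    # Remove the current row itself
    df_temp = df_temp.drop(index)

    # Separate loop variable so the outer index is not overwritten
    for other in df_temp.index:
        if df.at[other, 'valid']:
            df.at[other, 'valid'] = False
            counter += 1
print(f"Affected Rows: {counter}")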
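The three overlap cases in the comments reduce to the standard closed-interval test: another row overlaps the current one when `other.start <= row_stop` and `row_start <= other.stop`, provided `start <= stop` holds in every row. A minimal sketch, assuming that and a unique index, applies this single test with NumPy boolean masks instead of building and parsing a `df.query` string on every iteration. It is still quadratic in the number of rows, so it only trims the per-row overhead. The name `mark_valid_numpy` is hypothetical.

```python
import numpy as np
import pandas as pd


def mark_valid_numpy(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True where a row overlaps no other row with the same id."""
    ids = df["id"].to_numpy()
    starts = df["start"].to_numpy()
    stops = df["stop"].to_numpy()
    valid = np.ones(len(df), dtype=bool)

    for i in range(len(df)):
        # Closed-interval overlap test; equivalent to the three cases
        # as long as start <= stop in every row.
        overlap = (ids == ids[i]) & (starts <= stops[i]) & (starts[i] <= stops)
        overlap[i] = False  # a row is not in conflict with itself
        valid &= ~overlap   # mark every other overlapping row as invalid

    return pd.Series(valid, index=df.index)


df = pd.DataFrame(np.array([[1, 10, 20], [1, 5, 8], [1, 5, 15], [1, 13, 14],
                            [1, 18, 21], [2, 2, 2], [1, 21, 100], [1, 1, 50]]),
                  columns=["id", "start", "stop"])
df["valid"] = mark_valid_numpy(df)
print(f"Affected Rows: {(~df['valid']).sum()}")  # prints "Affected Rows: 7"
```

Since the original loop starts with every row valid, its `counter` equals the number of rows that end up invalid, which is what the last line counts.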
My solution takes approx. 6 s for 0.1% of the rows (326 rows), which by linear extrapolation comes to about 100 min for the whole DataFrame. Is there a way to make this faster? I'd appreciate a hint. Thanks.
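One way to avoid comparing every row with every other row is a sweep over intervals sorted by start (a swapped-in technique, not the approach above). Within each id, sort by `start`; a row then overlaps an earlier row exactly when its `start` is at most the largest `stop` seen so far, and overlaps a later row exactly when the next row's `start` is at most its own `stop`. Both checks use groupby `cummax` and `shift`, so the whole pass is one sort plus linear work. This sketch assumes closed intervals with `start <= stop`, a unique index, and uses the hypothetical name `mark_valid_sorted`.

```python
import numpy as np
import pandas as pd


def mark_valid_sorted(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True where a row overlaps no other row with the same id."""
    s = df.sort_values(["id", "start"])
    by_id = s["id"]

    # Largest stop among all earlier rows (in start order) of the same id.
    prev_max_stop = s.groupby("id")["stop"].cummax().groupby(by_id).shift()
    # Start of the next row (in start order) of the same id.
    next_start = s.groupby("id")["start"].shift(-1)

    # At the first/last row of a group the shifted value is NaN,
    # and comparisons with NaN evaluate to False.
    overlaps_earlier = s["start"] <= prev_max_stop
    overlaps_later = next_start <= s["stop"]

    overlap = overlaps_earlier | overlaps_later
    return ~overlap.reindex(df.index)  # back to the original row order


df = pd.DataFrame(np.array([[1, 10, 20], [1, 5, 8], [1, 5, 15], [1, 13, 14],
                            [1, 18, 21], [2, 2, 2], [1, 21, 100], [1, 1, 50]]),
                  columns=["id", "start", "stop"])
df["valid"] = mark_valid_sorted(df)
print(f"Affected Rows: {(~df['valid']).sum()}")  # prints "Affected Rows: 7"
```

Touching intervals such as (3, 4) and (4, 6) count as overlapping here, matching the `<=` comparisons in the original query.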