Python Forum
Manipulating panda dataframes more python-like
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Manipulating panda dataframes more python-like
#4
I think I understand your question a bit better now.

df1 - Has most of the information you want.
df2 - Has some additional information you would like to add to df1.
Only some rows in df1 should be updated.
df1 and df1 both have a datetime column that can be used to synchronize the two.

How about something like this? It takes half a second to process a 20,000 row dataframe.
import pandas as pd
import numpy as np
from time import time


# Make some dummy dataframes
# df1 has datetime, windspeed and pressure.  Pressures are all 0/empty
# df2 has datetime and pressure
df1 = pd.DataFrame({
    "Datetime": pd.date_range("1/1/2022", "1/1/2023", 20000),
    "WindSpeed": np.random.randint(2, size=20000),
    "Pressure": np.zeros(20000)
})
df2 = pd.DataFrame({
    "Datetime": pd.date_range("1/1/2022", "1/1/2023", 15000),
    "Pressure": np.random.random(15000) * 5 + 1,
})

starting_time = time()

# We are only interested in wind events.  No reason to look at other rows.
# We can roll changes back into df1 later on using the row index values.
wind_events = df1[df1["WindSpeed"] > 0]

# Extract the time and pressure info.  Faster to work with these
# as lists instead of getting from dataframes each time.
a_times = wind_events["Datetime"].values
b_times = df2["Datetime"].values
a_pressure = wind_events.columns.get_loc("Pressure")
b_pressure = df2["Pressure"].values

# For each wind event, find the pressure from df2 that has the
# closest timestamp.  Only accept pressues within 10 minutes of
# the wind event.
window = np.timedelta64(10, "m")
b = 0
for a, a_time in enumerate(a_times):
    best_time = window + 1
    best_index = None
    for b in range(b, len(b_times)):
        diff = b_times[b] - a_time
        if diff > window:
            break
        diff = abs(diff)
        if diff < best_time:
            best_time = diff
            best_index = b
    if best_index is not None and best_time <= window:
        wind_events.iat[a, a_pressure] = b_pressure[best_index]
df1.update(wind_events["Pressure"])
print(time() - starting_time)
Output:
0.54099893569946293
Reply


Messages In This Thread
RE: Manipulating panda dataframes more python-like - by deanhystad - Jan-28-2023, 06:58 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Panda Exercise nyzs92 0 1,510 Sep-24-2021, 03:35 AM
Last Post: nyzs92
  Manipulating code to draw a tree Py_thon 8 3,263 Nov-21-2019, 05:00 PM
Last Post: sumana
  Manipulating List frenchyinspace 2 2,702 Oct-08-2019, 07:57 AM
Last Post: perfringo
  Manipulating __init__ method schniefen 5 3,539 May-06-2019, 11:22 AM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020