Python Forum
Manipulating panda dataframes more python-like
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Manipulating panda dataframes more python-like
#5
Better and faster than before. 20,000 record table in about 0.16 seconds. You can ignore the slice/copy warning.
import pandas as pd
import numpy as np
from time import time


# Make some dummy dataframes
# df1 has datetime, windspeed and pressure.  Pressures are all 0/empty
# df2 has datetime and pressure
df1 = pd.DataFrame({
    "Datetime": pd.date_range("1/1/2022", "1/1/2023", 20000),
    "WindSpeed": np.random.randint(2, size=20000),
    "Pressure": np.zeros(20000)
})
df2 = pd.DataFrame({
    "Datetime": pd.date_range("1/1/2022", "1/1/2023", 15000),
    "Pressure": np.random.random(15000) * 5 + 1,
})

starting_time = time()

# We are only interested in wind events.  No reason to look at other rows.
# We can roll changes back into df1 later on using the row index values.
wind_events = df1[df1["WindSpeed"] > 0]

# Extract the time and pressure info.  Faster to work with these
# as lists instead of getting from dataframes each time.
a_times = wind_events["Datetime"].values
b_times = df2["Datetime"].values
a_pressure = wind_events["Pressure"].values
b_pressure = df2["Pressure"].values

# For each wind event, find the pressure from df2 that has the
# closest timestamp.  Only accept pressues within time window.
window = np.timedelta64(10, "m")
b = 0
for a, a_time in enumerate(a_times):
    best_diff = prev_diff = abs(b_times[b] - a_time)
    best_index = b
    for b in range(b+1, len(b_times)):
        diff = abs(b_times[b] - a_time)
        if diff < best_diff:
            best_diff, best_index = diff, b
        if diff > prev_diff:
            # We are not going to get a better match
            break
        prev_diff = diff
    if best_index is not None and best_diff <= window:
        a_pressure[a] = b_pressure[best_index]

# Replace wind_events["Pressure"] with a_pressure list and fold
# wind_events back into df1.
wind_events["Pressure"] = a_pressure
df1.update(wind_events["Pressure"])
print(time() - starting_time)
Reply


Messages In This Thread
RE: Manipulating panda dataframes more python-like - by deanhystad - Jan-31-2023, 04:30 AM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Panda Exercise nyzs92 0 1,495 Sep-24-2021, 03:35 AM
Last Post: nyzs92
  Manipulating code to draw a tree Py_thon 8 3,239 Nov-21-2019, 05:00 PM
Last Post: sumana
  Manipulating List frenchyinspace 2 2,661 Oct-08-2019, 07:57 AM
Last Post: perfringo
  Manipulating __init__ method schniefen 5 3,516 May-06-2019, 11:22 AM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020