Python Forum
Manipulating panda dataframes more python-like
#1
Hello,

While not necessarily homework, this seems the best place to post given the question and difficulty level. I am trying to analyze some data, so I am first trying to get it into the right format.

I have multiple large DataFrames (roughly 20,000-22,000 rows and 10-17 columns each).

I have one DataFrame (let's call it the "master") that represents one location, and several others with a similar number of rows, each representing a different location. I added a new column to the master DataFrame and am trying to fill it by comparing each master row's time stamp with the time stamps of the rows in the other DataFrames. The time stamps don't match up perfectly and the other DataFrames have different shapes, so my goal is to fill each master row's new column with the corresponding value from the other DataFrame (the other location), but only if that row's time stamp falls within 29 minutes and 45 seconds on either side of the master's time stamp (a window of 59 minutes and 30 seconds in total).

Unfortunately, I don't think I'm doing this in an efficient, Pythonic way, as the script has been running for over three hours. Here's some example code. Is there a better way to compare time stamps without relying so heavily on for loops?

import pandas as pd

# Half-width of the matching window: +/- 29 minutes and 45 seconds
window = pd.Timedelta(minutes=29, seconds=45)

# Iterate through each row of the master DataFrame
for ind in df1.index:

    # See if the row has a wind gust (a value of 0 mph or higher)
    if df1.loc[ind, 'wind_gust'] >= 0:

        # Check site df2 first for a neighbor obs inside the window
        for ind_df2 in df2.index:

            # Compare the time stamps of df1 and df2. If df2 falls within +/- 29 min 45 s, copy it in
            if abs(df2.loc[ind_df2, 'Date_Time'] - df1.loc[ind, 'Date_Time']) <= window:
                df1.loc[ind, 'df2 pressure'] = df2.loc[ind_df2, 'pressure']

                # Stop the df2 loop once filled and move on to check df3
                break

        # Check site df3 now
        for ind_df3 in df3.index:

            # If the df3 obs is within +/- 29 min 45 s, insert the pressure obs
            if abs(df3.loc[ind_df3, 'Date_Time'] - df1.loc[ind, 'Date_Time']) <= window:
                df1.loc[ind, 'df3 pressure'] = df3.loc[ind_df3, 'sea_level_pressure_set_1']
                break
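
For comparison, one loop-free way to express this kind of matching is pandas.merge_asof with direction="nearest" and a tolerance. What follows is only a minimal sketch under stated assumptions, not a tested solution for the real data: the helper name attach_nearest and the small sample frames are made up for illustration, and only the column names Date_Time, wind_gust and pressure come from the code above. It also differs from the loops in one respect: it picks the observation nearest in time inside the window rather than the first one found.

```python
import pandas as pd


def attach_nearest(master, other, value_col, new_col, tolerance):
    """Return master plus new_col, taken from the row of `other` nearest in time.

    Hypothetical helper: rows with no neighbor within `tolerance` get NaN.
    """
    # merge_asof requires both frames to be sorted by the key column
    right = (
        other[["Date_Time", value_col]]
        .rename(columns={value_col: new_col})
        .sort_values("Date_Time")
    )
    left = master.sort_values("Date_Time")
    return pd.merge_asof(
        left, right, on="Date_Time", direction="nearest", tolerance=tolerance
    )


# Made-up sample data standing in for the master site and one neighbor site
df1 = pd.DataFrame({
    "Date_Time": pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00"]
    ),
    "wind_gust": [5.0, 0.0, 12.0],
})
df2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2023-01-01 00:10", "2023-01-01 01:40"]),
    "pressure": [1012.0, 1009.5],
})

# Same window as in the loops: +/- 29 minutes and 45 seconds
window = pd.Timedelta(minutes=29, seconds=45)

result = attach_nearest(df1, df2, "pressure", "df2 pressure", window)
print(result)
```

In this sample the 01:00 row stays empty because its nearest neighbor (01:40) is 40 minutes away. The wind gust condition is not applied here; it could be added afterwards by blanking the new column wherever wind_gust is missing. The same call can be repeated once per neighboring site.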


Messages In This Thread
Manipulating panda dataframes more python-like - by badtwistoffate - Jan-20-2023, 12:57 AM
