Jan-20-2023, 12:57 AM
Hello,
While not necessarily homework, this seems the best place to post considering the question and difficulty level. I am trying to analyze some data, so I am first trying to get it into the right format.
I have multiple large DataFrames (roughly 20000-22000 rows and 10-17 columns each).
I have one DataFrame (let's call it the "master") that represents one location, and multiple other DataFrames with a similar number of rows (each one represents a different location). I added a new column to the "master" DataFrame and am trying to fill it by comparing each master row's time stamp with the time stamps of the rows in the other DataFrames. The time stamps don't match up perfectly and the shapes of the other DataFrames are different, so my goal is to fill the master row's new column with the corresponding data from the other DataFrame (the other location), but only if that row falls within a 59 minute 30 second window (+/- 29 minutes 45 seconds) around the master's time stamp.
Unfortunately, I don't think I'm doing it in a python-efficient way as the script has been executing for over three hours. Here's some example code. Is there a better way to compare time stamps and not rely as heavily on for loops?
import pandas as pd

# Half-width of the matching window: +/- 29 min 45 s (59 min 30 s in total)
window = pd.Timedelta(minutes=29, seconds=45)

# Iterate through each row of the master DataFrame
for ind in df1.index:
    # Only fill rows that have a wind gust reading (>= 0 mph)
    if df1.loc[ind, 'wind_gust'] >= 0:
        t = df1.loc[ind, 'Date_Time']
        # Check now if there is a corresponding neighbor obs inside the window; if so, plug it in
        # Check site df2 first
        for ind_df2 in df2.index:
            # If the df2 time stamp falls within +/- 29 min 45 s of the df1 time stamp, copy it in
            if t - window <= df2.loc[ind_df2, 'Date_Time'] <= t + window:
                df1.loc[ind, 'df2 pressure'] = df2.loc[ind_df2, 'pressure']
                # Stop the df2 loop once filled and move on to check df3
                break
        # Check site df3 now
        for ind_df3 in df3.index:
            if t - window <= df3.loc[ind_df3, 'Date_Time'] <= t + window:
                # A df3 obs is inside the window, so insert its pressure obs
                df1.loc[ind, 'df3 pressure'] = df3.loc[ind_df3, 'sea_level_pressure_set_1']
                break
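For the time stamp matching described above, one loop-free option may be `pandas.merge_asof`, which joins each master row to the closest row of another DataFrame within a tolerance. The following is only a sketch under some assumptions: the `Date_Time` columns are already datetime dtype with no missing values in the master, the helper name `add_nearest_obs` and the toy data are hypothetical, and "nearest observation" is an acceptable stand-in for "first observation found inside the window" (the two can differ when several observations fall inside the window). The `wind_gust` filter is left out; it could be applied afterwards with a boolean mask.

```python
import pandas as pd


def add_nearest_obs(master, other, value_col, new_col,
                    time_col="Date_Time",
                    tolerance=pd.Timedelta(minutes=29, seconds=45)):
    """Hypothetical helper: return a copy of `master` (sorted by time) with
    `new_col` holding `other[value_col]` from the observation closest in time,
    or NaN when no observation lies within `tolerance`."""
    # merge_asof requires both frames to be sorted by the key column
    left = (master.drop(columns=[new_col], errors="ignore")
                  .sort_values(time_col))
    right = (other[[time_col, value_col]]
             .dropna(subset=[time_col])
             .sort_values(time_col)
             .rename(columns={value_col: new_col}))
    # direction="nearest" looks both backward and forward in time
    return pd.merge_asof(left, right, on=time_col,
                         direction="nearest", tolerance=tolerance)


# Toy data (made up for illustration only)
master = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2023-01-01 00:00:00",
                                 "2023-01-01 01:00:00",
                                 "2023-01-01 02:00:00"]),
    "wind_gust": [5.0, 7.0, 3.0],
})
site2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2023-01-01 00:10:00",
                                 "2023-01-01 01:05:00",
                                 "2023-01-01 02:40:00"]),
    "pressure": [1010.0, 1011.0, 1012.0],
})

result = add_nearest_obs(master, site2, "pressure", "df2 pressure")
print(result)
```

In the toy data, the last master row has no neighbor observation within 29 minutes 45 seconds, so its new column stays empty (NaN). The same helper could be called once per neighbor site, for example with `"sea_level_pressure_set_1"` as `value_col` for `df3`. Note that the returned frame is sorted by time and gets a fresh integer index.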