Iterate over columns in df and calculate Euclidean distance with one column in pandas
Python Forum (https://python-forum.io), Data Science (https://python-forum.io/forum-44.html), thread /thread-33594.html
Pit292 - May-09-2021

Hi,

I have a dataset with several time-series columns and I would like to synchronize them, using 'col2' as the reference. The df is generated with random data in the code below. With this code I am able to synchronize only one column, 'col3', against 'col2':

```python
import numpy as np
import pandas as pd
from fastdtw import fastdtw  # pip install fastdtw
from scipy.spatial.distance import euclidean

df = pd.DataFrame({'ID': range(25),
                   'col2': np.random.randn(25) + 3,
                   'col3': np.random.randn(25) + 3,
                   'col4': np.random.randn(25) + 3,
                   'col5': np.random.randn(25) + 3})

# Newer SciPy versions raise "Input vector should be 1-D" when euclidean()
# receives scalars, so pass each series as a column vector of 1-element rows.
x = df['col2'].fillna(0).to_numpy().reshape(-1, 1)
y = df['col3'].fillna(0).to_numpy().reshape(-1, 1)
distance, path = fastdtw(x, y, dist=euclidean)

# path is a list of (i, j) pairs: row i of 'col2' is matched to row j of 'col3'.
result = [[df['ID'].iloc[i], df['col2'].iloc[i], df['col3'].iloc[j]] for i, j in path]

df_synchronized = (pd.DataFrame(result, columns=['ID', 'col2', 'col3'])
                   .dropna()
                   .drop_duplicates(subset=['ID'])  # keep the first match per ID
                   .sort_values(by='ID')
                   .reset_index(drop=True))
print(df_synchronized.head(n=3))
```

I would like to iterate over the remaining columns of the DataFrame and do for 'col4' and 'col5' exactly what is done for 'col3'. In other words, 'col3' should be replaced in a loop by 'col4' and then by 'col5'. The goal is a df_synchronized that contains all columns from df.

Is there any way to do this? `distance, path = fastdtw(x, y, dist=euclidean)` cannot simply be changed to `distance, path = fastdtw(x, y, z, aa, dist=euclidean)`, because fastdtw takes exactly two series, x and y. The synchronization has to be done on one column, saved into df_synchronized, and then repeated with the next column.
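One way to cover 'col4' and 'col5' is to run the same pairwise alignment once per column, always against 'col2', turn each result into a Series indexed by 'ID', and then put all of those Series side by side. The sketch below is a minimal version of that idea under a few assumptions: the helper name `synchronize_columns` and its parameters `align`, `ref`, `id_col` and `cols` are hypothetical, and `align(x, y)` is assumed to return `(distance, path)` in the same form `fastdtw` does. Unlike the single-column code, rows whose reference value is NaN are kept, and an ID with no non-missing match in some column gets NaN there.

```python
import pandas as pd


def synchronize_columns(df, align, ref="col2", id_col="ID", cols=None):
    """Align each column in `cols` to the reference column `ref`, one pair at a time.

    `align(x, y)` is assumed to return (distance, path), where path is a list
    of (i, j) index pairs into x and y, which is the form fastdtw returns.
    """
    if cols is None:
        # Default: every column except the ID column and the reference itself.
        cols = [c for c in df.columns if c not in (id_col, ref)]

    # Reference series, with NaN filled as in the single-column code.
    x = df[ref].fillna(0).to_numpy()
    aligned = []
    for col in cols:
        y = df[col].fillna(0).to_numpy()
        _, path = align(x, y)
        # ID comes from the reference index i, the value from the matched index j.
        pairs = pd.DataFrame({
            id_col: [df[id_col].iloc[i] for i, _ in path],
            col: [df[col].iloc[j] for _, j in path],
        }).dropna()
        # A reference row can be matched to several rows of `col`; keep the first.
        pairs = pairs.drop_duplicates(subset=[id_col])
        aligned.append(pairs.set_index(id_col)[col])

    # Place the reference column and every aligned column side by side on ID.
    base = df[[id_col, ref]].set_index(id_col)
    out = pd.concat([base, *aligned], axis=1).sort_index()
    out.index.name = id_col
    return out.reset_index()
```

With fastdtw installed, a call could look like `df_synchronized = synchronize_columns(df, align=fastdtw)`. When `dist` is left at its default, fastdtw compares 1-D inputs with `abs(x[i] - y[j])`, which is the Euclidean distance between two scalars. To keep `dist=euclidean` explicit, a wrapper such as `align=lambda a, b: fastdtw(a.reshape(-1, 1), b.reshape(-1, 1), dist=euclidean)` returns the same kind of index pairs.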
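The `drop_duplicates(subset=['ID'])` step in the single-column code is needed because of what `path` contains: a warping path can match one reference row to several rows of the other series. The sketch below makes that visible with a plain exact DTW (O(n·m), absolute-difference cost). It is not the FastDTW approximation that `fastdtw` implements, so on real data the two paths can differ; the function name `dtw_path` is hypothetical. It returns `(distance, path)` in the same form as `fastdtw`.

```python
import numpy as np


def dtw_path(x, y):
    """Exact dynamic time warping between two 1-D series (illustrative only).

    The cost of matching x[i] with y[j] is |x[i] - y[j]|, which equals the
    Euclidean distance for scalars. Returns (distance, path), where path is a
    list of (i, j) pairs running from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # acc[i, j] = cheapest cumulative cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            acc[i, j] = step + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # Walk back from the end, always moving to the cheapest predecessor cell.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        # Stay inside the grid (row/column 0 of acc is only the boundary).
        candidates = [(a, b) for a, b in candidates if a >= 1 and b >= 1]
        i, j = min(candidates, key=lambda ab: acc[ab])
        path.append((i - 1, j - 1))
    return float(acc[n, m]), path[::-1]


distance, path = dtw_path([0, 1, 2], [0, 0, 1, 2])
print(distance, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Here reference row 0 is matched to both y[0] and y[1]. Keeping only the first row per 'ID' is what turns the path back into exactly one value per reference row.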