Nov-21-2021, 12:41 AM
(This post was last modified: Nov-21-2021, 12:41 AM by shantanu97.)
Check the Problem-2.jpg file.
The files will (should) always be the same from columns A to H. The second-last column will be S3Link. The columns between H and the S3Link column change from file to file, depending on the data being reported.
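The layout described above could be captured in a small helper that picks out the variable columns by position. This is only a sketch: `split_columns` is a hypothetical name, the fixed header names are guessed from the variable names in my code, and `leafCount` and `bunchCount` are invented sample columns. Only `vineNumber`, `S3Link` and `DataSource` are names I am sure about.

```python
def split_columns(columns, n_fixed=8):
    """Split a header list into (fixed, dynamic, trailing) parts.

    Assumes the first n_fixed columns (A to H) are fixed and that
    S3Link is the second-last column, as described in the post.
    """
    columns = list(columns)
    if len(columns) < n_fixed + 2 or columns[-2] != "S3Link":
        raise ValueError("expected S3Link as the second-last column")
    fixed = columns[:n_fixed]       # columns A to H
    dynamic = columns[n_fixed:-2]   # everything between H and S3Link
    trailing = columns[-2:]         # S3Link and the last column
    return fixed, dynamic, trailing


# Invented header for illustration only.
header = ["frameNo", "gpsTimestamp", "videoTimeInSec", "latitude", "longitude",
          "metreID", "Distance", "vineNumber", "leafCount", "bunchCount",
          "S3Link", "DataSource"]
fixed, dynamic, trailing = split_columns(header)
print(dynamic)
```

Because the split is positional, the same helper should work whether a file has one dynamic column or six.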
For the final CSV, we need to keep columns A to H, the S3Link column, and the DataSource column, as shown below. The columns between column H and the S3Link column need to be normalised into AttributeName and AttributeValue columns. This has to be dynamic: the number of these columns varies between files, from 1 up to perhaps 5 or 6.
The problem is that I can't work out how to loop over these columns dynamically, and I am having difficulty normalising them into the AttributeName and AttributeValue columns. So far, I have written the following Python code:
1. The first loop iterates over the CSV files.
2. The second loop iterates over the rows of each file, down to the last row.
3. The third loop traverses the columns. I think this one is wrong.
import pandas as pd

rows = []  # collect plain dicts; DataFrame.append was removed in pandas 2.0
for fn in csv_files:  # csv_files: list of input CSV paths, defined elsewhere
    df = pd.read_csv(fn)
    cols = list(df.columns)
    # Dynamic columns are the ones strictly between vineNumber (column H) and S3Link.
    dynamic_cols = cols[cols.index("vineNumber") + 1:cols.index("S3Link")]
    for i in range(len(df)):  # Row loop (starts at 0 so the first data row is kept)
        row = df.iloc[i]
        for col in dynamic_cols:  # Column loop for normalisation
            # Assumes the CSV headers match the names used below.
            rows.append({
                "frameNo": row["frameNo"],
                "gpsTimestamp": row["gpsTimestamp"],
                "videoTimeInSec": row["videoTimeInSec"],
                "latitude": row["latitude"],
                "longitude": row["longitude"],
                "metreID": row["metreID"],
                "Distance": row["Distance"],
                "vineNumber": row["vineNumber"],
                "AttributeName": col,        # header of the dynamic column
                "AttributeValue": row[col],  # its value in this row
                "S3Link": row["S3Link"],
                "DataSource": row["DataSource"],
            })
rs = pd.DataFrame(rows)
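As an alternative to the nested loops, the same wide-to-long reshaping might be done with pandas' `DataFrame.melt`, which is a different technique from the loop code above. The sketch below is an assumption-heavy illustration: `normalise` is a hypothetical helper, the two in-memory "files" and their columns (`leafCount`, `bunchCount`, `trunkWidth`) are invented, and the fixed columns are shortened to `frameNo` and `vineNumber` to keep it small.

```python
import io

import pandas as pd


def normalise(df, last_fixed="vineNumber", link_col="S3Link"):
    """Unpivot the columns between last_fixed and link_col into name/value pairs."""
    cols = list(df.columns)
    start = cols.index(last_fixed) + 1
    stop = cols.index(link_col)
    value_cols = cols[start:stop]          # the dynamic columns
    id_cols = cols[:start] + cols[stop:]   # fixed columns plus S3Link onwards
    long_df = df.melt(id_vars=id_cols, value_vars=value_cols,
                      var_name="AttributeName", value_name="AttributeValue")
    # Place the attribute pair before S3Link, matching the target layout.
    ordered = cols[:start] + ["AttributeName", "AttributeValue"] + cols[stop:]
    return long_df[ordered]


# Two invented inputs with different dynamic columns.
file_a = io.StringIO("frameNo,vineNumber,leafCount,bunchCount,S3Link,DataSource\n"
                     "1,10,5,2,s3://a/1,cam\n"
                     "2,11,7,3,s3://a/2,cam\n")
file_b = io.StringIO("frameNo,vineNumber,trunkWidth,S3Link,DataSource\n"
                     "1,20,4.5,s3://b/1,cam\n")

result = pd.concat([normalise(pd.read_csv(f)) for f in (file_a, file_b)],
                   ignore_index=True)
print(result)
```

Each input row yields one output row per dynamic column, so the number of dynamic columns never has to be hard-coded. The combined frame could then be written out with `result.to_csv(...)` using `index=False`.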
Input File attached.
I don't know whether my logic for this task is correct. Any help?