Python Forum

Full Version: How to transform from wide to long format in python
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Check the Problem-2.jpg file.

The files will (should) always be the same from columns A to H. The second last column will be S3Link. The columns between H and the S3Link column change between some files depending on the data is being reported.

For the final CSV, we need to take columns A to H, the S3Link column and the DataSource as below. Columns between column H and the S3Link column need to be normalised into the AttributeName and AttributeValue columns as shown below. It needs to be dynamic. In these files, the number of dynamic columns can vary from 1 up to 5 or 6, maybe.

The problem is that I am not able to dynamic loop and have difficulty normalising stuff into the Attribute Value and Attribute name. Till now, I am able to write python code:

1. The First Loop is Looping over CSV file
2. The Second Loop is for reaching the end of the Excel loop.
3. The third loop is for traversing columns - I think this is wrong.
for fn in csv_files:
    all_dfs = pd.read_csv(fn)
        for i in range(1,len(df)):  # Row Loop
            frameNo = all_dfs.iloc[i,1]
            gpsTimestamp = all_dfs.iloc[i,2]
            videoTimeInSec = all_dfs.iloc[i,3]
            latitude = all_dfs.iloc[i,4]
            longitude = all_dfs.iloc[i,5]
            metreID = all_dfs.iloc[i,6]
            Distance = all_dfs.iloc[i,7]
            vineNumber = all_dfs.iloc[i,8]            
            for j in range(1,len(df.loc[0,"vineNumber":"S3Link")):  #Column Loop For Normalization
                AttributeName = all_dfs.iloc[i,9]
                AttributeValue = all_dfs.iloc[i,10]
                S3Link = all_dfs.loc[i,S3Link]
                DataSource = all_dfs.loc[i,DataSource]
                rs = rs.append(pd.DataFrame({
                    "frameNo" = [frameNo]
                    "gpsTimestamp" = [gpsTimestamp]
                    "videoTimeInSec" = [videoTimeInSec]
                    "latitude" = [latitude]
                    "longitude" = [longitude]
                    "metreID" = [metreID]
                    "Distance" = [Distance]
                    "vineNumber" =  [vineNumber]
                    "AttributeName" = [AttributeName]
                    "AttributeValue" = [AttributeValue]
                    "S3Link" = [S3Link]
                    "DataSource" = [DataSource]
                }),ignore_index=True
)

Input File attached.

I don't know my logic for this task is correct or wrong. Any help????
I think you are overthinking/overcomplicating this problem.

Is it save to state that
  • You want first 8 columns (i.e. A-H), then last 2 columns, then the 9th and 10th columns
  • The column names for first 10 columns are the same as in the original file, the last 2 columns (former 9th and 10th) will have new header - AttributeName and AttributeValue

fname = 'VY6_Row1_report_url_detections_per_frame.csv'
# fname = 'VY6_Row2_report_url_detections_per_frame.csv'

df =  pd.read_csv(fname)
col_drop = len(df.columns) - 10
df['AttributeName'] = df[df.columns[8]]
df['AttributeValue'] = df[df.columns[9]]
df.drop(labels=df.columns[8:8+col_drop], axis=1, inplace=True)
df.to_csv('VY6_out.csv', index=False)
Do you need to use pandas? Using csv module would also do.