Python Forum

How to find column index and its corresponding column name
Hi,

I have the data below:

import pandas as pd
import numpy as np
from scipy import stats
dataFileName='RFInput.xlsx'
sheetName='Rawdata'
sheetNamePara='paraList'
dataRaw=pd.read_excel(dataFileName, sheet_name=sheetName)
datapara=pd.read_excel(dataFileName, sheet_name=sheetNamePara)

noData=len(dataRaw)

labels = datapara
x = dataRaw[labels]

Rawdata:

A   B     C      D    E      F
0   1.2   1.6   3.2  3.2    1.6
1   1.2   1.6   3.2  3.2    1.6
2   2.6   1.9   6.5  6.5    1.9
0   1.2   1.6   3.2  3.2    1.6
1   2.6   1.9   6.5  6.5    1.9
4   1.2   1.6   3.2  3.2    1.6


paraList:
A   C  E  F
Y   N  Y  Y
I want to find the column indices of Y and N in paraList, and the corresponding column names:

The Y-type column names are A, E and F, and their data in Rawdata is,
data_Y:

A    E      F
0   3.2    1.6
1   3.2    1.6
2   6.5    1.9
0   3.2    1.6
1   6.5    1.9
4   3.2    1.6

data_N:
C  
1.6  
1.6   
1.9   
1.6  
1.9   
1.6
The problem I see is that your paraList is not really a data frame, as it has only one row, so it will probably work better as a dictionary:
datapara.iloc[0].to_dict()
Output:
{'A': 'Y', 'C': 'N', 'E': 'Y', 'F': 'Y'}
Now it is easy to create a selector for the columns that have the value 'Y':
selection = datapara.iloc[0].to_dict()
cols = [c for c in selection if selection[c] == 'Y']
print(dataRaw[cols])
Output:
   A    E    F
0  0  3.2  1.6
1  1  3.2  1.6
2  2  6.5  1.9
3  0  3.2  1.6
4  1  6.5  1.9
5  4  3.2  1.6
To select the ones with 'N', you can use a similar process.
Remember also to add guards for the case where a column appears in only one of the tables. For example, in this code a column that exists only in datapara will raise an exception (a KeyError) when it is used to index dataRaw.
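
As a rough sketch of the "similar process" for 'N' and of the guard mentioned above, the snippet below builds the two tables inline instead of reading RFInput.xlsx, so it can be run on its own. The helper name columns_with_flag and the missing list are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Stand-ins for the two Excel sheets, typed in from the question.
dataRaw = pd.DataFrame({
    'A': [0, 1, 2, 0, 1, 4],
    'B': [1.2, 1.2, 2.6, 1.2, 2.6, 1.2],
    'C': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
    'D': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'E': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'F': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
})
datapara = pd.DataFrame({'A': ['Y'], 'C': ['N'], 'E': ['Y'], 'F': ['Y']})

# One-row frame -> plain dict of {column name: flag}.
selection = datapara.iloc[0].to_dict()

# Guard: report flagged names that do not exist in dataRaw
# instead of letting dataRaw[cols] fail with a KeyError.
missing = [c for c in selection if c not in dataRaw.columns]
if missing:
    print('Columns flagged in paraList but absent from Rawdata:', missing)


def columns_with_flag(flag):
    """Hypothetical helper: flagged column names that dataRaw really has."""
    return [c for c in selection
            if selection[c] == flag and c in dataRaw.columns]


data_Y = dataRaw[columns_with_flag('Y')]
data_N = dataRaw[columns_with_flag('N')]
print(data_N)
```

With the sample data from the question, data_Y holds columns A, E and F, and data_N holds only column C.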
(May-09-2018, 09:53 PM) killerrex wrote: ....
Now it is easy to create a selector for the columns that has value 'Y'
selection = datapara.iloc[0].to_dict()
cols = [c for c in selection if selection[c] == 'Y']
print(dataRaw[cols])

This is built on the assumption that every key with value "Y" in selection corresponds to a column in dataRaw, which is in general an unsafe assumption.

cols = [c for c in dataRaw.columns if selection.get(c) == 'Y']
removes this problem, because only columns that actually exist in dataRaw are considered.
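
The original question also asked for the column index, not only the names. One possible way to get both, sketched here on inline sample data, is to enumerate dataRaw.columns and keep the same selection.get(c) test. The function name split_by_flag and the extra key 'Z' are assumptions added only to show that an unknown column is skipped silently.

```python
import pandas as pd

# Sample data typed in from the question (normally read from RFInput.xlsx).
dataRaw = pd.DataFrame({
    'A': [0, 1, 2, 0, 1, 4],
    'B': [1.2, 1.2, 2.6, 1.2, 2.6, 1.2],
    'C': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
    'D': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'E': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'F': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
})

# 'Z' is a hypothetical flag for a column that dataRaw does not have.
selection = {'A': 'Y', 'C': 'N', 'E': 'Y', 'F': 'Y', 'Z': 'Y'}


def split_by_flag(frame, flags, flag):
    """Return (positions, names) of the frame's columns carrying this flag."""
    hits = [(i, c) for i, c in enumerate(frame.columns) if flags.get(c) == flag]
    positions = [i for i, _ in hits]
    names = [c for _, c in hits]
    return positions, names


y_pos, y_names = split_by_flag(dataRaw, selection, 'Y')
n_pos, n_names = split_by_flag(dataRaw, selection, 'N')

# Positions are 0-based indices into dataRaw.columns.
print(y_pos, y_names)
print(n_pos, n_names)

# Either form selects the same data.
data_Y = dataRaw.iloc[:, y_pos]
data_N = dataRaw[n_names]
```

Because the loop runs over dataRaw.columns, the positions refer to Rawdata (where B and D also occupy slots), not to the order of the columns in paraList.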