Python Forum

Hi,

I have the following code and data:

import pandas as pd
import numpy as np
from scipy import stats
dataFileName='RFInput.xlsx'
sheetName='Rawdata'
sheetNamePara='paraList'
dataRaw=pd.read_excel(dataFileName, sheet_name = sheetName)
datapara=pd.read_excel(dataFileName, sheet_name = sheetNamePara)

noData=len(dataRaw)

labels = datapara
x = dataRaw[labels]

Rawdata:

A   B     C      D    E      F
0   1.2   1.6   3.2  3.2    1.6
1   1.2   1.6   3.2  3.2    1.6
2   2.6   1.9   6.5  6.5    1.9
0   1.2   1.6   3.2  3.2    1.6
1   2.6   1.9   6.5  6.5    1.9
4   1.2   1.6   3.2  3.2    1.6


paraList:
A   C  E  F
Y   N  Y  Y

I want to find the columns marked Y and N in paraList, and the corresponding column names.

The Y-type column names are A, E, and F, and their data in Rawdata would be:

data_Y:

A    E      F
0   3.2    1.6
1   3.2    1.6
2   6.5    1.9
0   3.2    1.6
1   6.5    1.9
4   3.2    1.6

data_N:
C  
1.6  
1.6   
1.9   
1.6  
1.9   
1.6

The problem I see is that your paraList is not really a data frame, as it has only one row, so it looks like it will work better as a dictionary:

datapara.iloc[0].to_dict()

Output:
{'A': 'Y', 'C': 'N', 'E': 'Y', 'F': 'Y'}

Now it is easy to create a selector for the columns that have the value 'Y':

selection = datapara.iloc[0].to_dict()
cols = [c for c in selection if selection[c] == 'Y']
print(dataRaw[cols])

Output:
   A    E    F
0  0  3.2  1.6
1  1  3.2  1.6
2  2  6.5  1.9
3  0  3.2  1.6
4  1  6.5  1.9
5  4  3.2  1.6

To select the ones with 'N', you can use a similar process.
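As a rough sketch of that "similar process", the snippet below builds both data_Y and data_N in one go. It uses small inline data frames as stand-ins for the two Excel sheets (an assumption, so that it runs without RFInput.xlsx); the names cols_y and cols_n are just illustrative.

```python
import pandas as pd

# Inline stand-ins for the Rawdata and paraList sheets from the question
# (assumption: same values as posted, so no Excel file is needed).
dataRaw = pd.DataFrame({
    'A': [0, 1, 2, 0, 1, 4],
    'B': [1.2, 1.2, 2.6, 1.2, 2.6, 1.2],
    'C': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
    'D': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'E': [3.2, 3.2, 6.5, 3.2, 6.5, 3.2],
    'F': [1.6, 1.6, 1.9, 1.6, 1.9, 1.6],
})
datapara = pd.DataFrame({'A': ['Y'], 'C': ['N'], 'E': ['Y'], 'F': ['Y']})

# Turn the single row of paraList into a {column name: flag} dictionary.
selection = datapara.iloc[0].to_dict()

# One list of column names per flag value.
cols_y = [c for c in selection if selection[c] == 'Y']
cols_n = [c for c in selection if selection[c] == 'N']

data_Y = dataRaw[cols_y]
data_N = dataRaw[cols_n]

print(data_Y)
print(data_N)
```

Here data_Y ends up with columns A, E, F and data_N with column C only; B and D are in neither because paraList does not mention them.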
Remember also to add guards for the case where a column appears in only one of the tables. For example, in this code a column that exists only in datapara will raise an exception.
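One possible guard, as a minimal sketch: collect the flagged names that are missing from the raw data, report them, and leave them out of the selection. The column 'G' and the small inline frames are made up for the illustration; whether to skip such columns or stop with an error depends on your data.

```python
import pandas as pd

# Small stand-in for Rawdata (assumption: only two columns, for brevity).
dataRaw = pd.DataFrame({'A': [0, 1], 'C': [1.6, 1.9]})
# 'G' is a hypothetical column that appears only in paraList.
datapara = pd.DataFrame({'A': ['Y'], 'C': ['N'], 'G': ['Y']})

selection = datapara.iloc[0].to_dict()

# Names flagged in paraList that do not exist in the raw data.
missing = [c for c in selection if c not in dataRaw.columns]
if missing:
    print('Ignoring columns not found in Rawdata:', missing)

# Keep only the 'Y' columns that are actually present.
cols = [c for c in selection if selection[c] == 'Y' and c not in missing]
print(dataRaw[cols])
```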

(May-09-2018, 09:53 PM) killerrex wrote: ....
Now it is easy to create a selector for the columns that have the value 'Y'
selection = datapara.iloc[0].to_dict()
cols = [c for c in selection if selection[c] == 'Y']
print(dataRaw[cols])

That code is built on the assumption that every key with the value 'Y' in selection corresponds to a column in dataRaw, which in general is an unsafe assumption.

cols = [c for c in dataRaw.columns if selection.get(c) == 'Y']

This removes the problem.
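To see the difference between the two comprehensions, here is a small sketch with a made-up selection that flags a column 'G' which does not exist in the raw data (both 'G' and the three-column frame are assumptions for the demo). Iterating over the dictionary keys fails on lookup, while iterating over dataRaw.columns silently skips 'G', and also skips 'B', for which selection.get returns None.

```python
import pandas as pd

# Small stand-in for Rawdata (assumption: three columns, for brevity).
dataRaw = pd.DataFrame({
    'A': [0, 1, 2],
    'B': [1.2, 1.2, 2.6],
    'E': [3.2, 3.2, 6.5],
})
# Hypothetical selection: 'G' is flagged but is not a column of dataRaw.
selection = {'A': 'Y', 'E': 'Y', 'G': 'Y'}

# Iterating over the dictionary keeps 'G', so the lookup fails.
unsafe_cols = [c for c in selection if selection[c] == 'Y']
try:
    dataRaw[unsafe_cols]
    raised = False
except KeyError:
    raised = True
print('KeyError raised:', raised)

# Iterating over the real columns can only yield names that exist.
cols = [c for c in dataRaw.columns if selection.get(c) == 'Y']
print(dataRaw[cols])
```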

Raj

killerrex

volcano63