Python Forum

Full Version: Help with pandas
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi all

I have the following dataframe that has been extracted from an Excel spreadsheet that I need to manipulate in code.
                  500Hz  1000Hz  1500Hz  2000Hz  3000Hz  4000Hz  6000Hz  8000Hz
Right Ear (dB)      5       5       5      10      70       5      10      10
Left Ear (dB)      10      10      20      30       0      20      20      10
Essentially what I need to do is extract a few columns and calculate the average dB for each ear. The result should be something like this:
                3000Hz  4000Hz  6000Hz    Average
Right Ear (dB)      70       5      10      28.3
Left Ear (dB)        0      20      20      13.3
The code below achieves this quite nicely:
import pandas as pd

df = pd.read_csv('audio.csv)
currentdf = df.loc[:,('3000Hz','4000Hz','6000Hz')]
currentdf['average'] = (df.loc[:,'3000Hz'] + df.loc[:,'4000Hz'] + df.loc[:,'6000Hz'])/3
print(currentdf)
Trouble is, I have five different combinations of different frequencies (columns) that I need to average out, each with different number of columns (e.g averaging the values of 500Hz and 1000Hz). Rather than repeating myself within each method within the class I will define a base method and derive the required combinations from that method. I tried the following but its giving me weird results:

import pandas as pd

class audiogram(object):
        def __init__(self,audiodf):
            self.audiodf = audiodf

        def __basemethod(self,*frequency):
            newdf = df.loc[:,(f for f in frequency)]
            for f in frequency:
                newdf['average'] = (df[f].sum())/len(list(frequency))     #???? not sure how else to do it
                return newdf

        def first_criteria(self):
            print(self.__basemethod('3000Hz','4000Hz','6000Hz'))


df = pd.read_csv('audio.csv')
x = audiogram(df)
x.first_criteria()
The CSV file is:

,500Hz,1000Hz,1500Hz,2000Hz,3000Hz,4000Hz,6000Hz,8000Hz
Right Ear (dB),5,5,5,10,70,5,10,10
Left Ear (dB),10,10,20,30,0,20,20,10
I get the following output:
Output:
3000Hz 4000Hz 6000Hz average Right Ear (dB) 70 5 10 23.333333 Left Ear (dB) 0 20 20 23.333333
....which is not what I am expecting.

I know where the problem is (its in the commented line in the basemethod private method but I am running out of ideas on how to fix it. Some help would be much appreciated.

TIA

PS: Apologies, I clicked the post button in error, was meant to click the preview post button instead. Is there a way to remove the post if it has been posted accidentally?
Edit: Changed to csv instead to make troubleshooting easier.
That calculation is done for each frequency. Don't think you want to do that. You want to get the sum of the values in the frequency column. I would create a sum column using
newdf["sum"]=newdf.sum(axis=1)
, you may need to specify the columns to include.

Just noted your last line.
Something like this if you want that calculation.
Taken frequency tuple out of class so it can be give to class as a argument.
import pandas as pd

class audiogram(object):
    def __init__(self,audiodf):
        self.audiodf = audiodf          
 
    def basemethod(self, frequency):
        new_df = self.audiodf.loc[:, frequency]
        lst = []
        for i in new_df.columns.to_list():
            lst.append(new_df.loc[:,i])
            new_df['average'] = sum(lst) / 3
        return new_df
    
if __name__ == '__main__':
    file_name = 'audio.xlsx'
    freq = ('3000Hz', '4000Hz', '6000Hz') 
    # freq = ('4000Hz', '6000Hz', '8000Hz')   
    df = pd.read_excel(file_name, index_col=0)    
    audio = audiogram(df)
    new_df = audio.basemethod(freq)
    print(new_df)
Output:
3000Hz 4000Hz 6000Hz average Right Ear (dB) 70 5 10 28.333333 Left Ear (dB) 0 20 20 13.333333
Thank you jefsummers and snippsat for your contribution.

Is there a way undo posting the thread? I keep clicking 'post reply' by mistake instead of 'preview post'.

Thanks again