Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Help with pandas
#1
Hi all

I have the following dataframe that has been extracted from an Excel spreadsheet that I need to manipulate in code.
                  500Hz  1000Hz  1500Hz  2000Hz  3000Hz  4000Hz  6000Hz  8000Hz
Right Ear (dB)      5       5       5      10      70       5      10      10
Left Ear (dB)      10      10      20      30       0      20      20      10
Essentially what I need to do is extract a few columns and calculate the average dB for each ear. The result should be something like this:
                3000Hz  4000Hz  6000Hz    Average
Right Ear (dB)      70       5      10      28.3
Left Ear (dB)        0      20      20      13.3
The code below achieves this quite nicely:
import pandas as pd

df = pd.read_csv('audio.csv)
currentdf = df.loc[:,('3000Hz','4000Hz','6000Hz')]
currentdf['average'] = (df.loc[:,'3000Hz'] + df.loc[:,'4000Hz'] + df.loc[:,'6000Hz'])/3
print(currentdf)
Trouble is, I have five different combinations of different frequencies (columns) that I need to average out, each with different number of columns (e.g averaging the values of 500Hz and 1000Hz). Rather than repeating myself within each method within the class I will define a base method and derive the required combinations from that method. I tried the following but its giving me weird results:

import pandas as pd

class audiogram(object):
        def __init__(self,audiodf):
            self.audiodf = audiodf

        def __basemethod(self,*frequency):
            newdf = df.loc[:,(f for f in frequency)]
            for f in frequency:
                newdf['average'] = (df[f].sum())/len(list(frequency))     #???? not sure how else to do it
                return newdf

        def first_criteria(self):
            print(self.__basemethod('3000Hz','4000Hz','6000Hz'))


df = pd.read_csv('audio.csv')
x = audiogram(df)
x.first_criteria()
The CSV file is:

,500Hz,1000Hz,1500Hz,2000Hz,3000Hz,4000Hz,6000Hz,8000Hz
Right Ear (dB),5,5,5,10,70,5,10,10
Left Ear (dB),10,10,20,30,0,20,20,10
I get the following output:
Output:
3000Hz 4000Hz 6000Hz average Right Ear (dB) 70 5 10 23.333333 Left Ear (dB) 0 20 20 23.333333
....which is not what I am expecting.

I know where the problem is (its in the commented line in the basemethod private method but I am running out of ideas on how to fix it. Some help would be much appreciated.

TIA

PS: Apologies, I clicked the post button in error, was meant to click the preview post button instead. Is there a way to remove the post if it has been posted accidentally?
Edit: Changed to csv instead to make troubleshooting easier.
Reply
#2
That calculation is done for each frequency. Don't think you want to do that. You want to get the sum of the values in the frequency column. I would create a sum column using
newdf["sum"]=newdf.sum(axis=1)
, you may need to specify the columns to include.

Just noted your last line.
Reply
#3
Something like this if you want that calculation.
Taken frequency tuple out of class so it can be give to class as a argument.
import pandas as pd

class audiogram(object):
    def __init__(self,audiodf):
        self.audiodf = audiodf          
 
    def basemethod(self, frequency):
        new_df = self.audiodf.loc[:, frequency]
        lst = []
        for i in new_df.columns.to_list():
            lst.append(new_df.loc[:,i])
            new_df['average'] = sum(lst) / 3
        return new_df
    
if __name__ == '__main__':
    file_name = 'audio.xlsx'
    freq = ('3000Hz', '4000Hz', '6000Hz') 
    # freq = ('4000Hz', '6000Hz', '8000Hz')   
    df = pd.read_excel(file_name, index_col=0)    
    audio = audiogram(df)
    new_df = audio.basemethod(freq)
    print(new_df)
Output:
3000Hz 4000Hz 6000Hz average Right Ear (dB) 70 5 10 28.333333 Left Ear (dB) 0 20 20 13.333333
Reply
#4
Thank you jefsummers and snippsat for your contribution.

Is there a way undo posting the thread? I keep clicking 'post reply' by mistake instead of 'preview post'.

Thanks again
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020