Feedback on first Class

fffrost · Jul-01-2018, 08:00 AM

Hi, I decided it was time to start learning to use classes. The goal was to write a class for a t-test with various methods and attributes that I can access, for example my_ttest.t (which returns the 't' statistic). My issue is that, now that I have finished it, I have never seen a class that contains this many self.? variables. Maybe I misunderstood something, but I do want to be able to say my_ttest.t, my_ttest.p, my_ttest.whatever. Maybe it would just be better if I didn't have this as a class? But that would only reinforce my thought that maybe I'll never need to make use of classes... I'm just looking for feedback on how to approach building classes like this. Here's the code (to run it you'll need statsmodels, numpy, pandas, and scipy):

import numpy as np
import pandas as pd
import statsmodels.stats.api as sms
from scipy import stats


class ttest_ind:
    """Takes two input vectors & carries out an independent-samples t-test
    for differences. Parametric hypothesis test for differences between
    two independent samples.
    
    self.x = x vector
    self.y = y vector
    self.dof_? = degree of freedom
    self.t_? = test statistic (t)
    self.p_? = significance p value
    self.r = Descriptive statistics and tests, see statsmodels.stats.api
    self.result_? = result tuple from ttest (t, p, dof)
    self.CI_? = confidence intervals
    self.cohen_d_? = cohen's d"""
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
        # levene's test for homogeneity of variance
        self.levenes = stats.levene(self.x, self.y, center='mean')
        self.levenes_F = self.levenes[0]
        self.levenes_p = self.levenes[1]
        
        self.r = sms.CompareMeans(sms.DescrStatsW(self.x), sms.DescrStatsW(self.y))
        
        self.result_equal = self.r.ttest_ind(usevar='pooled')
        self.t_equal, self.p_equal, self.dof_equal = self.result_equal[0], self.result_equal[1], self.result_equal[2]
        self.CI_equal = self.r.tconfint_diff(usevar='pooled')
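        # parentheses needed: without them only mean(y) is divided
        # note: std_meandiff_pooledvar is the standard error of the mean difference
        self.cohen_d_equal = (np.mean(self.x) - np.mean(self.y)) / self.r.std_meandiff_pooledvar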
        
        self.result_uneq = self.r.ttest_ind(usevar='unequal')
        self.t_uneq, self.p_uneq, self.dof_uneq = self.result_uneq[0], self.result_uneq[1], self.result_uneq[2]
        self.CI_uneq = self.r.tconfint_diff(usevar='unequal')
        self.cohen_d_uneq = (np.mean(self.x) - np.mean(self.y)) / (np.sqrt((np.std(self.x, ddof=1) ** 2 + np.std(self.y, ddof=1) ** 2) / 2))
    
    def summary(self):
        result_values = [self.levenes_F, self.levenes_p,
                         self.t_equal, self.t_uneq,
                         self.dof_equal, self.dof_uneq,
                         self.p_equal, self.p_uneq,
                         # difference of means; x - y would need equal-length (paired) arrays
                         np.mean(self.x) - np.mean(self.y),
                         self.r.std_meandiff_pooledvar,
                         self.CI_equal[0], self.CI_equal[1],
                         self.CI_uneq[0], self.CI_uneq[1],
                         self.cohen_d_equal, self.cohen_d_uneq]
        result_labels = ['Levene\'s F', 'Levene\'s p',
                         't (equal var.)', 't (unequal var.)',
                         'df (equal var.)', 'df (unequal var.)',
                         'Sig. (p, equal var.)', 'Sig. (p, unequal var.)',
                         'Mean difference', 'Std Error difference',
                         '95% CI (lower) equal var.', '95% CI (upper) equal var.',
                         '95% CI (lower) unequal var.', '95% CI (upper) unequal var.',
                         'Cohen\'s d (equal)', 'Cohen\'s d (unequal)']
        return pd.Series(data=result_values, index=result_labels)

ichabod801 · Jul-01-2018, 01:13 PM

That many attributes (self. variables) is not out of hand. I am frequently working with classes that have 26 attributes.

Classes and functions are two different ways to organize your code. Both are ways to avoid writing the same code over and over again. Should it be a class or a function? I don't know. Good reasons for a class: it's a big chunk of data you will access a lot or pass from function to function; it represents an object to you conceptually; it has a lot of related functions to turn into methods; you want to make collections of complicated results, like lists or trees; or you want to use it natively for things like print, len, sorting, equality testing, or math.
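As a rough illustration of the last reason (using an object natively with print, sorting, and equality testing), here is a minimal sketch. The class name ComparisonResult, its fields, and the numbers are all hypothetical and made up for the example; it is not the t-test class from the first post.

```python
from dataclasses import dataclass


@dataclass(order=True)
class ComparisonResult:
    """Hypothetical container for one test result.

    order=True generates <, <=, >, >= that compare fields in the order
    they are declared, so results sort by p value first.
    """
    p: float
    t: float
    label: str = ""

    def __str__(self):
        # Used by print() and str()
        return f"{self.label}: t={self.t:.3f}, p={self.p:.4f}"


# Illustrative values only, not output from a real analysis
results = [
    ComparisonResult(p=0.20, t=1.31, label="A vs B"),
    ComparisonResult(p=0.01, t=2.95, label="A vs C"),
    ComparisonResult(p=0.04, t=2.20, label="B vs C"),
]

# sorted() works because the dataclass defines the ordering methods
ordered = sorted(results)
for r in ordered:
    print(r)

# == compares all fields, so two results with the same values are equal
same = ComparisonResult(p=0.01, t=2.95, label="A vs C") == ordered[0]
```

With plain functions returning tuples you would have to pass a key to sorted and format each result by hand; here that behaviour lives with the data.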

Personally, it seems odd to me to make a t-test into an object. But that's me, and that's me not knowing what you want to do with this in your code. I can see situations where this would be useful, although I would add support for multiple comparisons in those situations.

If you are going to go with this as a class, I would make result_labels a class attribute, rather than generating it each time you run summary.
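A minimal sketch of that suggestion, assuming a cut-down stand-in class (TTestSketch is a hypothetical name) that uses only the standard library and computes just three of the values, so the label handling is easy to see. The same pattern would apply to the full ttest_ind class and its pd.Series.

```python
import math
import statistics


class TTestSketch:
    """Simplified stand-in for ttest_ind; only the label handling matters here."""

    # Class attribute: created once when the class is defined and
    # shared by every instance, instead of being rebuilt in summary().
    RESULT_LABELS = ("Mean difference", "t (equal var.)", "df (equal var.)")

    def __init__(self, x, y):
        self.x = list(x)
        self.y = list(y)
        n1, n2 = len(self.x), len(self.y)
        self.mean_diff = statistics.mean(self.x) - statistics.mean(self.y)
        self.dof_equal = n1 + n2 - 2
        # Pooled variance from the two sample variances (ddof=1)
        pooled_var = ((n1 - 1) * statistics.variance(self.x)
                      + (n2 - 1) * statistics.variance(self.y)) / self.dof_equal
        self.t_equal = self.mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    def summary(self):
        # Only the values are collected per call; the labels are looked up
        # on the class through self.
        values = (self.mean_diff, self.t_equal, self.dof_equal)
        return dict(zip(self.RESULT_LABELS, values))


a = TTestSketch([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
b = TTestSketch([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(a.summary())
```

Both a and b refer to the same RESULT_LABELS tuple, so nothing is rebuilt when summary is called. Using a tuple rather than a list also guards against one instance accidentally modifying the shared labels.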

fffrost · Jul-01-2018, 05:54 PM

Ok! Thanks for the feedback.
