Hey all,
I have the following code.
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

resultmodeldistancevariation2sleep = smf.ols(formula='weighteddistance ~ age + C(gender) + C(highest_education_level_acheived)', data=x).fit()
print(resultmodeldistancevariation2sleep.summary())
```

I am trying to print the output of a linear model fitted with smf.ols. When I print the summary (below), I see that one category of each categorical variable is used as a baseline. For example, gender group 1.0 is the baseline, and the other gender categories (C(gender)[T.2.0] and C(gender)[T.3.0]) are each compared with that baseline category.
```
                                              coef   std err         t     P>|t|    [0.025    0.975]
Intercept                                  -0.6726     0.220    -3.058     0.002    -1.104    -0.241
C(gender)[T.2.0]                            0.1905     0.050     3.822     0.000     0.093     0.288
C(gender)[T.3.0]                            0.2810     0.174     1.619     0.106    -0.060     0.622
C(highest_education_level_acheived)[T.3]    0.0115     0.208     0.056     0.956    -0.397     0.420
C(highest_education_level_acheived)[T.4]   -0.0295     0.214    -0.138     0.890    -0.449     0.390
C(highest_education_level_acheived)[T.5]    0.0912     0.207     0.439     0.660    -0.316     0.499
C(highest_education_level_acheived)[T.6]    0.2657     0.219     1.216     0.224    -0.163     0.695
C(highest_education_level_acheived)[T.7]    0.3885     0.253     1.539     0.124    -0.107     0.884
age                                         0.0150     0.003     4.716     0.000     0.009      0.02
```

However, I want to see the effect of each categorical variable as a whole, not of each category within it. I therefore pass the fitted smf.ols model to anova_lm:
```python
anovaoutput = sm.stats.anova_lm(resultmodeldistancevariation2sleep)
anovaoutput['PR(>F)'] = anovaoutput['PR(>F)'].round(4)
print(anovaoutput)
```

```
                                        df      sum_sq   mean_sq          F  PR(>F)
C(gender)                              2.0    4.227966  2.113983   5.681874  0.0036
C(highest_education_level_acheived)    5.0   11.425706  2.285141   6.141906  0.0000
age                                    1.0    8.274317  8.274317  22.239357  0.0000
Residual                             647.0  240.721120  0.372057        NaN     NaN
```

However, this output contains neither the coefficients nor their confidence intervals. How can I amend my code so that these values are printed as part of the anova_lm output?
Is there also a way I can calculate the partial eta squared?
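Partial eta squared for a term can be computed from the sum_sq column of an anova_lm table as SS_term / (SS_term + SS_residual). Below is a minimal sketch, assuming the table layout shown above (a sum_sq column and a row labelled Residual); the helper name `partial_eta_squared` is hypothetical. One caveat: the table in the question uses anova_lm's default `typ=1` (sequential) sums of squares, so the values for each term depend on the order of terms in the formula; passing `typ=2` avoids that.

```python
import numpy as np
import pandas as pd

def partial_eta_squared(anova_table: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: partial eta squared for each term of an anova_lm table.

    eta_p^2 = SS_term / (SS_term + SS_residual); the Residual row gets NaN.
    """
    ss_resid = anova_table.loc["Residual", "sum_sq"]
    eta = anova_table["sum_sq"] / (anova_table["sum_sq"] + ss_resid)
    eta.loc["Residual"] = np.nan
    return eta.rename("partial_eta_sq")

# df and sum_sq copied from the anova_lm output in the question (typ=1)
anova_table = pd.DataFrame(
    {"df": [2.0, 5.0, 1.0, 647.0],
     "sum_sq": [4.227966, 11.425706, 8.274317, 240.721120]},
    index=["C(gender)", "C(highest_education_level_acheived)", "age", "Residual"],
)

anova_table["partial_eta_sq"] = partial_eta_squared(anova_table)
print(anova_table)
```

On the numbers from the question this gives roughly 0.017 for gender, 0.045 for education and 0.033 for age. For the real model, the helper could be applied directly to `sm.stats.anova_lm(resultmodeldistancevariation2sleep, typ=2)`.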
Would be so grateful for a helping hand!
The dataframe x looks like this (first 5 rows):
```
    age  gender  highest_education_level_acheived  hours_of_phone_use_per_week  weight  height  drink_alcohol_yes_no  drink_caffeine_yes_no  growupenvironment  sunlight  frequency_of_naps  weighteddistance
0  24.0     1.0                                 4                         13.0   201.0    69.0                     1                      1                1.0  0.666667                  2         -0.423448
1  33.0     1.0                                 3                         10.0   140.0    68.0                     2                      2                2.0  0.500000                  3         -0.375761
3  34.0     1.0                                 3                          5.0   170.0    72.0                     1                      2                2.0  0.166667                  3         -0.197738
4  32.0     1.0                                 4                          1.0   205.0    69.0                     1                      1                2.0  1.000000                  1         -0.767542
7  23.0     1.0                                 5                          5.0   180.0    72.0                     1                      1                2.0  0.333333                  1          0.190099
```
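For anyone who wants to try the code, the five rows shown above can be rebuilt as a small DataFrame, with values copied from the printout. The name `x_head` is only illustrative. Five rows are far too few to fit the full model, so this is mainly useful for checking column names and dtypes.

```python
import pandas as pd

# Column names exactly as in the question (including the "acheived" spelling)
columns = [
    "age", "gender", "highest_education_level_acheived",
    "hours_of_phone_use_per_week", "weight", "height",
    "drink_alcohol_yes_no", "drink_caffeine_yes_no", "growupenvironment",
    "sunlight", "frequency_of_naps", "weighteddistance",
]

# The five rows printed above; index labels 0, 1, 3, 4, 7 as in the printout
rows = [
    [24.0, 1.0, 4, 13.0, 201.0, 69.0, 1, 1, 1.0, 0.666667, 2, -0.423448],
    [33.0, 1.0, 3, 10.0, 140.0, 68.0, 2, 2, 2.0, 0.500000, 3, -0.375761],
    [34.0, 1.0, 3, 5.0, 170.0, 72.0, 1, 2, 2.0, 0.166667, 3, -0.197738],
    [32.0, 1.0, 4, 1.0, 205.0, 69.0, 1, 1, 2.0, 1.000000, 1, -0.767542],
    [23.0, 1.0, 5, 5.0, 180.0, 72.0, 1, 1, 2.0, 0.333333, 1, 0.190099],
]

x_head = pd.DataFrame(rows, columns=columns, index=[0, 1, 3, 4, 7])
print(x_head.dtypes)
```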