Printing effect sizes for variables in an anova

Printing effect sizes for variables in an anova - eyavuz21 - Feb-01-2023

Hey all,

I have the following code.

import statsmodels.api as sm
import statsmodels.formula.api as smf

resultmodeldistancevariation2sleep = smf.ols(formula='weighteddistance ~ age + C(gender) + C(highest_education_level_acheived)', data=x).fit()
print(resultmodeldistancevariation2sleep.summary())

I am trying to print the output of a linear model fitted with smf.ols. However, in the output below, one category within each categorical variable is used as a baseline. For example, group 1.0 is the baseline for gender, and the other categories of that variable (gender[T.2.0] and gender[T.3.0]) are then compared to that baseline category.

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	-0.6726	0.220	-3.058	0.002	-1.104	-0.241
C(gender)[T.2.0]	0.1905	0.050	3.822	0.000	0.093	0.288
C(gender)[T.3.0]	0.2810	0.174	1.619	0.106	-0.060	0.622
C(highest_education_level_acheived)[T.3]	0.0115	0.208	0.056	0.956	-0.397	0.420
C(highest_education_level_acheived)[T.4]	-0.0295	0.214	-0.138	0.890	-0.449	0.390
C(highest_education_level_acheived)[T.5]	0.0912	0.207	0.439	0.660	-0.316	0.499
C(highest_education_level_acheived)[T.6]	0.2657	0.219	1.216	0.224	-0.163	0.695
C(highest_education_level_acheived)[T.7]	0.3885	0.253	1.539	0.124	-0.107	0.884
age	0.0150	0.003	4.716	0.000	0.009	0.02

However, I want to see the effect of the categorical variable as a whole and not each category within that variable. I thus place the smf.ols model output into an anova using 'anova_lm':

anovaoutput = sm.stats.anova_lm(resultmodeldistancevariation2sleep)
anovaoutput['PR(>F)'] = anovaoutput['PR(>F)'].round(4)

	df	sum_sq	mean_sq	F	PR(>F)
C(gender)	2.0	4.227966	2.113983	5.681874	0.0036
C(highest_education_level_acheived)	5.0	11.425706	2.285141	6.141906	0.0000
age	1.0	8.274317	8.274317	22.239357	0.0000
Residual	647.0	240.721120	0.372057	NaN	NaN

However, neither the confidence intervals nor the coefficients are printed in this output. How can I amend my code to print these values as part of the anova_lm output?

Is there also a way I can calculate the partial eta squared?

Would be so grateful for a helping hand!

The dataframe x looks like this (first 5 rows):

	age	gender	highest_education_level_acheived	hours_of_phone_use_per_week	weight	height	drink_alcohol_yes_no	drink_caffeine_yes_no	growupenvironment	sunlight	frequency_of_naps	weighteddistance
0	24.0	1.0	4	13.0	201.0	69.0	1	1	1.0	0.666667	2	-0.423448
1	33.0	1.0	3	10.0	140.0	68.0	2	2	2.0	0.500000	3	-0.375761
3	34.0	1.0	3	5.0	170.0	72.0	1	2	2.0	0.166667	3	-0.197738
4	32.0	1.0	4	1.0	205.0	69.0	1	1	2.0	1.000000	1	-0.767542
7	23.0	1.0	5	5.0	180.0	72.0	1	1	2.0	0.333333	1	0.190099
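
For the partial eta squared question, a minimal sketch of the arithmetic, using only the sums of squares from the anova_lm table posted above. It assumes the usual definition, SS_effect / (SS_effect + SS_residual). The variable names are illustrative. Note that anova_lm defaults to typ=1 (sequential sums of squares), so the values in that table depend on the order of the terms in the formula.

```python
# Sums of squares copied from the anova_lm table in the post.
sum_sq = {
    "C(gender)": 4.227966,
    "C(highest_education_level_acheived)": 11.425706,
    "age": 8.274317,
}
ss_residual = 240.721120

# Partial eta squared = SS_effect / (SS_effect + SS_residual)
partial_eta_sq = {term: ss / (ss + ss_residual) for term, ss in sum_sq.items()}

for term, value in partial_eta_sq.items():
    print(f"{term}: {value:.4f}")
```

On the anova_lm DataFrame itself, the same calculation would be anovaoutput['sum_sq'] / (anovaoutput['sum_sq'] + anovaoutput.loc['Residual', 'sum_sq']).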

RE: Printing effect sizes for variables in an anova - Larz60+ - Feb-01-2023

To suggest amending your code, we should first be able to examine your code.
You haven't provided that.
There are examples:
All examples
Ordinary Least Squares

There is also scipy see sklearn linear model

RE: Printing effect sizes for variables in an anova - eyavuz21 - Feb-01-2023

(Feb-01-2023, 11:14 AM)Larz60+ Wrote: To suggest amending your code, we should first be able to examine your code.
You haven't provided that.
There are examples:
All examples
Ordinary Least Squares

There is also scipy see sklearn linear model

I have provided the first 5 lines of the dataframe 'x', which I think should be enough to understand my issue.

Thank you for the suggestions with those links but sadly my issue has not yet been solved.

Is that enough information for you? :) Let me know what else to provide!