![]() |
Factors in statsmodels mixedlm - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Factors in statsmodels mixedlm (/thread-29989.html) |
Factors in statsmodels mixedlm - rleduc42 - Sep-28-2020 Hello All, I'm trying to calculate a nested effect mixed model where both a and b are factors (in R terminology) or class variables (in SAS). The idea is that a random factor, b, is nested in the fixed effect a. In R I would write: df$a <- factor(df$a) df$b <- factor(df$b) test.model <- lmer(val ~ a+(1|b), data=df) anova(test.model) And I get the correct answer. I know several wrong ways to do this in Python, hopefully someone can point me towards a correct on. import statsmodels.formula.api as smf import pandas as pd Problem_data = {"a":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], "b":[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8], "responce":[3,6,3,3,1,2,2,2,5,6,5,6,2,3,4,3,7,8,7,6,4,5,4,3,7,8,9,8,10,10,9,11]} df_11_4 = pd.DataFrame(Problem_data,columns=['a','b','responce']) md = smf.mixedlm("responce ~ C(a)", df_11_4, groups=df_11_4["b"]) mdf = md.fit() print(mdf.f_test("C(a)[T.2]"))This yields: F test: F=array([[6.45933505]]), p=0.016442613303300678, df_denom=30, df_num=1 which is wrong--well, I'm sure it is right, but it's not what I'm trying to get. If I had done it correctly, df_denom would equal 6 and p=0.044. For the record, this problem is the worked example in "Experimental Design: Procedures for the Behavioral Science 3", Kirk, 1995. p.482. I use this as my "known correct" example. RE: Factors in statsmodels mixedlm - rleduc42 - Oct-23-2020 I thought I would post an update here for anyone coming later--and maybe someone will see something new that helps. I have figured out how to get the two variables encoded as type categorical--which I believe to be the equivalent of R's Factor. But I now get an "Unsupported Operand type(s)" error from Patsy. Here is the code with the full text of the error at the end. #%% import statsmodels.formula.api as smf import pandas as pd from patsy import dmatrices, dmatrix, demo_data import statsmodels.regression.mixed_linear_model as smmlm Problem_data = {"a":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], "b":[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8], "response":[3,6,3,3,1,2,2,2,5,6,5,6,2,3,4,3,7,8,7,6,4,5,4,3,7,8,9,8,10,10,9,11]} df_11_4 = pd.DataFrame(Problem_data,columns=['a','b','response']) #%% df_11_4["a"] = df_11_4["a"].astype('category') df_11_4["b"] = df_11_4["b"].astype('category') print(df_11_4.dtypes) #%% # This gives an error dmatrices("response ~ a+(1|b)", df_11_4) # PatsyError: Error evaluating factor: TypeError: unsupported operand type(s) for |: 'int' and 'Categorical' # response ~ a+(1|b) # ^^^^^ #%% md = smf.mixedlm("response ~ a+(1|b)", df_11_4, groups=df_11_4["b"]) # PatsyError: Error evaluating factor: TypeError: unsupported operand type(s) for |: 'int' and 'Categorical' # response ~ a+(1|b) # ^^^^^ print(dmatrix(df_11_4)) mdf = md.fit() print(mdf.summary()) print(mdf.f_test("C(a)[T.2]")) #---------------------------------------------------------- # Full Error message #---------------------------------------------------------- """ dmatrices("response ~ a+(1|b)", df_11_4) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\compat.py in call_and_wrap_exc(msg, origin, f, *args, **kwargs) 35 try: ---> 36 return f(*args, **kwargs) 37 except Exception as e: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\eval.py in eval(self, expr, source_name, inner_namespace) 165 return eval(code, {}, VarLookupDict([inner_namespace] --> 166 + self._namespaces)) 167 <string> in <module> C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\ops\common.py in new_method(self, other) 64 ---> 65 return method(self, other) 66 C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\ops\__init__.py in wrapper(self, other) 393 --> 394 res_values = logical_op(lvalues, rvalues, op) 395 return self._construct_result(res_values, name=res_name) C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\ops\array_ops.py in logical_op(left, right, op) 335 # Call the method on lvalues --> 336 res_values = op(lvalues, rvalues) 337 C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\pandas\core\ops\roperator.py in ror_(left, right) 55 def ror_(left, right): ---> 56 return operator.or_(right, left) 57 TypeError: unsupported operand type(s) for |: 'int' and 'Categorical' The above exception was the direct cause of the following exception: PatsyError Traceback (most recent call last) c:\Users\rledu\Documents\_Bioinformatics\python\ANOVA_examples\Forum_Example_2.py in ----> 19 dmatrices("response ~ a+(1|b)", df_11_4) C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\highlevel.py in dmatrices(formula_like, data, eval_env, NA_action, return_type) 308 eval_env = EvalEnvironment.capture(eval_env, reference=1) 309 (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env, --> 310 NA_action, return_type) 311 if lhs.shape[1] == 0: 312 raise PatsyError("model is missing required outcome variables") C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\highlevel.py in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type) 163 return iter([data]) 164 design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env, --> 165 NA_action) 166 if design_infos is not None: 167 return build_design_matrices(design_infos, data, C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\highlevel.py in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action) 68 data_iter_maker, 69 eval_env, ---> 70 NA_action) 71 else: 72 return None C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\build.py in design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action) 694 factor_states, 695 data_iter_maker, --> 696 NA_action) 697 # Now we need the factor infos, which encapsulate the knowledge of 698 # how to turn any given factor into a chunk of data: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\build.py in _examine_factor_types(factors, factor_states, data_iter_maker, NA_action) 441 for data in data_iter_maker(): 442 for factor in list(examine_needed): --> 443 value = factor.eval(factor_states[factor], data) 444 if factor in cat_sniffers or guess_categorical(value): 445 if factor not in cat_sniffers: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\eval.py in eval(self, memorize_state, data) 564 return self._eval(memorize_state["eval_code"], 565 memorize_state, --> 566 data) 567 568 __getstate__ = no_pickling C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\eval.py in _eval(self, code, memorize_state, data) 549 memorize_state["eval_env"].eval, 550 code, --> 551 inner_namespace=inner_namespace) 552 553 def memorize_chunk(self, state, which_pass, data): C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\compat.py in call_and_wrap_exc(msg, origin, f, *args, **kwargs) 41 origin) 42 # Use 'exec' to hide this syntax from the Python 2 parser: ---> 43 exec("raise new_exc from e") 44 else: 45 # In python 2, we just let the original exception escape -- better C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\patsy\compat.py in <module> PatsyError: Error evaluating factor: TypeError: unsupported operand type(s) for |: 'int' and 'Categorical' response ~ a+(1|b) ^^^^^ """ |