Python Forum
Statsmodels Multiple Regression Syntax Error - Printable Version




Statsmodels Multiple Regression Syntax Error - Burger - Jun-24-2021

I've been able to run regressions with statsmodels.api with no issues when assigning variables to x and y. However, I am now trying to use statsmodels.formula.api to run a multiple regression that includes one categorical variable, using the formula= argument. I'm familiar with regression models in R, but I'm switching over to Python and running into issues. I keep getting the following error:

  File "<unknown>", line 1
    C(Work Country)
SyntaxError: invalid syntax



The code I am running that is causing the error is below:

import pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import openpyxl
import statsmodels.formula.api as smf
import statsmodels.formula.api as ols

df = pd.read_excel('C:/File/data1')

model = smf.ols(formula= 'Age ~ C(Work Country) + Height', data = df).fit()



Any help would be greatly appreciated.


RE: Statsmodels Multiple Regression Syntax Error - Caprone - Jun-27-2021

Please post a reproducible code example with representative features from your dataframe.


RE: Statsmodels Multiple Regression Syntax Error - Burger - Jul-13-2021

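I have located the answer. statsmodels formulas do not accept whitespace in a column name: the formula string is parsed as Python-like code (by patsy), so Work Country is read as two separate names, which raises the SyntaxError. Renaming the column so it has no space (for example Work_Country), or wrapping the name as Q("Work Country") inside the formula, avoids the problem. See below: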

https://stackoverflow.com/questions/52861445/why-does-statsmodels-ols-doesnt-support-reading-in-columns-with-multiple-words
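
For anyone landing here later, here is a minimal sketch of two possible workarounds, assuming a dataframe with the columns Age, Work Country and Height from the original post. The helper names (to_identifier, quoted, fit_both_ways) are made up for illustration and are not part of statsmodels; Q() is patsy's quoting function for names that are not valid Python identifiers.

```python
import keyword
import re


def to_identifier(name):
    """Turn an arbitrary column label into a valid Python identifier."""
    # Replace every run of non-word characters (spaces, dashes, ...) with "_".
    cleaned = re.sub(r"\W+", "_", name.strip())
    # Identifiers cannot be empty, start with a digit, or be a keyword.
    if not cleaned or cleaned[0].isdigit() or keyword.iskeyword(cleaned):
        cleaned = "_" + cleaned
    return cleaned


def quoted(name):
    """Wrap a column label in patsy's Q() so the formula can use it verbatim.

    Assumes the label itself contains no double quotes.
    """
    return 'Q("{}")'.format(name)


def fit_both_ways(df):
    """Fit the model from the question using each workaround (hypothetical helper)."""
    # Imported here so the helpers above work without statsmodels installed.
    import statsmodels.formula.api as smf

    # Option 1: rename the columns so none of them contains whitespace.
    renamed = df.rename(columns=to_identifier)
    by_rename = smf.ols("Age ~ C(Work_Country) + Height", data=renamed).fit()

    # Option 2: keep the original column name and quote it inside the formula.
    formula = "Age ~ C({}) + Height".format(quoted("Work Country"))
    by_quote = smf.ols(formula, data=df).fit()

    return by_rename, by_quote
```

Calling fit_both_ways(df) on the dataframe from the question should return two fitted results that describe the same model. Renaming is usually tidier when the column appears in many formulas; Q() is handy when the original column names have to stay unchanged.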