Python Forum

Full Version: Statsmodels Multiple Regression Syntax Error
I've been able to use the statsmodels.api regression when assigning variables to x and y with no issues. However, I am now trying to use statsmodels.formula.api to run a multiple regression that includes one categorical variable, using the formula= argument. I'm familiar with regression models in R, but I'm switching over to Python and running into issues. I keep getting the following error:

File "<unknown>", line 1

C(Work Country)

SyntaxError: invalid syntax



The code I am running that is causing the error is below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import openpyxl
import statsmodels.formula.api as smf

df = pd.read_excel('C:/File/data1')

model = smf.ols(formula= 'Age ~ C(Work Country) + Height', data = df).fit()



Any help would be greatly appreciated.
Please post a reproducible code example, with representative features from your dataframe.
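I have located the answer. statsmodels formulas are parsed as Python expressions (by patsy), so whitespace is not recognized as part of a column name and C(Work Country) fails with a SyntaxError. A column such as Work Country either has to be renamed so it contains no spaces, or quoted inside the formula as Q("Work Country"). See below: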

https://stackoverflow.com/questions/5286...iple-words
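For anyone landing here later, this is a minimal sketch of the two workarounds, assuming statsmodels is using its default patsy formula engine. The dataframe is made-up stand-in data (the original spreadsheet was never posted), and the names model_q, df_renamed and model_renamed are only for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data; only the column names come from the question.
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 38, 29, 44, 36],
    "Work Country": ["US", "UK", "US", "DE", "UK", "DE", "US", "UK"],
    "Height": [170, 165, 180, 175, 160, 172, 168, 178],
})

# Option 1: quote the column name with patsy's Q() so the space is accepted.
model_q = smf.ols(formula='Age ~ C(Q("Work Country")) + Height', data=df).fit()

# Option 2: rename the column to a valid Python identifier, then use it directly.
df_renamed = df.rename(columns={"Work Country": "Work_Country"})
model_renamed = smf.ols(formula="Age ~ C(Work_Country) + Height", data=df_renamed).fit()

print(model_q.params)
```

Both options fit the same model; renaming is usually easier to read if several columns have spaces, while Q() leaves the dataframe untouched.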