Python Forum
stats model OLS question/issue
#1
Good Afternoon All,

I am new to Python and to coding, and I am trying to build a model. I am importing a CSV file and using the statsmodels OLS function to fit a linear regression model. This is the code I'm using:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp
import statsmodels.formula.api as smf
import sys
from IPython.display import display, HTML


print(sys.path)

chem_film_data_df = pd.read_csv(r"chem_film_data.csv")
columns_to_keep = ['Date', 'pass_fail', 'C-IC_6MU', 'Nitric_Acid', 'Etch_Rate']
chem_film_data_df = chem_film_data_df[columns_to_keep]
print(chem_film_data_df)

display(HTML(chem_film_data_df.head().to_html()))

print("Number of rows in the dataset =", len(chem_film_data_df))

#plt.plot(chem_film_data_df['C-IC_6MU'], chem_film_data_df['pass_fail'],'o')
#plt.title('Test', fontsize=20)
#plt.xlabel('C-IC 6MU Test')
#plt.ylabel('pass-fail test')
#plt.show()

chem_film_model = smf.ols("pass_fail ~ C-IC_6MU + Nitric_Acid + Etch_Rate", chem_film_data_df).fit()
print(chem_film_model.summary())
I added a few lines to verify that the correct data is being imported from the CSV, and it is, but when execution reaches line 27 (the smf.ols call) it raises an error. The commented-out plotting code was a test I ran to confirm that the CSV data was imported and usable.

The test output is correct:
Output:
['C:\\Users\\JA21877\\Desktop\\python', 'C:\\Program Files\\Python38\\python38.zip', 'C:\\Program Files\\Python38\\DLLs', 'C:\\Program Files\\Python38\\lib', 'C:\\Program Files\\Python38', 'C:\\Users\\JA21877\\AppData\\Roaming\\Python\\Python38\\site-packages', 'C:\\Users\\JA21877\\AppData\\Roaming\\Python\\Python38\\site-packages\\win32', 'C:\\Users\\JA21877\\AppData\\Roaming\\Python\\Python38\\site-packages\\win32\\lib', 'C:\\Users\\JA21877\\AppData\\Roaming\\Python\\Python38\\site-packages\\Pythonwin', 'C:\\Program Files\\Python38\\lib\\site-packages', 'C:\\Users\\JA21877\\AppData\\Roaming\\Python\\Python38\\site-packages\\IPython\\extensions']
          Date  pass_fail  C-IC_6MU  Nitric_Acid  Etch_Rate
0    10/1/2018          1    5.9050       87.191   0.309628
1    10/4/2018          1    5.9050       87.191   0.309628
2    10/4/2018          1    5.9050       87.191   0.354437
3    10/4/2018          1    5.9050       87.191   0.222378
4    10/4/2018          1    5.9050       87.191   0.133427
..         ...        ...       ...          ...        ...
220  7/13/2020          1    5.9710       91.839   0.211808
221  7/17/2020          1    6.0131       92.466   0.235234
222  7/22/2020          1    6.4769       94.186   0.342308
223  7/24/2020          1    6.4777       88.799   0.316139
224  7/29/2020          1    6.4337       94.687   0.314921

[225 rows x 5 columns]
<IPython.core.display.HTML object>
Number of rows in the dataset = 225
But then I receive this message:
Error:
Traceback (most recent call last):
  File "C:\Users\JA21877\Desktop\python\chem_film_predictive_model.py", line 27, in <module>
    chem_film_model = smf.ols("pass_fail ~ C-IC_6MU + Nitric_Acid + Etch_Rate", chem_film_data_df).fit()
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\statsmodels\base\model.py", line 168, in from_formula
    tmp = handle_formula_data(data, None, formula, depth=eval_env,
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\statsmodels\formula\formulatools.py", line 64, in handle_formula_data
    result = dmatrices(formula, Y, depth, return_type='dataframe',
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\patsy\highlevel.py", line 309, in dmatrices
    (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\patsy\highlevel.py", line 167, in _do_highlevel_design
    return build_design_matrices(design_infos, data,
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\patsy\build.py", line 893, in build_design_matrices
    rows_checker.check(value.shape[0], name, origin)
  File "C:\Users\JA21877\AppData\Roaming\Python\Python38\site-packages\patsy\build.py", line 795, in check
    raise PatsyError(msg, origin)
patsy.PatsyError: Number of rows mismatch between data argument and C (225 versus 1)
    pass_fail ~ C-IC_6MU + Nitric_Acid + Etch_Rate
                ^
[Finished in 3.1s]
Any idea what is going on? I'm not sure why it says there is a row mismatch when the printed output shows that every column has the same number of rows. Thank you in advance for your help!
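A note on the likely cause, offered as a sketch and not a confirmed diagnosis: the error names C, not C-IC_6MU. That suggests the formula parser (patsy) is reading the hyphen in the column name as a minus operator, which splits the name into C and IC_6MU, with C then resolving to something other than a 225-row column. Patsy provides Q("...") to quote column names that are not valid Python identifiers. The helper names below (quote_term, build_formula) are hypothetical and used only for illustration. The block builds the formula string with the standard library alone, so it runs without the CSV.

```python
# Hypothetical helpers, for illustration only: they are not part of
# statsmodels or patsy. They build a formula string in which any column
# name that is not a valid Python identifier is wrapped in patsy's Q("...").

def quote_term(name):
    """Return the name unchanged if it is a valid identifier, else Q("name")."""
    # Assumption: the name contains no double quotes. Note that a column
    # literally named C would still clash with patsy's categorical helper.
    return name if name.isidentifier() else f'Q("{name}")'


def build_formula(response, predictors):
    """Join a response and a list of predictors into a patsy-style formula."""
    right_hand_side = " + ".join(quote_term(p) for p in predictors)
    return f"{quote_term(response)} ~ {right_hand_side}"


formula = build_formula("pass_fail", ["C-IC_6MU", "Nitric_Acid", "Etch_Rate"])
print(formula)  # pass_fail ~ Q("C-IC_6MU") + Nitric_Acid + Etch_Rate
```

Passing this string to smf.ols in place of the original formula, as in smf.ols(formula, chem_film_data_df).fit(), should make patsy look up C-IC_6MU as a single column. Renaming the column to something without a hyphen (for example C_IC_6MU) before fitting is an alternative that avoids quoting altogether.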
Messages In This Thread
stats model OLS question/issue - by russoj5 - Aug-04-2020, 04:56 PM
RE: stats model OLS question/issue - by scidam - Aug-05-2020, 12:48 AM
RE: stats model OLS question/issue - by russoj5 - Aug-05-2020, 12:02 PM
RE: stats model OLS question/issue - by scidam - Aug-06-2020, 02:35 AM
