ANOVA: DataFrame has no Attribute alpha

Tese · (This post was last modified: Jul-14-2019, 12:36 AM by scidam.)

I'm a beginner in Python, but familiar with statistics. I'm trying to run a two-way ANOVA in Python by following examples from blogs. Specifically, I want to test two detergent brands, called Top and Alpha, in cold and hot water, so that I can determine whether water temperature has an effect on the effectiveness of the detergents.
Unfortunately, I ran into errors right at the beginning, both when constructing the interaction plot and when calculating the degrees of freedom, as shown below. I have included the details, hypothetical data, code, and errors below. Please help me!

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
import matplotlib.pyplot as plt
from scipy import stats

data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv')
print(data)

Output:
  "Detergent_Brands"; "Cold";  "Hot"
0              "top";      4;     10
1            "alpha";      5;     10
2              "top";      6;     12
3            "alpha";      5;     13
4              "top";      6;     11
5            "alpha";      5;     10
6              "top";      4;     12
7            "alpha";      6;     11
8              "top";      4;     12
9            "alpha";      5;     10
10             "top";      6;     11
11           "alpha";      5;     12
12             "top";      6;     10
13           "alpha";      5;     10

N = len(data.len)
df_a = len(data.top.unique()) - 1
df_b = len(data.alpha.unique()) - 1
df_axb = df_a*df_b
df_w = N - (len(data.top.unique())*len(data.alpha.unique()))

Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-b9e8e4474b44> in <module>
----> 1 N = len(data.len)
      2 df_a = len(data.top.unique()) - 1
      3 df_b = len(data.alpha.unique()) - 1
      4 df_axb = df_a*df_b
      5 df_w = N - (len(data.top.unique())*len(data.alpha.unique()))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'len'

fig = interaction_plot(data.alpha, data.top, data.len,
             colors=['red','blue'], markers=['D','^'], ms=11)

Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-6c5947e09274> in <module>
----> 1 fig = interaction_plot(data.alpha, data.top, data.len,
      2              colors=['red','blue'], markers=['D','^'], ms=11)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'alpha'

**scidam** · (This post was last modified: Jul-14-2019, 01:12 AM by scidam.)

You definitely need to clean up and restructure your data.

Remove the semicolons and quotes from df (you can use the .apply method to do that).
Do something like this before you start any analysis: df = pd.concat([df, df['Detergent_Brands'].str.get_dummies()], axis=1).drop(['Detergent_Brands'], axis=1).

Instead of accessing data-frame columns as attributes (e.g. df.top), consider using df['top'], df['alpha']. This approach is more robust, especially when column names collide with the data frame's built-in methods.
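
To illustrate the point about name collisions, here is a minimal sketch with a made-up frame. The column name count is hypothetical and chosen only because DataFrame already has a count method:

```python
import pandas as pd

# Hypothetical frame: one column is deliberately named like a DataFrame method.
df = pd.DataFrame({"count": [3, 7], "Cold": [4, 5]})

by_key = df["count"]    # bracket access always returns the column
by_attr = df.count      # attribute access finds the built-in method first

print(type(by_key).__name__)   # Series
print(callable(by_attr))       # True: this is DataFrame.count, not the column

# A name that is not a column raises the same kind of error as in the question.
try:
    df.alpha
except AttributeError as exc:
    print(exc)
```

Bracket access either returns the column or fails with a KeyError that names the missing column, which tends to be easier to diagnose.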

You got the error because your df has no column named alpha (and no column named top either). .get_dummies should create these columns (but you need to clean up your data first).
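
A small sketch of what the get_dummies step produces, assuming the file has already been parsed into three clean columns. The four sample rows are taken from the data in the question:

```python
import pandas as pd

# Small sample assumed to be parsed correctly already (three clean columns).
df = pd.DataFrame({
    "Detergent_Brands": ["top", "alpha", "top", "alpha"],
    "Cold": [4, 5, 6, 5],
    "Hot": [10, 10, 12, 13],
})

# One indicator column per brand value, named after the value itself.
dummies = df["Detergent_Brands"].str.get_dummies()

# Append the indicators and drop the original text column.
df = pd.concat([df, dummies], axis=1).drop(["Detergent_Brands"], axis=1)

print(list(df.columns))   # ['Cold', 'Hot', 'alpha', 'top']
print(df["alpha"].tolist())
```

After this step df['alpha'] and df['top'] exist, but they are indicator columns (one flag per row), not measurements.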

I think your data has not been parsed (loaded) properly. Consider passing a separator, e.g. sep=';', to the read_csv function.
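
For the degrees-of-freedom step in the question, one possible restructuring is a long format with one row per measurement, so that brand and temperature become two factor columns. This is only a sketch under assumptions not confirmed in this thread: the names brand, temperature, and score are hypothetical, and the cold and hot measurements are treated as independent observations.

```python
import pandas as pd

# The 14 rows from the question, assumed to be parsed into clean columns.
data = pd.DataFrame({
    "Detergent_Brands": ["top", "alpha"] * 7,
    "Cold": [4, 5, 6, 5, 6, 5, 4, 6, 4, 5, 6, 5, 6, 5],
    "Hot": [10, 10, 12, 13, 11, 10, 12, 11, 12, 10, 11, 12, 10, 10],
})

# Wide -> long: each Cold/Hot cell becomes its own row.
long = data.melt(id_vars="Detergent_Brands", value_vars=["Cold", "Hot"],
                 var_name="temperature", value_name="score")
long = long.rename(columns={"Detergent_Brands": "brand"})

N = len(long)                                  # total number of measurements
levels_a = long["brand"].nunique()             # number of brands
levels_b = long["temperature"].nunique()       # number of temperatures

df_a = levels_a - 1                            # brand main effect
df_b = levels_b - 1                            # temperature main effect
df_axb = df_a * df_b                           # interaction
df_w = N - levels_a * levels_b                 # within (residual)

print(N, df_a, df_b, df_axb, df_w)             # 28 1 1 1 24
```

The same long frame has the shape that a formula such as score ~ C(brand) * C(temperature) would expect, should you go on to fit the model with ols.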

Tese · Jul-14-2019, 03:49 AM

Dear Scidam, thank you very much for responding. I will try your instructions. If possible, could you clarify the .apply method that you recommended in your comment above? Once again, thank you.

**scidam** · Jul-14-2019, 09:22 AM

The main issue here, I think, is the missing sep=';'. You need to call read_csv with the parameter sep=';', e.g.

data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv', sep=';')

If you do so, you will get a data frame with columns named Detergent_Brands, Cold, and Hot, without any semicolons.
The Detergent_Brands column will contain the values top and alpha (also without semicolons), so you won't need to do any cleanup at all.
Try passing sep=';' and then restructure the df as shown in post #2.
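
The effect of the separator can be checked without the original file. The sketch below uses an in-memory, unquoted stand-in for the CSV (an assumption; the real file may be quoted differently):

```python
import io
import pandas as pd

# Simplified, unquoted stand-in for the file in the question (an assumption).
raw = "Detergent_Brands;Cold;Hot\ntop;4;10\nalpha;5;10\n"

default = pd.read_csv(io.StringIO(raw))           # default sep=','
parsed = pd.read_csv(io.StringIO(raw), sep=";")   # semicolon separator

print(default.shape)          # (2, 1): everything ends up in one column
print(list(parsed.columns))   # ['Detergent_Brands', 'Cold', 'Hot']
print(parsed["Detergent_Brands"].unique())
```

With the default separator the whole line becomes a single column, which is why neither data.alpha nor data['Cold'] can be found.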

As far as .apply is concerned, you can read about this method in the official docs.
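
As a rough idea of what .apply does, here is a minimal sketch that cleans stray quotes and spaces from a text column. The sample values and the helper strip_quotes are hypothetical:

```python
import pandas as pd

# Hypothetical badly parsed values with stray quotes and spaces.
raw = pd.Series(['"top"', ' "alpha"', '"top" '])

def strip_quotes(value):
    """Remove surrounding whitespace and double quotes from one cell."""
    return value.strip().strip('"')

cleaned = raw.apply(strip_quotes)   # calls strip_quotes once per element
print(cleaned.tolist())             # ['top', 'alpha', 'top']
```

Series.apply calls the given function on each element and collects the results into a new Series. If the file is read with sep=';', this kind of cleanup should not be necessary.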

Tese · Jul-14-2019, 06:16 PM

Thank you very much for the further comments; I will take your solution into consideration. I usually practice Python on weekends, so I will try it this coming weekend.
