Python Forum

Full Version: ANOVA: DataFrame has no attribute alpha
I'm a beginner in Python but familiar with statistics. I'm trying to run a two-way ANOVA in Python by following examples from blogs. Specifically, I want to test two detergent brands, called Top and Alpha, in cold and hot water, so that in the end I can determine whether water temperature has an effect on the effectiveness of the detergents.
Unfortunately, I ran into errors right at the start, both when constructing the interaction plot and when calculating the degrees of freedom, as shown below. I have included the details, hypothetical data, code, and errors. Please help me!

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
import matplotlib.pyplot as plt
from scipy import stats

data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv')
print(data)
Output:
    "Detergent_Brands"; "Cold"; "Hot"
0   "top"; 4; 10
1   "alpha"; 5; 10
2   "top"; 6; 12
3   "alpha"; 5; 13
4   "top"; 6; 11
5   "alpha"; 5; 10
6   "top"; 4; 12
7   "alpha"; 6; 11
8   "top"; 4; 12
9   "alpha"; 5; 10
10  "top"; 6; 11
11  "alpha"; 5; 12
12  "top"; 6; 10
13  "alpha"; 5; 10
N = len(data.len)
df_a = len(data.top.unique()) - 1
df_b = len(data.alpha.unique()) - 1
df_axb = df_a*df_b
df_w = N - (len(data.top.unique())*len(data.alpha.unique()))
Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-b9e8e4474b44> in <module>
----> 1 N = len(data.len)
      2 df_a = len(data.top.unique()) - 1
      3 df_b = len(data.alpha.unique()) - 1
      4 df_axb = df_a*df_b
      5 df_w = N - (len(data.top.unique())*len(data.alpha.unique()))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'len'
fig = interaction_plot(data.alpha, data.top, data.len,
             colors=['red','blue'], markers=['D','^'], ms=11)
Error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-6c5947e09274> in <module>
----> 1 fig = interaction_plot(data.alpha, data.top, data.len,
      2              colors=['red','blue'], markers=['D','^'], ms=11)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'alpha'
You definitely need to clean up and restructure your data.

  1. Remove the semicolons and quotes from df (you can use the .apply method to do that).
  2. Do something like this before you start any analysis: df = pd.concat([df, df['Detergent_Brands'].str.get_dummies()], axis=1).drop(['Detergent_Brands'], axis=1).

Instead of accessing data-frame columns as attributes (e.g. df.top), consider using df['top'], df['alpha']. This approach is more robust, especially when column names collide with data-frame internal methods.
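
To illustrate the collision mentioned above, here is a minimal sketch. The frame and its column name count are made up for the example (they are not from the original data); count was picked only because DataFrame already has a method with that name.

```python
import pandas as pd

# Hypothetical frame: the column name "count" collides with DataFrame.count().
df = pd.DataFrame({"count": [4, 5, 6], "top": [1, 0, 1]})

by_attribute = df.count    # the bound method DataFrame.count, not the column
by_bracket = df["count"]   # the column itself, as a Series

print(callable(by_attribute))  # True
print(by_bracket.tolist())     # [4, 5, 6]
```

Bracket access always looks the name up among the columns, so it either returns the column or raises a KeyError that names the missing column.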

You got the error because your df doesn't have a column named alpha (nor one named top). .get_dummies should create these columns (but you need to clean up your data first).
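
As a rough sketch of what the restructuring in step 2 does, assuming the data are already clean: the four rows below are copied from the top of the printed output, with the quotes and semicolons removed by hand.

```python
import pandas as pd

# Small, already clean stand-in for the data (first four rows of the output).
df = pd.DataFrame({
    "Detergent_Brands": ["top", "alpha", "top", "alpha"],
    "Cold": [4, 5, 6, 5],
    "Hot": [10, 10, 12, 13],
})

# One indicator column per brand; 1 marks the rows that belong to that brand.
dummies = df["Detergent_Brands"].str.get_dummies()

# Attach the indicator columns and drop the original text column.
df = pd.concat([df, dummies], axis=1).drop(["Detergent_Brands"], axis=1)

print(df.columns.tolist())  # now includes 'alpha' and 'top'
print(df)
```

After this step df['top'] and df['alpha'] exist, but they hold 0/1 indicators, not the measurements.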

I think your data have not been properly parsed (loaded). Consider passing a separator, e.g. sep=';', to the read_csv function.
(Jul-14-2019, 01:11 AM) scidam wrote: You definitely need to clean up your data and restructure.

Dear Scidam, thank you very much for your response. I will try your instructions. If possible, could you clarify the .apply method that you recommended in your comment above? Once again, thank you.
The main issue here, I think, is the missing sep=';'. You need to call read_csv with the parameter sep=';', e.g.

data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv', sep=';')
If you do so, you will get a data frame with columns named Detergent_Brands, Cold, and Hot, without any semicolons.
The Detergent_Brands column will hold the values top and alpha (also without any semicolons), so you won't need to do any cleaning up at all.
Try passing sep=';' and then restructure the df as shown in #2.
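
Here is a sketch of the whole load step, plus one possible way to get the degrees of freedom afterwards. The text is read from a string, not from Deterent2.csv, and is copied from the printed output in the question. skipinitialspace=True is an assumption about the file layout (a blank after each semicolon). The names Temperature and Effectiveness and the reshape with melt are also my own choices, not something from the original code, and the formulas treat all 28 measurements as independent observations, as the original attempt did.

```python
import io
import pandas as pd

# Stand-in for the CSV file, copied from the printed output in the question.
raw = '''"Detergent_Brands"; "Cold"; "Hot"
"top"; 4; 10
"alpha"; 5; 10
"top"; 6; 12
"alpha"; 5; 13
"top"; 6; 11
"alpha"; 5; 10
"top"; 4; 12
"alpha"; 6; 11
"top"; 4; 12
"alpha"; 5; 10
"top"; 6; 11
"alpha"; 5; 12
"top"; 6; 10
"alpha"; 5; 10
'''

# sep=';' splits each line into three fields. skipinitialspace=True (assumed
# layout) skips the blank after each semicolon so the quotes are parsed away.
data = pd.read_csv(io.StringIO(raw), sep=";", skipinitialspace=True)

# Long format: one row per measurement, with both factors as columns.
long_df = data.melt(id_vars=["Detergent_Brands"], value_vars=["Cold", "Hot"],
                    var_name="Temperature", value_name="Effectiveness")

N = len(long_df)                             # total number of measurements
a = long_df["Detergent_Brands"].nunique()    # levels of factor A (brand)
b = long_df["Temperature"].nunique()         # levels of factor B (temperature)

df_a = a - 1          # degrees of freedom for brand
df_b = b - 1          # degrees of freedom for temperature
df_axb = df_a * df_b  # degrees of freedom for the interaction
df_w = N - a * b      # within-group (residual) degrees of freedom

print(N, df_a, df_b, df_axb, df_w)  # 28 1 1 1 24
```

With the data in this shape, the factor columns can be passed by name, e.g. long_df['Temperature'], without relying on attribute access.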

As for .apply, you can read about this method in the official docs.
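
For reference, a minimal sketch of what cleaning with .apply could look like. The frame below is hypothetical: it imitates a text column that still carries stray quotes and semicolons, which should not happen once sep=';' is passed.

```python
import pandas as pd

# Hypothetical column with leftover quotes and semicolons around each value.
df = pd.DataFrame({"Detergent_Brands": ['"top";', '"alpha";', '"top";']})

# Series.apply calls the function once per value; str.strip removes the
# listed characters from both ends of each string.
df["Detergent_Brands"] = df["Detergent_Brands"].apply(lambda s: s.strip('";'))

print(df["Detergent_Brands"].tolist())  # ['top', 'alpha', 'top']
```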
(Jul-14-2019, 09:22 AM) scidam wrote: The main issue here, I think, is missing sep=';'. You need to call read_csv with a parameter sep=';', e.g.

Thank you very much for the further comments; I will take your solution into consideration. I usually practice Python on weekends, so I will try it this coming weekend.