##### ANOVA: DataFrame has no attribute alpha
Tese (Jul-13-2019, 09:01 PM; last modified Jul-14-2019, 12:36 AM by scidam):

I'm a beginner in Python, but familiar with statistics. I'm trying to run a two-way ANOVA in Python, following examples from blogs. Specifically, I want to test two detergent brands, called Top and Alpha, in cold and hot water. In the end, I want to determine whether water temperature has an effect on the effectiveness of the detergents. Unfortunately, I ran into errors right at the start, both when constructing the interaction plot and when calculating the degrees of freedom, as shown below. I have included the details, hypothetical data, code, and errors. Please help me!

```
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
import matplotlib.pyplot as plt
from scipy import stats

data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv')
print(data)
```

Output:

```
    "Detergent_Brands"; "Cold"; "Hot"
0   "top"; 4; 10
1   "alpha"; 5; 10
2   "top"; 6; 12
3   "alpha"; 5; 13
4   "top"; 6; 11
5   "alpha"; 5; 10
6   "top"; 4; 12
7   "alpha"; 6; 11
8   "top"; 4; 12
9   "alpha"; 5; 10
10  "top"; 6; 11
11  "alpha"; 5; 12
12  "top"; 6; 10
13  "alpha"; 5; 10
```

```
N = len(data.len)
df_a = len(data.top.unique()) - 1
df_b = len(data.alpha.unique()) - 1
df_axb = df_a*df_b
df_w = N - (len(data.top.unique())*len(data.alpha.unique()))
```

Error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
----> 1 N = len(data.len)
      2 df_a = len(data.top.unique()) - 1
      3 df_b = len(data.alpha.unique()) - 1
      4 df_axb = df_a*df_b
      5 df_w = N - (len(data.top.unique())*len(data.alpha.unique()))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'len'
```

```
fig = interaction_plot(data.alpha, data.top, data.len,
                       colors=['red','blue'], markers=['D','^'], ms=11)
```

Error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
----> 1 fig = interaction_plot(data.alpha, data.top, data.len,
      2                        colors=['red','blue'], markers=['D','^'], ms=11)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'alpha'
```

scidam (Jul-14-2019, 01:11 AM; last modified Jul-14-2019, 01:12 AM by scidam):

You definitely need to clean up and restructure your data:

- Remove the semicolons and quotes from the df (you can use the `.apply` method to do that).
- Do something like this before you start any analysis: `df = pd.concat([df, df['Detergent_Brands'].str.get_dummies()], axis=1).drop(['Detergent_Brands'], axis=1)`.
- Instead of accessing data-frame columns as attributes (e.g. `df.top`), consider using `df['top']`, `df['alpha']`. This approach is more robust, especially when column names collide with the data frame's own methods.

You got the error because your df has no column named `alpha` (and no column named `top` either). `.get_dummies` should create these columns, but you need to clean up your data first.

I think your data has not been parsed (loaded) properly. Consider passing a separator, e.g. `sep=';'`, to the `read_csv` function.
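A minimal sketch of the restructuring step suggested in this reply, assuming the file has already been parsed into three clean columns. The small data frame below is a stand-in built from the first four posted rows, not the real file, and the names `dummies`, `top_flags`, and `alpha_flags` are hypothetical.

```python
import pandas as pd

# Stand-in for a correctly parsed file: the first four rows from the post.
df = pd.DataFrame({
    'Detergent_Brands': ['top', 'alpha', 'top', 'alpha'],
    'Cold': [4, 5, 6, 5],
    'Hot': [10, 10, 12, 13],
})

# One 0/1 indicator column per brand, then drop the original text column.
dummies = df['Detergent_Brands'].str.get_dummies()
df = pd.concat([df, dummies], axis=1).drop(['Detergent_Brands'], axis=1)

# Bracket access works for any column name, including names that
# happen to match a DataFrame method.
top_flags = df['top']
alpha_flags = df['alpha']

print(df)
```

After this step the columns `top` and `alpha` exist, so lookups such as `df['alpha']` no longer fail. Note that they hold brand indicators, not measurements.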
Tese (Jul-14-2019, 03:49 AM):

Dear scidam, thank you very much for responding. I will try your instructions. If possible, could you clarify the `.apply` method that you recommended in your comment above? Once again, thank you.

scidam (Jul-14-2019, 09:22 AM):

The main issue here, I think, is the missing `sep=';'`. You need to call `read_csv` with the parameter `sep=';'`, e.g. `data = pd.read_csv('C:\\Users\\Tesema\\Desktop\\PYTHON\\PYTHON3\\Deterent2.csv', sep=';')`. If you do so, you will get a data frame with columns named `Detergent_Brands`, `Cold`, and `Hot`, without any semicolons. The values `top` and `alpha` will come through without semicolons too, so you won't need to do any cleaning up at all. Try passing `sep=';'` and then restructure the df as shown in post #2. As for `.apply`, you can read about this method in the official docs.
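The effect of the separator can be sketched without the original file. The text in `raw` below is an assumed, simplified stand-in for the CSV (semicolon-delimited, no quotes, first four posted rows), read through `io.StringIO` instead of a path.

```python
import io
import pandas as pd

# Assumed file contents: semicolon-delimited, first four rows of the post.
raw = """Detergent_Brands;Cold;Hot
top;4;10
alpha;5;10
top;6;12
alpha;5;13
"""

# Default separator is a comma, so every line ends up in a single column.
bad = pd.read_csv(io.StringIO(raw))

# With sep=';' the three columns are split and the numbers parsed as integers.
data = pd.read_csv(io.StringIO(raw), sep=';')

print(bad.shape)
print(data.columns.tolist())
```

If the real file has a space after each semicolon, as the printed output in the first post suggests, the `skipinitialspace=True` option of `read_csv` may also be worth trying.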
Tese (Jul-14-2019, 06:16 PM):

Thank you very much for the further comments; I will take your solution into consideration. I usually practice Python on weekends, so I will try it this coming weekend.
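For the degrees-of-freedom step from the first post, one possible sketch is shown here. It is not something proposed in the thread: it assumes the two factors are brand and water temperature, reshapes the posted table with `melt`, and uses hypothetical column names `Temperature` and `Score`. The formulas themselves follow the ones in the original code.

```python
import pandas as pd

# The 14 posted rows, typed in directly in their wide layout.
wide = pd.DataFrame({
    'Detergent_Brands': ['top', 'alpha'] * 7,
    'Cold': [4, 5, 6, 5, 6, 5, 4, 6, 4, 5, 6, 5, 6, 5],
    'Hot': [10, 10, 12, 13, 11, 10, 12, 11, 12, 10, 11, 12, 10, 10],
})

# Long layout: one row per measurement, with temperature as its own column.
# 'Temperature' and 'Score' are names chosen for this sketch only.
tidy = wide.melt(id_vars='Detergent_Brands',
                 var_name='Temperature', value_name='Score')

N = len(tidy)                                # total number of measurements
levels_a = tidy['Detergent_Brands'].nunique()
levels_b = tidy['Temperature'].nunique()

df_a = levels_a - 1                          # brand
df_b = levels_b - 1                          # temperature
df_axb = df_a * df_b                         # interaction
df_w = N - levels_a * levels_b               # within (residual)

print(df_a, df_b, df_axb, df_w)
```

Bracket access is used throughout, in line with the advice given earlier in the thread.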
