Python Forum
Chi-square with the contingency table applied to the die
#1
Hello,

I am trying to apply the χ2 contingency-table test from the exploratory statistics course to Example 1 of the Wikipedia article.

Wikipedia site

Rolling a die 600 times in a row gave the following results:
Face rolled:   1    2    3    4    5    6
Count:        88  109  107   94  105   97
The number of degrees of freedom is 6 - 1 = 5.
We wish to test the hypothesis that the die is not rigged, at a significance level α = 0.05.
The null hypothesis here is therefore: "The die is balanced".
Assuming this hypothesis is true, each face is expected 600/6 = 100 times, and the statistic T defined above is:
$$T = \frac{(88-100)^2}{100} + \frac{(109-100)^2}{100} + \frac{(107-100)^2}{100} + \frac{(94-100)^2}{100} + \frac{(105-100)^2}{100} + \frac{(97-100)^2}{100} = 3.44$$
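The arithmetic above can be checked with a few lines of plain Python. A minimal sketch; the names `observed`, `expected` and `T` are illustrative only:

```python
# Observed counts for faces 1..6 over 600 rolls (from the example above)
observed = [88, 109, 107, 94, 105, 97]

# Under H0 (fair die), each face is expected n / 6 = 100 times
n = sum(observed)
expected = n / len(observed)

# Pearson statistic: sum of (observed - expected)^2 / expected
T = sum((o - expected) ** 2 / expected for o in observed)
print(round(T, 2))  # 3.44
```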
The χ2 distribution with five degrees of freedom gives the critical value 11.07, below which the rolls are considered consistent with a fair die at the α = 0.05 level: P(T < 11.07) = 0.95.
Since 3.44 < 11.07, we cannot reject the null hypothesis: this statistical data does not allow us to consider that the die is rigged.
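The critical value and the decision can be reproduced with `scipy.stats.chi2`. A sketch, assuming SciPy is installed; `ppf` gives the 95% quantile and `sf` gives the upper-tail probability (the p-value) for the statistic computed above:

```python
from scipy.stats import chi2

T = 3.44       # statistic computed above
dof = 6 - 1    # k - 1 for k = 6 faces
alpha = 0.05

critical = chi2.ppf(1 - alpha, dof)  # value with P(T < critical) = 0.95
p_value = chi2.sf(T, dof)            # P(T >= 3.44) under H0

print(f"critical value = {critical:.2f}")  # 11.07
print(f"p-value = {p_value:.3f}")
print("reject H0" if T > critical else "cannot reject H0")
```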

I tried to reproduce this result with pandas and SciPy:

import pandas as pd
from scipy.stats import chi2_contingency

dico = {'face': [1, 2, 3, 4, 5, 6], 'numbers': [88, 109, 107, 94, 105, 97]}  # [100, 100, 101, 99, 101, 99]
tab = pd.DataFrame(dico)
print(tab.head(6))
ta = pd.crosstab(tab['face'], tab['numbers'])
print(ta)
test = chi2_contingency(tab)
print(test)

   face  numbers
0     1       88
1     2      109
2     3      107
3     4       94
4     5      105
5     6       97
numbers  88   94   97   105  107  109
face
1         1    0    0    0    0    0
2         0    0    0    0    0    1
3         0    0    0    0    1    0
4         0    1    0    0    0    0
5         0    0    0    1    0    0
6         0    0    1    0    0    0

(4.86, 0.432, 5)

This is not the expected result (passing ta instead gives the same).
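A likely explanation, sketched below: `chi2_contingency` treats whatever 2-D table it receives as an r × c contingency table, so `tab` is read as a 6 × 2 table whose first column is the face labels 1 to 6 themselves. The 5 degrees of freedom then come from (6 − 1)(2 − 1), not from 6 − 1, and the statistic measures the association between the label column and the count column, which is not the question being asked. The array below simply restates the DataFrame from the post:

```python
import numpy as np
from scipy.stats import chi2_contingency

# The DataFrame from the post, as a plain 6 x 2 array:
# column 0 = face labels, column 1 = observed counts
table = np.array([[1, 88], [2, 109], [3, 107], [4, 94], [5, 105], [6, 97]])

stat, p, dof, expected = chi2_contingency(table)
print(round(stat, 2), round(p, 3), dof)

# The expected table shows the face labels were treated as frequencies too
print(expected.round(2))
```

The printed statistic, p-value and degrees of freedom match the (4.86, 0.432, 5) reported above, up to rounding.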

Then I set up the problem as follows:
dico = {'error': [-12, 9, 7, -6, 5, -3], 'numbers': [88, 109, 107, 94, 105, 97]}  # [100, 100, 101, 99, 101, 99]
tab = pd.DataFrame(dico)
print(tab.head(6))
ta = pd.crosstab(tab['error'], tab['numbers'])
test

   error  numbers
0    -12       88
1      9      109
2      7      107
3     -6       94
4      5      105
5     -3       97

(10.94, 0.052, 5): the same problem. I expect something like (3.44, p, 5), with a p-value between 0.5 and 0.9.
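The values expected here, (3.44, p, 5), are those of a one-way goodness-of-fit test against equal expected counts rather than a contingency-table test. A minimal sketch using `scipy.stats.chisquare`, which uses equal expected frequencies when `f_exp` is omitted; the second part shows that the sum of the squared "errors" divided by 100 gives the same statistic. Note also that `chi2_contingency` requires non-negative frequencies, so a column of signed errors cannot serve as a contingency table:

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([88, 109, 107, 94, 105, 97])

# One-way goodness-of-fit test; f_exp defaults to equal frequencies (100 each)
result = chisquare(observed)
dof = len(observed) - 1  # k - 1 degrees of freedom (default ddof=0)
print(round(result.statistic, 2), round(result.pvalue, 3), dof)

# Same statistic from the deviations used in the second attempt
errors = observed - observed.mean()  # [-12, 9, 7, -6, 5, -3]
print(round((errors ** 2).sum() / observed.mean(), 2))  # 3.44
```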

Something is wrong, but what?

Regards,
Leloup
Larz60+ wrote Mar-07-2022, 01:53 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

#3
Please show all code needed to run.
#4
(Mar-10-2022, 08:50 PM) Larz60+ wrote: Please show all code needed to run.

All the code is there.
Knowing the χ2 result (3.44) for the test of a balanced die, I am trying to recover it with the function chi2_contingency, with or without a contingency table. Perhaps the problem should be set up differently, or perhaps columns should be added, but which ones? I don't know.
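On the question of adding columns: one tempting layout (an assumption here, not something from the thread) is to put the observed counts and the theoretical counts (100 each) side by side and pass them to `chi2_contingency`. The sketch below shows why that does not recover 3.44 either: the function treats the theoretical counts as a second observed sample and tests whether the two samples differ (a test of homogeneity), which gives a smaller statistic of about 1.74, still on 5 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = [88, 109, 107, 94, 105, 97]
theoretical = [100] * 6  # fair-die counts, treated here as if they were data

# 2 x 6 table: row 0 = observed rolls, row 1 = theoretical counts
table = np.array([observed, theoretical])
stat, p, dof, expected = chi2_contingency(table)

# Each column contributes (o - 100)^2 / (o + 100), roughly half of (o - 100)^2 / 100
print(round(stat, 2), dof)
```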
#5
I use pd.crosstab() on the raw data (the individual rolls) rather than on the summed counts. In any case, a p-value of 0.052 does not seem unreasonable given the disparities (88 vs 109). Remember that the p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true, and a p-value below 0.05 is conventionally considered statistically significant.
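The definition of the p-value (the probability, under the null hypothesis, of a statistic at least as extreme as the one observed) can be checked by simulation. A minimal sketch, assuming NumPy's `Generator.multinomial`: roll a fair die 600 times, many times over, and count how often the statistic T reaches 3.44. The seed and the number of simulations are arbitrary choices. The fraction should land close to the χ2 p-value for T = 3.44 with 5 degrees of freedom (about 0.63).

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
n_rolls, n_sims = 600, 20_000

# Each row: face counts from 600 rolls of a fair die
counts = rng.multinomial(n_rolls, [1 / 6] * 6, size=n_sims)

# Integer sum of squared deviations; T = S / 100, so T >= 3.44 <=> S >= 344
S = ((counts - 100) ** 2).sum(axis=1)
p_simulated = (S >= 344).mean()
print(f"simulated p-value: {p_simulated:.3f}")
```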