Python Forum

Hello,

I am trying to apply the χ² test on a contingency table, as taught in my exploratory statistics course, to the results given in Example 1 on Wikipedia.

(Wikipedia page)

Rolling a die 600 times in a row gave the following results:

Face rolled:   1    2    3    4    5    6
Count:        88  109  107   94  105   97
The number of degrees of freedom is 6 - 1 = 5.
We wish to test the hypothesis that the die is not rigged, at a risk (significance level) α = 0.05.
The null hypothesis here is therefore: "The die is balanced".
Assuming this hypothesis is true, the variable T defined above is:

$$T = \frac{(88-100)^2}{100} + \frac{(109-100)^2}{100} + \frac{(107-100)^2}{100} + \frac{(94-100)^2}{100} + \frac{(105-100)^2}{100} + \frac{(97-100)^2}{100} = 3.44$$
The χ² distribution with five degrees of freedom gives the critical value below which the draw is considered consistent with a fair die at risk α = 0.05: P(T < 11.07) = 0.95.
Since 3.44 < 11.07, we cannot reject the null hypothesis: these data do not allow us to conclude that the die is rigged.
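For reference, the arithmetic above can be checked in a few lines. This is a minimal sketch, assuming NumPy and SciPy are installed; it computes the Pearson statistic by hand with an expected count of 600 / 6 = 100 per face, and gets the 0.95 quantile of the χ² distribution with 5 degrees of freedom from `scipy.stats.chi2.ppf`. The variable names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts for faces 1..6 (600 rolls, Wikipedia example 1)
observed = np.array([88, 109, 107, 94, 105, 97])
# Under H0 (fair die) each face is expected 600 / 6 = 100 times
expected = np.full(6, observed.sum() / 6)

# Pearson statistic T = sum((O - E)^2 / E)
T = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1  # 6 - 1 = 5

# Critical value c such that P(T < c) = 0.95 under chi-square(5)
alpha = 0.05
crit = chi2.ppf(1 - alpha, dof)

print(f"T = {T:.2f}, critical value = {crit:.2f}")
print("reject H0" if T > crit else "cannot reject H0")
```

This reproduces T = 3.44 and the 11.07 threshold, so the decision matches the one quoted from Wikipedia.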

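I tried to reproduce this result with pandas:

import pandas as pd
from scipy.stats import chi2_contingency

dico = {'face': [1, 2, 3, 4, 5, 6], 'numbers': [88, 109, 107, 94, 105, 97]}  # alternative counts: [100, 100, 101, 99, 101, 99]
tab = pd.DataFrame(dico)
print(tab.head(6))
ta = pd.crosstab(tab['face'], tab['numbers'])  # cross-tabulate face against count
print(ta)
test = chi2_contingency(tab)  # returns (statistic, p-value, dof, expected frequencies)
print(test)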

   face  numbers
0     1       88
1     2      109
2     3      107
3     4       94
4     5      105
5     6       97

numbers  88   94   97   105  107  109
face
1          1    0    0    0    0    0
2          0    0    0    0    0    1
3          0    0    0    0    1    0
4          0    1    0    0    0    0
5          0    0    0    1    0    0
6          0    0    1    0    0    0

(4.86, 0.432, 5)

This is not the expected result (I get the same with ta).
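A likely explanation for the 4.86, sketched below assuming a recent pandas and SciPy: `chi2_contingency` receives the whole 6×2 DataFrame, so it treats the face labels 1 to 6 as a second column of observed frequencies and tests independence between rows and columns, with (6 − 1)(2 − 1) = 5 degrees of freedom. That is a different question from "is the die fair?". Recomputing the Pearson statistic by hand from that 6×2 table (expected = row total × column total / grand total) gives the same value, roughly 4.87, consistent with the 4.86 reported. The crosstab `ta` is yet another table: each (face, count) pair occurs exactly once, so it is only a 6×6 grid of zeros and ones.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

tab = pd.DataFrame({"face": [1, 2, 3, 4, 5, 6],
                    "numbers": [88, 109, 107, 94, 105, 97]})

# chi2_contingency sees a 6x2 array: face labels AND counts are both
# treated as observed frequencies in a two-way table.
stat, p, dof, expected = chi2_contingency(tab)

# Same statistic by hand: expected = row total * column total / grand total
obs = tab.to_numpy(dtype=float)
exp_manual = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
stat_manual = ((obs - exp_manual) ** 2 / exp_manual).sum()

print(stat, stat_manual, dof)  # dof = (6 - 1) * (2 - 1) = 5
```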

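Then I set the problem up as follows:
import pandas as pd
from scipy.stats import chi2_contingency

dico = {'error': [-12, 9, 7, -6, 5, -3], 'numbers': [88, 109, 107, 94, 105, 97]}  # alternative counts: [100, 100, 101, 99, 101, 99]
tab = pd.DataFrame(dico)
print(tab.head(6))
ta = pd.crosstab(tab['error'], tab['numbers'])  # column name must match the dict key
# note: the chi2_contingency call for this attempt is not shown in the post
test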

   error  numbers
0    -12       88
1      9      109
2      7      107
3     -6       94
4      5      105
5     -3       97

(10.94, 0.052, 5): same problem. I expected something like (3.44, p-value between 0.5 and 0.9, 5).
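The `error` column built here is already O − E with E = 100, so the Wikipedia statistic follows from it directly, without any contingency table. A minimal sketch, assuming pandas is installed: each Pearson term is error² / E. `chi2_contingency`, by contrast, has no way of knowing these are deviations from 100; it reads every cell as an observed frequency, and negative numbers are not valid frequencies.

```python
import pandas as pd

tab = pd.DataFrame({"error": [-12, 9, 7, -6, 5, -3],
                    "numbers": [88, 109, 107, 94, 105, 97]})

expected = 100  # 600 rolls / 6 faces under H0

# "error" is O - E, so each Pearson term is error**2 / E
T = (tab["error"] ** 2 / expected).sum()
print(round(T, 2))
```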

Something is wrong, but what?

Regards,
Leloup


Please show all code needed to run.

(Mar-10-2022, 08:50 PM) Larz60+ wrote: Please show all code needed to run.

All the code is there.
Since I know the χ² value (3.44) for the balanced-die test, I am trying to reproduce it with the function chi2_contingency, with or without a contingency table. Perhaps the problem should be set up differently, or perhaps columns should be added, but I don't know which ones.
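One way to get (3.44, p, 5) directly is to swap in a different SciPy function: testing one set of counts against fixed expected counts is a goodness-of-fit test, which `scipy.stats.chisquare` performs, rather than the independence test done by `chi2_contingency`. A minimal sketch, assuming SciPy is installed; when `f_exp` is omitted, `chisquare` assumes all categories are equally likely (here 100 per face), and the explicit version below passes the same expectation.

```python
from scipy.stats import chisquare

observed = [88, 109, 107, 94, 105, 97]

# Default: all six faces equally likely (expected 100 each)
stat, p = chisquare(observed)

# Same test with the expected counts spelled out
stat2, p2 = chisquare(observed, f_exp=[100] * 6)

print(f"chi2 = {stat:.2f}, p = {p:.3f}, dof = {len(observed) - 1}")
```

This gives a statistic of 3.44 and a p-value of about 0.63, inside the 0.5 to 0.9 range expected above. The degrees of freedom (6 − 1 = 5) are not returned by `chisquare`, so the sketch computes them from the number of categories.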

I would use pd.crosstab() on the raw data (one entry per roll) rather than on the totals. But in any case, a p-value of 0.052 does not seem unreasonable given the spread of the counts (88 to 109). Remember that the p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true, and a p-value below 0.05 is conventionally considered statistically significant.
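The "at least as extreme" reading of the p-value corresponds to the upper tail of the χ² distribution. A small sketch, assuming SciPy is installed, that converts the two statistics discussed in this thread into p-values with `scipy.stats.chi2.sf` (the survival function, 1 − CDF) at 5 degrees of freedom:

```python
from scipy.stats import chi2

dof = 5

# p-value = P(T >= t | H0): upper-tail area of chi-square(5)
p_balanced = chi2.sf(3.44, dof)   # statistic from the Wikipedia example
p_reported = chi2.sf(10.94, dof)  # statistic from the second attempt above

print(f"{p_balanced:.3f} {p_reported:.3f}")
```

The first value is about 0.63 and the second is just above 0.05, which agrees with both the Wikipedia conclusion and the 0.052 obtained earlier; the two numbers differ because they come from different statistics, not from a different test threshold.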

Thread participants: Leloup, Larz60+, jefsummers