PreLab 5: Chi-Squared, Goodness of Fit

Fair Die roll?

A statistics student wondered whether rolling six-sided dice by pushing them off a 2-inch ledge was a fair way of rolling dice.

In this case, for a die roll to be fair, it means that all six sides are equally likely to occur. In other words, if we were to repeatedly roll the die by pushing it off a 2-inch high ledge, then we would expect each of the 6 numbered faces of the die to appear on top an equal number of times in the long run. If we rolled the die 120 times, we would expect to see each of the 6 different numbers rolled about 20 times. After rolling the die 120 times, if we observe our data to have deviated substantially from what we expected to see from a fair die (~20 times each), we may have evidence that our rolling method is not “fair.”
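The expected counts described above can be computed directly. This is a small sketch; the names `n_rolls`, `null_props`, and `expected` are our own choices for illustration and do not come from the data file.

```r
# Expected counts under the null hypothesis of a fair roll.
n_rolls <- 120               # total number of rolls in the study
null_props <- rep(1/6, 6)    # hypothesized long-run proportion for each face
expected <- n_rolls * null_props
expected
```

Each of the six entries is the count we would expect for one face if the rolling method were fair.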

\[H_0: \pi_1 = \pi_2 = \pi_3 = \pi_4 = \pi_5 = \pi_6 = \frac{1}{6}, \textrm{ so the method of rolling is fair.}\]
\[H_a: \textrm{At least one } \pi_i \textrm{ differs from } \frac{1}{6}, \textrm{ so the method of rolling is not fair.}\]
where \(\pi_i\) is the long-run proportion of rolls that show the number \(i\), for \(i = 1, 2, 3, 4, 5, 6\).

Notice that these hypotheses look familiar, since we are again testing the equality of several probabilities. A key difference is that we now specify a numerical value for each \(\pi_i\), and those values must sum to 1.

Load packages and data

# load the packages we will need: ggformula for gf_bar() and mosaic for tally()
library(ggformula)
library(mosaic)

We’ll load the data, Die.csv, available at this URL: https://raw.githubusercontent.com/IJohnson-math/Math138/main/Die.csv. Since this is a CSV file, we’ll use the read.csv() function to read in the data.

DieData <- read.csv("https://raw.githubusercontent.com/IJohnson-math/Math138/main/Die.csv")

We can view the data by creating a bar graph showing how often each number was rolled.

gf_bar(~DieRoll, data=DieData, xlab="Number", ylab="Frequency of number", title = "Number of times each face appeared in the sample")

We can also view the data by creating a table of counts for the six faces of the die.

tally( ~DieRoll, data=DieData)

## DieRoll
##  1  2  3  4  5  6 
## 19 10 31 17 15 28

The MAD Statistic and Simulation-Based Approach

The Mean Absolute Difference (MAD) is a statistic measuring how far the observed counts are from the hypothesized counts (the expected counts under the null hypothesis). In this study the hypothesized count for each face is 20. The Absolute Difference part of the calculation is the absolute value of the difference between each observed count and its expected count. We then take the Mean of those absolute differences. Here is the calculation for our study.

MAD = (abs(19-20)+abs(10-20)+abs(31-20)+abs(17-20)+abs(15-20)+abs(28-20))/6
MAD

## [1] 6.333333

Interpretation: On average, the observed count for a face is 6.33 rolls away from the expected count of 20.
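The same MAD value can be computed without typing each difference by hand. This sketch assumes the counts are stored in vectors that we name `observed` and `expected` (names chosen here for illustration).

```r
# Observed counts from the tally and expected counts under the null.
observed <- c(19, 10, 31, 17, 15, 28)
expected <- rep(20, 6)

# Mean of the absolute differences between observed and expected counts.
MAD <- mean(abs(observed - expected))
MAD
```

Because R subtracts vectors element by element, this approach also works when the expected counts are not all equal.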

Go to the Multiple Proportions simulation-based applet to complete the analysis using the MAD statistic.

p-value from Simulation using MAD statistic: 0.0046 (46/10000)

p-value from Simulation using \(\chi^2\) statistic: 0.0061 (61/10000)
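The p-values above come from the applet. As a rough alternative, we can sketch a similar simulation in R by generating many sets of 120 fair rolls and recording how often the simulated statistic is at least as large as the observed one. This is not the applet's own code: the seed, the helper functions `mad_stat()` and `chisq_stat()`, and the variable names are our own assumptions, and the resulting p-values will differ slightly from run to run and from the applet's values.

```r
set.seed(138)  # arbitrary seed so the sketch is reproducible

observed <- c(19, 10, 31, 17, 15, 28)
n_rolls <- sum(observed)            # 120 rolls
expected <- rep(n_rolls / 6, 6)     # 20 per face under the null

# Helper functions (hypothetical names) for the two statistics.
mad_stat <- function(counts) mean(abs(counts - expected))
chisq_stat <- function(counts) sum((counts - expected)^2 / expected)

# Each column of sims is one simulated set of 120 fair rolls (counts per face).
sims <- rmultinom(10000, size = n_rolls, prob = rep(1/6, 6))
sim_mad <- apply(sims, 2, mad_stat)
sim_chisq <- apply(sims, 2, chisq_stat)

# Proportion of simulated statistics at least as extreme as the observed ones.
p_mad <- mean(sim_mad >= mad_stat(observed))
p_chisq <- mean(sim_chisq >= chisq_stat(observed))
c(MAD = p_mad, chisq = p_chisq)
```

Both proportions should be small, in the same neighborhood as the applet results reported above.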

The Chi-squared distribution is actually a family of distributions whose shape depends on a parameter called the degrees of freedom, denoted by k.

For a Chi-squared hypothesis test of multiple proportions (as seen in Chapter 8, Section 2) the degrees of freedom are (number of categories in the explanatory variable - 1) multiplied by (number of categories in the response variable - 1).
For a Chi-squared Goodness of Fit test (as seen here and in Chapter 8, Section 3) the degrees of freedom are one less than the number of categories (proportions) in the model. In our study of whether pushing the die off a ledge is a fair roll, there are six possible categories for the roll, so k=6-1=5.

Chi-square distributions
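One way to see how the shape changes with the degrees of freedom is to draw several Chi-square density curves on the same axes. The sketch below uses base R graphics; the particular values of k (1, 3, 5, and 10) and the plotting range are our own choices for illustration.

```r
# Degrees of freedom to compare (k = 5 is the value used in the die study).
dfs <- c(1, 3, 5, 10)

# Grid of x values; we start slightly above 0 because the k = 1 curve is unbounded at 0.
x <- seq(0.05, 25, by = 0.05)

# One column of density values per choice of k.
dens <- sapply(dfs, function(k) dchisq(x, df = k))

matplot(x, dens, type = "l", lty = 1, col = 1:4, ylim = c(0, 0.5),
        xlab = "Chi-square value", ylab = "Density",
        main = "Chi-square distributions for several k")
legend("topright", legend = paste("k =", dfs), col = 1:4, lty = 1)
```

As k increases, the curve shifts to the right and becomes less skewed.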

Calculation of the Chi-square statistic

\[ \chi^2 = \sum \frac{(\textrm{observed count} - \textrm{expected count})^2}{\textrm{expected count}}\]

tally(~DieRoll, data=DieData)

## DieRoll
##  1  2  3  4  5  6 
## 19 10 31 17 15 28

# the expected count for each face is (1/6) * 120 = 20
chiSquare <- ((19-20)^2)/20 + ((10-20)^2)/20 + ((31-20)^2)/20 +((17-20)^2)/20 + ((15-20)^2)/20 + ((28-20)^2)/20
chiSquare

## [1] 16
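The same statistic can be computed from vectors of counts, and a theory-based p-value can then be found as the area to the right of the statistic under the Chi-square distribution with k = 5 degrees of freedom. This is a sketch with variable names of our own choosing (`observed`, `expected`, `p_value`).

```r
observed <- c(19, 10, 31, 17, 15, 28)
expected <- rep(sum(observed) / 6, 6)   # 20 per face under the null

# Sum of (observed - expected)^2 / expected over the six faces.
chiSquare <- sum((observed - expected)^2 / expected)
chiSquare

# Upper-tail area beyond the statistic, using k = 6 - 1 = 5 degrees of freedom.
p_value <- pchisq(chiSquare, df = length(observed) - 1, lower.tail = FALSE)
p_value
```

We use the upper tail only, because larger values of the statistic mean the observed counts are farther from the expected counts.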

Validity Conditions for Theory-Based Chi-squared tests

As with our other theory-based tests, this one comes with validity conditions. The validity condition for a chi-square goodness-of-fit test is that all observed counts are at least 10. Since the values 19, 10, 31, 17, 15, 28 are all at least 10, the validity condition has been met for this study.
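This check is easy to carry out in R. The sketch below applies the condition as stated above (every observed count at least 10); the names `observed` and `min_count` are ours.

```r
observed <- c(19, 10, 31, 17, 15, 28)
min_count <- 10   # threshold from the validity condition stated above

# TRUE or FALSE for each face, then a single overall answer.
observed >= min_count
all(observed >= min_count)
```

Note the use of `>=` rather than `>`: the count of 10 for face 2 meets the condition exactly.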

Theory-Based Chi-square Goodness of Fit test

We use the function chisq.test( ) to perform the goodness of fit test. The inputs to this function are a list (in R, a vector) of observed counts and a list of predicted (or expected) proportions. Note that the expected proportions must add up to 1.

If we have a data file, we can input the list of observed counts using the tally( ) function. If we have only a table of counts, we can input the list of counts like this c( n1, n2, n3, n4, n5, n6) using the numbers n1, n2, etc. from our study.

To input the list of expected proportions we write p = c( p1, p2, p3, p4, p5, p6) where p1 is the expected proportion for the count n1, p2 is the expected proportion for the count n2, and so on. Note that when the proportions are not all equal, then the order in which they are listed matters!
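To see why the order matters, here is a small sketch using made-up, unequal proportions. These proportions are purely hypothetical and are not part of the die study; they only illustrate that the same counts give different results when the proportions are listed in a different order.

```r
observed <- c(19, 10, 31, 17, 15, 28)

# Hypothetical unequal proportions (for illustration only); they sum to 1.
p_hyp <- c(0.1, 0.1, 0.2, 0.2, 0.2, 0.2)

# Same proportions, listed in reverse order.
test_original <- chisq.test(observed, p = p_hyp)
test_reversed <- chisq.test(observed, p = rev(p_hyp))

test_original$statistic
test_reversed$statistic
```

The two statistics differ because each proportion is matched to the count in the same position.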

# GoodnessFit
#chi-square using data file and expected proportion
chisq.test(tally(~DieRoll, data = DieData), p = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6))

## 
##  Chi-squared test for given probabilities
## 
## data:  tally(~DieRoll, data = DieData)
## X-squared = 16, df = 5, p-value = 0.006844

#chi-square from list of counts and expected proportion
chisq.test(c(19, 10, 31, 17, 15, 28), p=c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6))

## 
##  Chi-squared test for given probabilities
## 
## data:  c(19, 10, 31, 17, 15, 28)
## X-squared = 16, df = 5, p-value = 0.006844
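If we save the result of chisq.test( ), we can pull out individual pieces of the output. The sketch below stores the test in an object we call `result` (our own name) and looks at the expected counts and the Pearson residuals, (observed - expected)/sqrt(expected), to see which faces are farthest from what a fair roll predicts.

```r
result <- chisq.test(c(19, 10, 31, 17, 15, 28), p = rep(1/6, 6))

result$statistic            # the chi-square statistic
result$p.value              # the theory-based p-value
result$expected             # expected counts under the null
round(result$residuals, 2)  # Pearson residuals for each face
```

Faces whose residuals are far from 0 (here, faces 2 and 3) contribute the most to the chi-square statistic.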

Conclusion: Our theory-based p-value of 0.0068 is very similar to both of our simulation-based p-values (0.0046 using the MAD statistic and 0.0061 using the \(\chi^2\) statistic). All three are below 0.01, so we have strong evidence that pushing dice off a 2-inch ledge is not a fair way of “rolling” dice.

The study design is fairly well controlled because the same ledge was used for each trial, the same lineup of dice was used for each trial, and the same person pushed the dice in each trial. Thus, it seems reasonable to believe that the method of dice rolling, pushing them off a 2-inch ledge, is the reason the faces are not equally likely to appear. We weren’t, however, told whether our five dice were a random sample of all dice. It is plausible that they were taken from a generic board game. If so, we could argue that they are representative of all board game dice of the same size, and thus say that this method of dice rolling is not fair for all board game dice of this specific size. We need more information, though, to extend our conclusions beyond the five dice and 2-inch height used. Additionally, we would need assurance that these five dice weren’t loaded before we conducted our study!