to use Theory-Based Inference for a population proportion in research studies involving a random sample from a finite population.
to use Theory-Based Inference for a population mean in research studies involving a random sample from a finite population.
For every lab we will begin by loading the four packages: openintro, rmarkdown, tidyverse, and readr.
library(rmarkdown)
library(openintro)
library(tidyverse)
library(readr)
In the national debate on same-sex marriage, it is commonly stated that 70% of all Americans favor same-sex marriage. In 2017, Pew Research conducted a poll of millennials and found that 79% answered yes when asked: Do you support same-sex marriage? The poll was a random sample of 85 millennials.
Does this poll provide convincing evidence that the support from millennials for same-sex marriage is higher than that of the larger general population of Americans?
Our null hypothesis is that the proportion of millennials that support same-sex marriage is the same as the proportion of all US adults that support same-sex marriage.
The alternative hypothesis is that the proportion of millennials that support same-sex marriage is larger than the proportion of all US adults that support same-sex marriage.
In notation \[ H_0: \pi = 0.70\] \[ H_a: \pi > 0.70\] We can use the One Proportion app to complete both simulation-based and theory-based inference. Go to the app now and complete the simulation-based p-value. Also, display the Summary Statistics.
We enter our observed proportion, sample size and the assumed proportion under the null hypothesis into R as follows.
#example: marriage equality
obs_prop <- 0.79
n <- 85
null_pi <- 0.70
We start by checking the validity conditions for the One Proportion \(z\)-test with Normal Approximation. We must have a sample size large enough to include at least 10 successes and at least 10 failures. The successes are the number of people that support same-sex marriage, calculated as 79% of the sample size 85 as below.
n*obs_prop
## [1] 67.15
And the number of failures are the sample size minus the number of successes.
n - n*obs_prop
## [1] 17.85
Both are greater than 10 so our validity conditions are met.
Next we calculate the standard deviation of our null distribution using the formula
\[\sqrt{ \frac{\pi(1-\pi)}{n}}\]
SD <- sqrt(null_pi*(1 - null_pi)/n)
SD
## [1] 0.04970501
Check this with the standard deviation of the simulated null distribution that you found when using the One Proportion applet.
Next we will calculate the standardized statistic, \(z\), using the formula \[ z = \frac{\textrm{statistic} - \textrm{mean of the null distribution}}{\textrm{standard deviation of the null distribution}}\]
z_stat <- (obs_prop - null_pi)/SD
z_stat
## [1] 1.810683
The final step is to calculate the theory-based p-value using our standardized statistic and the normal approximation function pnorm( ). The argument lower.tail = FALSE is used to calculate the area in the upper tail of the distribution, as we measure in the direction of the alternative hypothesis, \[ H_a: \pi > 0.70.\] To calculate a p-value for an alternative hypothesis of the form \[ H_a: \pi < \textrm{some number} \] we would use lower.tail = TRUE. Remember that capitalization matters in R; typing lower.tail = false will result in an error message.
SSp_value <- pnorm(z_stat, lower.tail = FALSE)
SSp_value
## [1] 0.035095
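As a cross-check on the normal approximation (our addition, not part of the applet workflow), base R's binom.test( ) computes an exact p-value from the binomial distribution. The success count is rounded to a whole number here, since 79% of 85 is not an integer.

```r
# exact binomial test: P(X >= 67) when X ~ Binomial(85, 0.70)
binom.test(x = round(0.79 * 85), n = 85, p = 0.70,
           alternative = "greater")
```

The exact p-value will be close to, though not identical to, the theory-based p-value above; the two approaches agree better as the sample size grows.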
Go back to the applet and select the Normal Approximation option. Check the value of the standardized statistic z and the p-value from the pnorm( ) function. What do you notice? Compare the standard deviation of the null distribution with our SD value. What do you notice? You should see that the values are the same up to rounding differences.
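The simulation the applet performs can also be sketched directly in R. This is a minimal version; the number of repetitions (10,000) and the seed are our choices, not part of the lab.

```r
# simulate the null distribution for the same-sex marriage example,
# mirroring what the One Proportion applet does behind the scenes
set.seed(138)   # arbitrary seed so the simulation is reproducible
n <- 85
null_pi <- 0.70
sim_props <- rbinom(10000, size = n, prob = null_pi) / n
sd(sim_props)            # close to the theory-based SD, 0.0497
mean(sim_props >= 0.79)  # simulation-based p-value, close to 0.035
```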
The mean length of all cell phone conversations in the United States is reported to be 2.27 minutes. One researcher believes this value is outdated and that the true mean time spent on a cell phone call is something other than 2.27 minutes. They collect a random sample of 100 phone call lengths from a phone company's records and find a sample mean of 123 seconds and a standard deviation of 48 seconds. The data were seen to be only moderately skewed to the left.
Our hypotheses are \[H_0: \mu = 2.27\] \[H_a: \mu \neq 2.27.\]
# example: phone call length (in minutes)
mu <- 2.27
xbar <- 123/60  # why are we dividing by 60? look at the units!
n <- 100
s <- 48/60
Validity conditions for a One-sample t-test: the quantitative variable should have a symmetric distribution, or you should have at least 20 observations and the sample distribution should not be strongly skewed. These conditions are met in this example.
Next we calculate the t-statistic.
# calculate the test statistic
t_stat <- (xbar - mu)/(s/sqrt(n))
t_stat
## [1] -2.75
Using the t-statistic, sample size, and alternative hypothesis, we calculate the p-value using the Student t-distribution function pt( ). The dependence of the t-distribution on the sample size is described by saying that the t-distribution has n-1 degrees of freedom, or df = n-1.
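To see why the degrees of freedom matter, compare the tail area for the same t-statistic across several df values (an illustration we added; the df values other than 99 are arbitrary):

```r
# the tail area for the same t-statistic shrinks as df grows,
# approaching the normal tail area from pnorm( )
pt(-2.75, df = 9)     # small sample, heavier tails
pt(-2.75, df = 99)    # our sample of size 100
pt(-2.75, df = 999)   # nearly the same as pnorm(-2.75)
```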
Notice that in this example we have a negative \(t\)-statistic and a two-sided alternative hypothesis. So either we use the negative \(t\)-statistic and double the lower.tail=TRUE value, or we take the absolute value of the \(t\)-statistic and use the lower.tail=FALSE option to calculate the area in the upper tail and double that p-value. The code for these two options is below. Notice that they both evaluate to the same number.
# two-sided alternative hypothesis
PC1_p_value <- pt(t_stat, df=n-1, lower.tail=TRUE)*2
PC1_p_value
## [1] 0.007085826
# two-sided alternative hypothesis
PC2_p_value <- pt(abs(t_stat), df=n-1, lower.tail=FALSE)*2
PC2_p_value
## [1] 0.007085826
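A third, equivalent way to write a two-sided p-value is a common R idiom (our addition, not from the lab): take -abs( ) of the statistic so that the lower tail always applies, whether the statistic is negative or positive.

```r
# values from the phone-call example above
n <- 100
t_stat <- -2.75
# -abs( ) sends any t-statistic to the lower tail, so one line
# handles both the negative and positive cases
2 * pt(-abs(t_stat), df = n - 1)
```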
Consider the scenario from Section 1.2, where we looked at the rock-paper-scissors game with a novice player who threw scissors in 4 out of the 20 rounds. Will the theory-based approach (more formally called the one proportion \(z\)-test with normal approximation) work here? Explain.
Go to the One Proportion applet and run both the simulation and theory-based approaches for this rock-paper-scissors scenario. What do you notice?
This example investigates eye-dominance. To figure out which of your eyes is dominant, carry out the following test:
Extend both of your arms in front of you and create a triangular opening with your thumbs and pointer fingers.
With both eyes, center your triangular opening on a distant object like a clock or lamp.
Close your right eye. If the object stays centered in the triangular opening, your open left eye is your dominant eye. If the object is no longer in the triangular opening then your right eye is your dominant eye.
Double check this by closing your left eye. If the object stays centered then your right eye is your dominant eye.
Record in your lab report whether you have left or right eye dominance.
Use read_table( ) to import the EyeDominance data from the following address: “https://willamette.edu/~ijohnson/138/EyeDominance.txt”. This data contains responses to the question ‘Are you right-eye dominant?’ for a random sample of students at a large university.
#This code should work to import the data
EyeDominance <- read_csv("https://raw.githubusercontent.com/IJohnson-math/Math138/main/EyeDominance.txt", col_types="c")
Conventional wisdom says that more people are right-handed than left-handed, so for now let’s have our research hypothesis be that people more often tend to be right-eye dominant. Our null hypothesis is that left- and right-eye dominance are equally prevalent; in other words, the proportion of right-eye dominance is 0.50. Our alternative hypothesis is that right-eye dominance is more prevalent than left, that is, the proportion of right-eye dominance is greater than 0.50.
Using notation, \(H_0: \pi = 0.50\) and \(H_a: \pi > 0.50\).
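If table( ) is new to you, here is how it behaves on a small made-up vector (the responses below are invented for illustration, not taken from the EyeDominance data):

```r
# table( ) counts how many times each distinct value occurs
responses <- c("yes", "no", "yes", "yes", "no")
table(responses)
# counts: no 2, yes 3
```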
#example: eye dominance
#table_eye <-
table_eye

#look at the code above and mimic it here
#obs_prop <-
#n <-
#null_pi <-
Description for 7)
Calculate the standard deviation of our null distribution.
Next calculate the standardized statistic \(z\).
Last, calculate the theory-based p-value and write out a conclusion that includes the research question, the p-value, and the significance of the p-value.
Conclusion for 10).