to use Theory-Based Inference for a population proportion in research studies involving a random sample from a finite population.
to use Theory-Based Inference for a population mean in research studies involving a random sample from a finite population.
For every lab we will begin by loading the four packages: openintro, rmarkdown, tidyverse, and readr.
library(rmarkdown)
library(openintro)
library(tidyverse)
library(readr)
In the national debate on same-sex marriage, it is commonly stated that 70% of all Americans favor same-sex marriage. In 2017, Pew Research conducted a poll of millennials and found that 79% answered yes when asked: Do you support same-sex marriage? The poll was a random sample of 85 millennials.
Does this poll provide convincing evidence that the support from millennials for same-sex marriage is higher than that of the larger general population of Americans?
Our null hypothesis is that the proportion of millennials that support same-sex marriage is the same as the proportion of all US adults that support same-sex marriage.
The alternative hypothesis is that the proportion of millennials that support same-sex marriage is larger than the proportion of all US adults that support same-sex marriage.
In notation \[ H_0: \pi = 0.70\] \[ H_a: \pi > 0.70\] We can use the One Proportion app to complete both simulation-based and theory-based inference. Go to the app now and complete the simulation-based p-value. Also, display the Summary Statistics.
We enter our observed proportion, sample size and the assumed proportion under the null hypothesis into R as follows.
#example: marriage equality
obs_prop <- 0.79
n <- 85
null_pi <- 0.70
We start by checking the validity conditions for the One Proportion \(z\)-test with Normal Approximation. We must have a sample size large enough to include at least 10 successes and at least 10 failures. The successes are the number of people that support same-sex marriage, calculated as 79% of the sample size 85 as below.
n*obs_prop
## [1] 67.15
And the number of failures are the sample size minus the number of successes.
n - n*obs_prop
## [1] 17.85
Both are greater than 10 so our validity conditions are met.
Next we calculate the standard deviation of our null distribution using the formula
\[\sqrt{ \frac{\pi(1-\pi)}{n}}\]
SD <- sqrt(null_pi*(1 - null_pi)/n)
SD
## [1] 0.04970501
Check this with the standard deviation of the simulated null distribution that you found when using the One Proportion applet.
Next we will calculate the standardized statistic, \(z\), using the formula \[ z = \frac{\textrm{statistic} - \textrm{mean of the null distribution}}{\textrm{standard deviation of the null distribution}}\]
z_stat <- (obs_prop - null_pi)/SD
z_stat
## [1] 1.810683
The final step is to calculate the theory-based p-value using our standardized statistic and the normal approximation function pnorm( ). The argument lower.tail = FALSE is used to calculate the area in the upper tail of the distribution, as we measure in the direction of the alternative hypothesis, \[ H_a: \pi > 0.70.\] To calculate a p-value for an alternative hypothesis of the form \[ H_a: \pi < \textrm{some number} \] we would use lower.tail = TRUE. Remember that capitalization matters in R; typing lower.tail = false will result in an error message.
SSp_value <- pnorm(z_stat, lower.tail = FALSE)
SSp_value
## [1] 0.035095
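As a cross-check on the normal approximation (our addition, not part of the applet workflow), base R's binom.test( ) computes an exact p-value from the binomial distribution. The success count is rounded to a whole number here, since 79% of 85 is not an integer.

```r
# exact binomial test: P(X >= 67) when X ~ Binomial(85, 0.70)
binom.test(x = round(0.79 * 85), n = 85, p = 0.70,
           alternative = "greater")
```

The exact p-value will be close to, though not identical to, the theory-based p-value above; the two approaches agree better as the sample size grows.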
Go back to the applet and select the Normal Approximation option. Check the value of the standardized statistic z and the p-value from the pnorm( ) function. What do you notice? Compare the standard deviation of the null distribution with our SD value. What do you notice? You should see that the values are the same up to rounding differences.
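The simulation the applet performs can also be sketched directly in R. This is a minimal version; the number of repetitions (10,000) and the seed are our choices, not part of the lab.

```r
# simulate the null distribution for the same-sex marriage example,
# mirroring what the One Proportion applet does behind the scenes
set.seed(138)   # arbitrary seed so the simulation is reproducible
n <- 85
null_pi <- 0.70
sim_props <- rbinom(10000, size = n, prob = null_pi) / n
sd(sim_props)            # close to the theory-based SD, 0.0497
mean(sim_props >= 0.79)  # simulation-based p-value, close to 0.035
```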
The mean length of all cell phone conversations in the United States is reported to be 2.27 minutes. One researcher believes this value is outdated and that the true mean time spent on a cell phone call is something other than 2.27 minutes. They collect a random sample of 100 phone call lengths from a phone company's records and find a sample mean of 123 seconds and a standard deviation of 48 seconds. The data were seen to be only moderately skewed to the left.
Our hypotheses are \[H_0: \mu = 2.27\] \[H_a: \mu \neq 2.27.\]
# example: phone call length (in minutes)
mu <- 2.27
xbar <- 123/60  # why are we dividing by 60? look at the units!
n <- 100
s <- 48/60
Validity conditions for a One-sample t-test: the quantitative variable should have a symmetric distribution, or you should have at least 20 observations and the sample distribution should not be strongly skewed. These conditions are met in this example.
Next we calculate the t-statistic.
# calculate the test statistic
t_stat <- (xbar - mu)/(s/sqrt(n))
t_stat
## [1] -2.75
Using the t-statistic, sample size, and alternative hypothesis, we calculate the p-value using the Student t-distribution function pt( ). The dependence of the t-distribution on the sample size is described by saying that the t-distribution has n-1 degrees of freedom, or df = n-1.
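To see why the degrees of freedom matter, compare the tail area for the same t-statistic across several df values (an illustration we added; the df values other than 99 are arbitrary):

```r
# the tail area for the same t-statistic shrinks as df grows,
# approaching the normal tail area from pnorm( )
pt(-2.75, df = 9)     # small sample, heavier tails
pt(-2.75, df = 99)    # our sample of size 100
pt(-2.75, df = 999)   # nearly the same as pnorm(-2.75)
```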
Notice that in this example we have a negative \(t\)-statistic and a two-sided alternative hypothesis. So either we use the negative \(t\)-statistic and double the lower.tail=TRUE value, or we take the absolute value of the \(t\)-statistic and use the lower.tail=FALSE option to calculate the area in the upper tail and double that p-value. The code for these two options is below. Notice that they both evaluate to the same number.
# two-sided alternative hypothesis
PC1_p_value <- pt(t_stat, df=n-1, lower.tail=TRUE)*2
PC1_p_value
## [1] 0.007085826
# two-sided alternative hypothesis
PC2_p_value <- pt(abs(t_stat), df=n-1, lower.tail=FALSE)*2
PC2_p_value
## [1] 0.007085826
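A third, equivalent way to write a two-sided p-value is a common R idiom (our addition, not from the lab): take -abs( ) of the statistic so that the lower tail always applies, whether the statistic is negative or positive.

```r
# values from the phone-call example above
n <- 100
t_stat <- -2.75
# -abs( ) sends any t-statistic to the lower tail, so one line
# handles both the negative and positive cases
2 * pt(-abs(t_stat), df = n - 1)
```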
Consider the scenario from Section 1.2, where we looked at the rock-paper-scissors game with a novice player who threw scissors in 4 out of the 20 rounds. Will the theory-based approach (more formally called the one proportion \(z\)-test with normal approximation) work here? Explain.
Go to the One Proportion applet and run both the simulation and theory-based approaches for this rock-paper-scissors scenario. What do you notice?
This example investigates eye-dominance. To figure out which of your eyes is dominant, carry out the following test:
Extend both of your arms in front of you and create a triangular opening with your thumbs and pointer fingers.
With both eyes, center your triangular opening on a distant object like a clock or lamp.
Close your right eye. If the object stays centered in the triangular opening, your open left eye is your dominant eye. If the object is no longer in the triangular opening then your right eye is your dominant eye.
Double check this by closing your left eye. If the object stays centered then your right eye is your dominant eye.
Record in your lab report whether you have left or right eye dominance.
Use read_table( ) to import the EyeDominance data from the following address: “https://willamette.edu/~ijohnson/138/EyeDominance.txt”. This data contains responses to the question ‘Are you right-eye dominant?’ for a random sample of students at a large university.
#This code should work to import the data
EyeDominance <- read_csv("https://raw.githubusercontent.com/IJohnson-math/Math138/main/EyeDominance.txt", col_types="c")
Conventional wisdom says that more people are right-handed than left-handed, so for now let’s have our research hypothesis be that people more often tend to be right-eye dominant. Our null hypothesis is that left- and right-eye dominance are equally prevalent; in other words, the proportion of right-eye dominance is 0.50. Our alternative hypothesis is that right-eye dominance is more prevalent than left, that is, the proportion of right-eye dominance is greater than 0.50.
Using notation, \(H_0: \pi = 0.50\) and \(H_a: \pi > 0.50\).
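If table( ) is new to you, here is how it behaves on a small made-up vector (the responses below are invented for illustration, not taken from the EyeDominance data):

```r
# table( ) counts how many times each distinct value occurs
responses <- c("yes", "no", "yes", "yes", "no")
table(responses)
# counts: no 2, yes 3
```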
#example: eye dominance
#table_eye <-
table_eye

#look at the code above and mimic it here
#obs_prop <-
#n <-
#null_pi <-
Description for 7)
Calculate the standard deviation of our null distribution.
Next calculate the standardized statistic \(z\).
Last, calculate the theory-based p-value and write out a conclusion that includes the research question, the p-value, and the significance of the p-value.
Conclusion for 10).