Math 138, Section 8.2 – class notes

Acupuncture Study

Increasingly, health insurers, federal agencies, medical organizations, and others are insisting on the use of evidence-based medicine in making health care decisions. Evidence-based medicine means only using medical treatments where evidence (measured using statistics, most typically based on randomized experiments) exists that a particular treatment is both effective and better than other available treatments.

Recently, a randomized experiment was conducted exploring the evidence for the effectiveness of acupuncture in the treatment of chronic lower back pain (Haake et al., 2007). Acupuncture involves the insertion of needles into the skin of the patient at specific locations (acupuncture points) around the body to treat a variety of ailments.

In the experiment, there were three treatment groups of interest: (1) verum acupuncture practiced according to traditional Chinese medicine principles; (2) sham acupuncture where needles were inserted into the skin but not at acupuncture points and not very deep (this is often used as a control in experiments involving acupuncture); and (3) traditional (nonacupuncture) therapy consisting of a combination of drugs, physical therapy, and exercise. A total of 1,162 patients were randomly assigned to one of the three treatment groups, yielding 387 patients in each of Groups 1 and 2 and 388 patients in Group 3.

Explanatory variable: treatment (acupuncture / sham acupuncture / drugs & PT)

Response variable: significant back pain reduction (yes/no)

Data location: http://www.isi-stats.com/isi/data/chap8/Acupuncture.txt

Setup and packages

library(mosaic)
library(ggformula)
# put in the other package that you need here

Loading in data

AcupuncData <- read.table("http://www.isi-stats.com/isi/data/chap8/Acupuncture.txt", header=TRUE)

Display the data

tally(Improvement ~ Acupunture, data=AcupuncData)

##            Acupunture
## Improvement None Real Sham
##      Better  106  184  171
##      Not     282  203  216

tally(Improvement ~ Acupunture, data=AcupuncData, format="proportion")

##            Acupunture
## Improvement      None      Real      Sham
##      Better 0.2731959 0.4754522 0.4418605
##      Not    0.7268041 0.5245478 0.5581395

Proportion of participants with back pain relief:

\(\hat{p}_{none} = 0.2731959\)

\(\hat{p}_{real} = 0.4754522\)

\(\hat{p}_{sham} = 0.4418605\)

#save the success proportions for the three treatment groups: none, real and sham acupuncture
pn=0.2731959
pr=0.4754522
ps=0.4418605
#pooled proportion: total successes over total participants (388 none + 387 real + 387 sham)
pooled=(106+184+171)/(388+387+387)
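Typing proportions and group sizes in by hand invites rounding and transcription slips. As a rough sketch, the same quantities can be computed directly from the two-way table of counts shown in the tally() output. The object names counts, group_totals, props, and pooled_check are illustrative and are not part of the original notes.

```r
# Counts copied from the tally() output (rows: outcome, columns: treatment)
counts <- matrix(c(106, 184, 171,
                   282, 203, 216),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Improvement = c("Better", "Not"),
                                 Acupunture  = c("None", "Real", "Sham")))

group_totals <- colSums(counts)                        # participants per treatment group
props        <- counts["Better", ] / group_totals      # success proportion per group
pooled_check <- sum(counts["Better", ]) / sum(counts)  # overall success proportion

group_totals
round(props, 7)
round(pooled_check, 3)
```

Because the denominators come from the table itself, a mistyped group size cannot slip into the pooled proportion.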

A bar graph can be used to visualize the proportion of patients with back pain relief for the three treatment groups.

gf_bar(~Acupunture, fill= ~Improvement, data=AcupuncData, position="fill", reverse=TRUE, color="black", title="Proportion of people with significant back pain relief", xlab="Treatment Groups", ylab="proportion")+ scale_fill_manual(values = c("blue","grey80"))

The Mean Group Difference Statistic

We use the function abs( ) to calculate the absolute value of a number.

Calculate mean group difference:

MGD = (abs(pn - pr) + abs(pn-ps) + abs(pr - ps))/3
MGD

## [1] 0.1348375

Go to the Multiple Proportions simulation-based applet to complete the analysis using the MGD statistic.

Report the p-value: approximately zero (less than 0.0001)

Conclusion: We have very strong evidence against the null hypothesis. A statistic of 0.135 or larger never occurred in the simulation that assumes the null hypothesis is true. We have very strong evidence that the treatment used is associated with pain reduction.
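For readers who want to see roughly what the applet is doing, here is a minimal base R sketch of a shuffle-based simulation for the MGD statistic. It is an assumption about the general approach (re-randomizing treatment labels under the null hypothesis), not a reproduction of the applet's code. The helper mgd(), the seed, and the choice of 1000 shuffles are all illustrative.

```r
set.seed(138)  # arbitrary seed so the sketch is reproducible

# Rebuild the raw data from the table of counts
treatment <- rep(c("None", "Real", "Sham"), times = c(388, 387, 387))
better    <- c(rep(c(TRUE, FALSE), times = c(106, 282)),
               rep(c(TRUE, FALSE), times = c(184, 203)),
               rep(c(TRUE, FALSE), times = c(171, 216)))

# Mean of the absolute pairwise differences in group success proportions
mgd <- function(outcome, group) {
  p <- as.vector(tapply(outcome, group, mean))
  mean(abs(combn(p, 2, FUN = diff)))
}

observed_mgd <- mgd(better, treatment)

# Shuffle the treatment labels to mimic "treatment has no effect"
sim_mgd <- replicate(1000, mgd(better, sample(treatment)))

# Proportion of shuffles with an MGD at least as large as the observed one
p_value <- mean(sim_mgd >= observed_mgd)
observed_mgd
p_value
```

With only 1000 shuffles the estimated p-value can be no finer than 0.001, so a result of 0 should be read as "less than about 0.001" rather than exactly zero.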

The Chi-Square statistic

The Mean Group Diff statistic focuses on how all pairs of proportions compare to each other. Another option is to consider how the proportions compare to some average proportion. As it turns out, this new statistic, called a chi-square statistic, has the advantage of being well predicted by a mathematical model and thus is commonly used in practice for these types of problems. It can also be used on larger tables (two categorical variables, with any number of categories each).

Above, we calculated the pooled proportion, pooled ≈ 0.397, meaning that about 39.7% of all the patients experienced a reduction in pain. If the three treatments really don’t differ in their effectiveness, then all three treatments would have success rates close to 39.7%.

The chi-square statistic standardizes each of the treatment proportions and so measures how different each group proportion is from the overall pooled proportion.

Recall, the formula to standardize an observed proportion, \(\hat{p}_1\), given a hypothesized value of the pooled proportion, \(\hat{p}\), is

\[z = \frac{\textrm{observed proportion} - \textrm{center of the null distribution}}{\textrm{standard deviation of the null distribution}}\]

Equivalently,

\[\displaystyle{z =\frac{\hat{p}_1 - \hat{p}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}}}}\]

where \(n_1\) is the sample size of the group with success proportion \(\hat{p}_1\).

To calculate the chi-square statistic, \(\chi^2\), we will

1. standardize pn, pr, and ps,
2. square these standardized values, and
3. add them up.
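Written as a single formula, the three steps amount to the following, where \(\hat{p}_i\) and \(n_i\) denote the success proportion and sample size of group \(i\), and \(z_i\) is the standardized value for that group (the index \(i\) is notation introduced here for compactness):

\[\chi^2 = \sum_{i} z_i^2 = \sum_{i} \frac{(\hat{p}_i - \hat{p})^2}{\hat{p}(1-\hat{p})/n_i}\]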

Step 1, standardize pn, pr, and ps:

noneTotal=106+282
realTotal=184+203
shamTotal=171+216
#note: the none group has 388 participants; the real and sham groups have 387 each.

#pooled success proportion, using the actual group totals as the denominator
pooled=(106+184+171)/(noneTotal+realTotal+shamTotal)

#standardize success proportion, pn, for none group
Zn = (pn-pooled)/sqrt(pooled*(1-pooled)/noneTotal)
Zn

## [1] -4.973918

#standardize success proportion, pr, for real acupuncture group
Zr = (pr-pooled)/sqrt(pooled*(1-pooled)/realTotal)
Zr

## [1] 3.16556

#standardize success proportion, ps, for the sham acupuncture group
Zs = (ps-pooled)/sqrt(pooled*(1-pooled)/shamTotal)
Zs

## [1] 1.814782

Steps 2 and 3, square each standardized value, then add them up:

chiSquare = Zn^2 + Zr^2 + Zs^2
chiSquare

## [1] 38.05406
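As a cross-check, the same statistic can be computed in vectorized form and compared with the observed-versus-expected form of the chi-square statistic, sum of (observed - expected)^2 / expected over all cells. This is a sketch in base R. The names counts, n, p_hat, p_pool, chi_sq_z, and chi_sq_oe are illustrative and do not appear in the original notes.

```r
# Counts copied from the tally() output (rows: outcome, columns: treatment)
counts <- matrix(c(106, 184, 171,
                   282, 203, 216),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Better", "Not"), c("None", "Real", "Sham")))

n      <- colSums(counts)                    # group sizes
p_hat  <- counts["Better", ] / n             # group success proportions
p_pool <- sum(counts["Better", ]) / sum(n)   # pooled success proportion

# Route 1: standardize each group proportion, then square and add
z        <- (p_hat - p_pool) / sqrt(p_pool * (1 - p_pool) / n)
chi_sq_z <- sum(z^2)

# Route 2: compare observed cell counts with the counts expected under the null
expected  <- outer(rowSums(counts), n) / sum(counts)
chi_sq_oe <- sum((counts - expected)^2 / expected)

c(from_z = chi_sq_z, from_cells = chi_sq_oe)
```

The two routes give the same number, which is why summing squared standardized proportions is a legitimate way to build the chi-square statistic for a table with two outcome categories.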

Validity conditions for theory-based chi-square test

When we were comparing two proportions, we said we had large sample sizes if the number of successes and failures were both at least 10 in each explanatory variable group. We will use the same guideline here. If each cell of the two-way table contains at least 10 observations, then we will consider our sample sizes to be large enough for the theory-based chi-square test to work well. (Note: This is a rather conservative guideline and if your data do not satisfy it, you might still consider the theory-based approach.)
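The guideline can be checked directly from the table of counts. The sketch below rebuilds the table by hand (the name counts is illustrative) and tests whether every cell holds at least 10 observations.

```r
# Counts copied from the tally() output (rows: outcome, columns: treatment)
counts <- matrix(c(106, 184, 171,
                   282, 203, 216),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Better", "Not"), c("None", "Real", "Sham")))

min(counts)          # smallest cell count in the table
all(counts >= 10)    # TRUE when the validity guideline is met
```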

chisq.test(tally(Improvement ~ Acupunture, data=AcupuncData))

## 
##  Pearson's Chi-squared test
## 
## data:  tally(Improvement ~ Acupunture, data = AcupuncData)
## X-squared = 38.054, df = 2, p-value = 5.453e-09

Alternatively,

prop.test(Improvement ~ Acupunture, data = AcupuncData)

## 
##  3-sample test for equality of proportions without continuity correction
## 
## data:  tally(Improvement ~ Acupunture)
## X-squared = 38.054, df = 2, p-value = 5.453e-09
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3 
## 0.2731959 0.4754522 0.4418605

p-value: \(5.453 \times 10^{-9} = 0.000000005453\)

Conclusion: With a p-value of 0.000000005453, which is nearly zero, we have very strong evidence against the null hypothesis. It is not plausible that all three treatment groups have equal success rates for treating back pain.
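To see where the theory-based p-value comes from, the reported statistic can be compared with a chi-square distribution by hand. This sketch assumes the usual degrees of freedom for a two-way table, (rows - 1) x (columns - 1), and plugs in the rounded statistic from the output, so the result matches the reported p-value only up to rounding.

```r
chi_sq <- 38.054                 # statistic reported by chisq.test()
df     <- (2 - 1) * (3 - 1)      # (rows - 1) * (columns - 1) for a 2 x 3 table

# Upper-tail area: probability of a statistic at least this large under the null
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p_value
```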

Follow up with three ‘Post Hoc’ two-proportion tests.

Since we found evidence that the three treatment groups don’t have equal success rates for treating back pain, we look at pairwise comparisons to see where significant differences occur.

tally(Improvement ~ Acupunture, data=AcupuncData)

##            Acupunture
## Improvement None Real Sham
##      Better  106  184  171
##      Not     282  203  216

noneTotal=106+282
realTotal=184+203
shamTotal=171+216
#note: the none group has 388 participants; the real and sham groups have 387 each.

#none vs real
prop.test(c(106, 184), c(noneTotal, realTotal), alternative="two.sided", conf.level=0.95, correct=FALSE)

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c out of c106 out of noneTotal184 out of realTotal
## X-squared = 33.846, df = 1, p-value = 5.965e-09
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.2689006 -0.1356121
## sample estimates:
##    prop 1    prop 2 
## 0.2731959 0.4754522

#none vs sham
prop.test(c(106, 171), c(noneTotal, shamTotal), alternative="two.sided", conf.level=0.95, correct=FALSE)

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c out of c106 out of noneTotal171 out of shamTotal
## X-squared = 23.998, df = 1, p-value = 9.641e-07
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.2351017 -0.1022275
## sample estimates:
##    prop 1    prop 2 
## 0.2731959 0.4418605

#real vs sham
prop.test(c(184, 171), c(realTotal, shamTotal), alternative="two.sided", conf.level=0.95, correct=FALSE)

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c out of c184 out of realTotal171 out of shamTotal
## X-squared = 0.8794, df = 1, p-value = 0.3484
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.03657657  0.10376003
## sample estimates:
##    prop 1    prop 2 
## 0.4754522 0.4418605
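The three pairwise tests can also be run in a loop, which scales better when there are more groups. The sketch below uses stats::prop.test() from base R with the same correct = FALSE setting. The names successes, totals, pairs, and pairwise_p are illustrative.

```r
successes <- c(None = 106, Real = 184, Sham = 171)
totals    <- c(None = 388, Real = 387, Sham = 387)

# Each column of pairs holds the names of one pair of groups
pairs <- combn(names(successes), 2)

# Two-proportion test (no continuity correction) for every pair
pairwise_p <- apply(pairs, 2, function(pair) {
  stats::prop.test(successes[pair], totals[pair], correct = FALSE)$p.value
})
names(pairwise_p) <- apply(pairs, 2, paste, collapse = " vs ")

signif(pairwise_p, 4)
```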

Conclusions:

real acupuncture vs none: Our p-value of 0.000000005965 gives very strong evidence against the null hypothesis that real acupuncture and non-acupuncture treatment give equal success rates for reducing back pain. Our 95% confidence interval (-0.2689006, -0.1356121) for the difference (none minus real) supports this conclusion because it does not contain 0. We are 95% confident that real acupuncture treatment results in higher pain relief rates by about 13.6 to 26.9 percentage points when compared to non-acupuncture treatment.

sham acupuncture vs none: Our p-value of 0.0000009641 gives very strong evidence against the null hypothesis that the treatments of sham and non-acupuncture give equal success rates for reducing back pain. Our 95% confidence interval (-0.2351017, -0.1022275) also supports rejecting the null-hypothesis because 0 is not a plausible value for the difference in the two treatment long-run success rates. We are 95% confident that sham acupuncture treatment results in higher pain relief rates by about 10.2 to 23.5 percentage points when compared to non-acupuncture treatment.

sham acupuncture vs real acupuncture: Our p-value of 0.3484 gives little to no evidence against the null hypothesis that sham and real acupuncture give equal success rates for reducing back pain. This does not prove the null hypothesis; it means that, from this study, it is plausible that real and sham acupuncture reduce back pain at equal rates. Our 95% confidence interval (-0.03657657, 0.10376003) supports this since 0 is in the interval; thus, 0 is a plausible value for the difference in the two long-run proportions.
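The confidence intervals quoted in these conclusions can be reproduced by hand. The sketch below assumes the interval is the usual large-sample (Wald) interval for a difference in proportions, with an unpooled standard error, and works through the none vs real comparison. The names p1, p2, diff_hat, se, and ci are illustrative.

```r
p1 <- 106 / 388; n1 <- 388   # no acupuncture
p2 <- 184 / 387; n2 <- 387   # real acupuncture

diff_hat <- p1 - p2                                        # none minus real
se       <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
ci       <- diff_hat + c(-1, 1) * qnorm(0.975) * se        # 95% interval

round(ci, 4)
```

Both endpoints are negative because the difference is taken as none minus real, so the interval describes how much lower the success rate is without acupuncture.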