Researchers, as a whole, tend to underestimate the need for power. I'm just now starting to get it.
I recently gave a brief, easy-to-follow presentation on statistical power, its importance, and how to go about getting it.
Hope you find it useful.
2. TOPICS
What is Statistical Power? Why is it important?
Estimating Statistical Power
Useful Software
An example: Mobile Health study
3. WHAT IS POWER?
Power = the probability of correctly rejecting a false null
hypothesis (when the alternative hypothesis is true)
Power = 1 - β
More powerful experiment = better chance of rejecting a false null
hypothesis
Thus, reducing the likelihood of Type II error
4. WHAT IS POWER?
Statistical power can help answer questions like these:
How large must my sample size be?
How should I design my experiment?
Which measures/test should I use?
I can recruit about X people for my study. Will I have enough power?
5. ESTIMATING STATISTICAL POWER
One should estimate statistical power during the design phase of
the study
Especially after:
Selecting measures
Choosing a valid statistical test
Power can be estimated for many types of tests (t-Tests, ANOVA,
regression, etc.)
Very common in treatment effectiveness research
6. ESTIMATING STATISTICAL POWER
It’s OK to try out different designs and statistical tests
in the search for the most powerful or practical study.
However, these trials must be done before conducting
the study.
7. ONE WAY TO ESTIMATE POWER
To estimate how much power a study will have, you can use population means and standard deviations (or best guesses) and follow the steps below.
Example: Say you want to assign 20 individuals to each of two groups, control (C) and treatment (T)*.
Table 1: Population Parameters
            Mean    Standard Deviation
Control     9.64    3.17
Treatment   6.58    3.03
Step 1. Draw 20 random observations from a population with scores like the C group
Step 2. Draw 20 random observations from a population with scores like the T group
Step 3. Calculate the t statistic
Step 4. Repeat the above steps 9,999 more times
*Example from Howell (2013)
8. ONE WAY TO ESTIMATE POWER
Out of the 10,000 t values, how many are greater than tcrit(38) = 2.024?
86% of the results are greater than 2.024
Power (given the parameter estimates) is .86
*Howell (2013), pg 221
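The four simulation steps can be sketched in Python. This is a minimal illustration rather than Howell's own code: it assumes both populations are normally distributed, uses a pooled-variance t statistic, and takes the critical value 2.024 directly from the slide. The function name `simulate_power` and the seed are hypothetical choices.

```python
import math
import random
import statistics


def simulate_power(mean_c, sd_c, mean_t, sd_t, n_per_group, t_crit,
                   n_sims=10_000, seed=1):
    """Estimate power as the share of simulated |t| values above t_crit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_sims):
        # Steps 1 and 2: draw one sample per group (normality is an assumption)
        c = [rng.gauss(mean_c, sd_c) for _ in range(n_per_group)]
        t = [rng.gauss(mean_t, sd_t) for _ in range(n_per_group)]
        # Step 3: pooled-variance t statistic for two equal-sized groups
        pooled_var = (statistics.variance(c) + statistics.variance(t)) / 2
        se = math.sqrt(pooled_var * 2 / n_per_group)
        t_stat = (statistics.mean(c) - statistics.mean(t)) / se
        if abs(t_stat) > t_crit:
            hits += 1
    # Step 4 is the loop itself: n_sims repetitions in total
    return hits / n_sims


power = simulate_power(9.64, 3.17, 6.58, 3.03, n_per_group=20, t_crit=2.024)
print(round(power, 2))
```

With the Table 1 parameters the estimate should land close to the .86 reported above, varying slightly with the random seed.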
9. THE TRADITIONAL WAY
We know that power depends on the degree of overlap between
sampling distributions
*Howell (2013), pg 222
10. THE TRADITIONAL WAY
Overlap/power depends on:
Statistical test
Alpha level
Sample size
Effect size (ES)
ES = (μT – μC) / σ
μT, μC: means for the treatment and control populations
σ: pooled standard deviation
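As a quick worked example of the effect size formula, the sketch below plugs in the Table 1 values from the earlier Howell (2013) example. It assumes equal group sizes, so the pooled standard deviation is the square root of the average of the two variances. The function names are hypothetical.

```python
import math


def pooled_sd(sd_a, sd_b):
    """Pooled standard deviation for two groups of equal size (assumption)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def effect_size(mean_t, mean_c, sd_t, sd_c):
    """ES = (mean_T - mean_C) / pooled SD."""
    return (mean_t - mean_c) / pooled_sd(sd_t, sd_c)


es = effect_size(mean_t=6.58, mean_c=9.64, sd_t=3.03, sd_c=3.17)
print(round(es, 2))
```

The sign is negative only because the treatment mean is lower than the control mean; the magnitude, just under one standard deviation, is what matters for power.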
11. USEFUL SOFTWARE
Commercial:
SAS Power and Sample Size
PASS sample size software
Free:
R package pwr
G*Power
And many more!
I will be using G*Power to illustrate an example
Download G*Power here:
http://www.gpower.hhu.de/
12. MOBILE HEALTH STUDY
Research Question:
Will regular (text) messages and targeted messages increase drug
adherence for adult patients with diabetes when compared to
diabetic patients who do not receive messages?
Control Group (G1): No messages
Treatment Group 1 (G2): regular messages
Treatment Group 2 (G3): targeted messages
13. MOBILE HEALTH STUDY
What we know
Dependent variable: drug adherence (range=5-25)
Independent variables: G1, G2, G3
Minimally important difference: 3
(a difference of 3 points is needed to show clinical significance)
Want power = .80
14. MOBILE HEALTH STUDY
1. Choose test
Here, we will be using an omnibus F test of a one-way ANOVA with 3 levels (or groups)
16. MOBILE HEALTH STUDY
2. Determine the effect size
Means and standard deviations are guided by our hypotheses and previous research
SD = 3
Means: G1 = 12, G2 = 13, & G3 = 15
*change power and group size
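G*Power expresses the effect size for a one-way ANOVA as Cohen's f, the standard deviation of the group means divided by the common within-group standard deviation. The sketch below shows roughly what that calculation looks like for the means and SD on this slide, assuming equal group sizes. The function name `cohens_f` is hypothetical.

```python
import math


def cohens_f(means, sd):
    """Cohen's f: SD of the group means over the within-group SD.

    Assumes equal group sizes, so the grand mean is the plain average.
    """
    grand_mean = sum(means) / len(means)
    sigma_means = math.sqrt(
        sum((m - grand_mean) ** 2 for m in means) / len(means)
    )
    return sigma_means / sd


f = cohens_f([12, 13, 15], sd=3)
print(round(f, 2))
```

For these inputs f comes out a little above .40, which is conventionally described as a large effect.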
17. MOBILE HEALTH STUDY
3. Calculate estimates
Our results: To achieve a power of .80, given the parameter estimates, we will need at least 60 patients (20 per group) in the study
*note effect size
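The 20-per-group result can be sanity-checked with the same simulation idea used for the t test earlier. The sketch below is not what G*Power does internally. It assumes normally distributed adherence scores (ignoring the 5-25 range limits), and the critical value `F_CRIT` of about 3.16 for F(2, 57) at alpha = .05 is taken from standard F tables rather than from the slides. All function names are hypothetical.

```python
import random

# Assumption: approximate critical F(2, 57) at alpha = .05, from an F table
F_CRIT = 3.16


def one_way_f(groups):
    """Omnibus F statistic for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_anova_power(means, sd, n_per_group, f_crit,
                         n_sims=5_000, seed=1):
    """Share of simulated studies whose F statistic exceeds f_crit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_sims):
        groups = [[rng.gauss(m, sd) for _ in range(n_per_group)]
                  for m in means]
        if one_way_f(groups) > f_crit:
            hits += 1
    return hits / n_sims


power = simulate_anova_power([12, 13, 15], sd=3, n_per_group=20,
                             f_crit=F_CRIT)
print(round(power, 2))
```

The estimate should come out near the .80 target. Trying a different `n_per_group` requires a matching critical value, because the denominator degrees of freedom change with the sample size.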
19. A NOTE ON PRACTICALITY
That last test (with ES = .20) calls for a total sample size of 246
patients.
What if that’s not feasible?
You can:
Revisit your study design
Revise hypotheses, attempt other tests, change measures,
etc.
Or, work backwards. Estimate power from a sample size that is
practical
20. REFERENCES
Howell, D. C. (2013). Power. In J. D. Hage (Ed.), Statistical methods for psychology (8th ed., pp. 229-249). Belmont, CA: Wadsworth, Cengage Learning.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Newbury Park, CA: Sage Publications, Inc.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage Publications, Inc.
21. BIG THANKS
To Dr. Philippe Gaillard for his wonderful guidance (and books!)
Also to the STAT 7970 class - wonderful audience.
To contact me
email - cdatubo@gmail.com
visit - http://cdatubo.weebly.com/
connect - http://www.linkedin.com/in/cdatubo
Editor's Notes
We’re trained to think about significance level or reducing Type I error (finding a difference that is not there)
Rarely trained to consider an equally important topic which is NOT finding a difference that IS there (Type II error)
Knowing the degree of statistical power can lead to a more efficient use of the researcher's resources (e.g., not wasting money on small samples that give unreliable results or large samples that are unnecessary)
Control = 9.64 (SD = 3.17); treatment = 6.58 (SD = 3.03); 20 participants in each group
Draw 20 observations from a population with scores similar to the control group and 20 from a population like the treatment group. Calculate the t statistic. Do this 9,999 more times (10,000 t values). Note the critical value: t(38) = 2.024
Note the critical value is for 38 df = 2.024
Recall what it looks like when we compare two sampling distributions
H0 = when mu = mu0 = when the null hypothesis is true
H1 = when mu = mu1 = when the null hypothesis is false
Dark blue = alpha, the probability of a Type I error (one-tailed): rejecting a true null hypothesis
Light blue = the area to the left of the critical t value, the probability of a Type II error: failing to reject a false null hypothesis
Power = the probability that we will correctly reject a false null hypothesis
Power is affected by alpha (or significance level), the true alternative hypothesis, the sample size, and statistical test
Rules of thumb:
The smaller the significance level, the larger the necessary sample size
Two-tailed tests need larger sample sizes than one-tailed
The smaller the effect size, the larger the necessary sample size