This is the presentation of the BITS training session on "Essential statistics".
View more material on http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190
2. Overview
Outline
Formulate a relevant research question
Study design
Gather the data according to the plan
Analyze the data
Explorative data-analyses (descriptives, graphically)
Drawing inference (answer our research question with a certain confidence)
Report the results
Overview 2
3. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Part 1
Design of a study
Part 1 – Design of a study 3
4. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Experimental study
Factor levels (treatments) randomly assigned over the different
experimental units (control over explanatory variable)
→ information about the cause-and-effect relationship between the
explanatory factors and a response variable
Example: Effect of vitamin C on prevention of colds in 800 children. Half of the children
were selected at random and received vitamin C (treatment group); the remaining children
received a placebo (control group)
Qualitative explanatory factor with two levels and children as experimental units
Part 1 – Design of a study 4
5. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Observational study
Data obtained from a non-experimental study: the explanatory variables are not
controlled, and randomization of treatments over experimental units does not
occur
→ establish associations between the explanatory factors and a response
variable
Example: Company officials wished to study the relation between the age of an employee
and the number of days of illness in a year.
Explanatory variable not controlled → age is merely observed
Associations can be established, but no cause-and-effect: a positive relation between age
and number of days of illness need not imply that the number of days of illness is a direct
result of age → if, for example, younger employees work indoors while older employees
usually work outdoors, then work location rather than age may be responsible for the
number of days of illness
Part 1 – Design of a study 5
6. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Mixed studies
Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug
on lowering blood cholesterol was investigated. Within each hospital center, volunteers
were randomly assigned to one of the two treatments (drug / placebo)
Experimental factor: treatment (drug versus placebo)
Observational factor: hospital center, not randomly assigned since each
volunteer was assigned to the nearest hospital center
Part 1 – Design of a study 6
7. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Structure of the experiment
                      Factor B
                      Level 1       Level 2       Level 3
Factor A   Level 1    treatment 1   treatment 2   treatment 3
           Level 2    treatment 4   treatment 5   treatment 6
(each treatment is applied to one or more experimental units)
Replicates = treatment repeated → estimate experimental error
2 levels of factor A x 3 levels of factor B = 6 treatments
experimental unit: the smallest unit of experimental material to which a treatment can
be assigned; the experimental unit is determined by the method of randomization
Part 1 – Design of a study 7
8. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Number of factors: in the initial stages of an investigation → include many factors
(more than can possibly be studied in a single experiment)
Cause-and-effect diagrams are often used to identify factors that could
affect the outcome → reduce number of factors
Example: 4 factors, each with 2 levels → 2⁴ = 16 treatment combinations
Number of levels of each factor:
Qualitative factors
Quantitative factors: the number of levels reflects the type of trend expected by the
experimenter
• 2 levels ~ linear change in response: min – max of the specified range
• 3 levels ~ quadratic trend
• ≥ 4 levels ~ a detailed examination of the shape of the response curve is desired
Range of factor is one of the most important design decisions
Part 1 – Design of a study 8
9. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Measurements: precision versus accuracy
Precision of a variable: the degree to which a variable has nearly the same
value when measured several times. It is a function of random error (chance)
and is assessed as the reproducibility of repeated measurements.
Example: weigh the same person 3 times on an electronic balance and
obtain slightly different measurements – 67.5 kg, 67.4 kg and 67.6 kg
The more precise a measurement, the greater the statistical power at a given
sample size to estimate mean values and to test hypotheses
Variability may be due to operator, instrument and subject
Minimize random error and improve precision
Operating manuals, training the operator, refining / automating instruments
Repeat the measurement and average over a larger number of
observations (but! added cost, practical difficulties)
Part 1 – Design of a study 9
10. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Accuracy of a variable: the degree to which a variable actually represents
what it is supposed to represent. It is a function of systematic error (bias)
which is often difficult to detect and has important influence on the validity of
the result.
Example 1: incorrect calibration of an instrument
Example 2: gastric freezing as a treatment for ulcers in the upper part of the
intestine
Improve accuracy and minimize bias
Operating manuals, training the operator, refining / automating instruments
Periodic calibration using a gold standard (example 1)
Blinding: in a double-blind study, neither the experimental subject nor the evaluator
knows which treatment is received or given, so any inaccuracy in measuring the
outcome will be the same in the 2 groups (example 2)
Part 1 – Design of a study 10
11. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Bias and variance in shooting arrows at a target. Bias means that the archer
systematically misses in the same direction. Variance means that the arrows
are scattered (Moore and McCabe 2002)
Part 1 – Design of a study 11
12. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Sampling from a population
Simple random sample
Population (N elements) → random draws with equal probability → sample (n elements)
Part 1 – Design of a study 12
13. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Randomization → treatments are randomly assigned to experimental units
Tends to eliminate the influence of extraneous factors not under the direct
control of the experimenter
Blocking → increase precision by taking into account other factors
Randomization within blocks:
Heterogeneous subjects are first divided into homogeneous blocks (males / females);
within each block, randomization assigns subjects to group 1 → treatment 1,
group 2 → treatment 2, or group 3 → treatment 3
Part 1 – Design of a study 13
14. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Stratified Sampling
Suppose we want to know the attitudes of male and female students in the
engineering school
Is a simple random sample from that school a good idea?
No: too few women (10%)
Stratify the sample, pick a random sample from
Stratum 1: female engineers
Stratum 2: male engineers
Estimates are measured with comparable precision. Learn from the distribution in
each stratum; do NOT pool the data naively
e.g. if the average weight is 60 kg for the women and 80 kg for the men,
the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg
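A minimal sketch of this weighted combination in Python (the strata shares and means are the example values above):

```python
# Strata shares and mean weights from the example above
strata = {"female engineers": (0.10, 60.0), "male engineers": (0.90, 80.0)}

# Weighted combination of the stratum means
overall = sum(share * mean for share, mean in strata.values())
print(overall)   # 78.0 kg
```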
Part 1 – Design of a study 14
15. Types of variables
Univariate descriptives
Bivariate descriptives
Part 2
Explorative data-analysis
Part 2 – Explorative data-analysis 15
16. Types of variables
Univariate descriptives
Bivariate descriptives
Descriptive statistics
Allows the researcher to describe or summarize the data. This is typically
done at the beginning of a results section: the researcher gives an idea of the
sample size and the characteristics under study (e.g. baseline characteristics in a
clinical trial)
Example: A total of 235 students participated in this study, 163 women (69.4%)
versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a
slightly higher score on exam 2 in comparison to the male students (80.7 ±
18.1).
Part 2 – Explorative data-analysis 16
17. Types of variables
Univariate descriptives
Bivariate descriptives
We typically start with univariate explorations (one variable at a time). Next, we
describe joint distributions (two variables = bivariate; more variables = multivariate)
Graphical summary to inspect the shape of the distribution: symmetry,
modality, heaviness of tails
Numerical summary: classical measures of location and spread
Mean and standard deviation
Median and interquartile range
Mode: value that occurs most often (useful for nominal data)
Part 2 – Explorative data-analysis 17
18. Types of variables
Univariate descriptives
Bivariate descriptives
Notes on notation
A random variable X is a variable whose value is a numerical outcome of a
random phenomenon (nonnumerical outcomes are numerically encoded)
Random variables are usually denoted by capital letters such as X, Y, …
Fixed constants or observed values are usually denoted by small letters
e.g. x, y. Special constants (to be specified) will be written as Greek letters
α, β, μ, σ
indices i will subscript random or observed outcomes for individual
observations in the data set: Yi , yi
Part 2 – Explorative data-analysis 18
19. Types of variables
Univariate descriptives
Bivariate descriptives
Type           Characteristic                        Example              Descriptive statistic   Information content
Categorical    the set of all possible values
               can be enumerated
• Nominal      unordered categories                  gender, race         counts, proportions     lower
• Ordinal      ordered categories                    degree of pain       median                  intermediate
Continuous     can take all possible values within   weight, number of    mean, standard          higher
or ordered     some interval of real numbers         cigarettes per day   deviation
discrete       (continuous) or limited to
               integers (discrete)
20. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Mean of a series of observations xi, i = 1, 2, …, n: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Properties, given that X and Y are random variables and a, b are scalars:
$\mu_{aX+b} = a\mu_X + b$
$\mu_{X+Y} = \mu_X + \mu_Y$
Median (M): middle of the distribution such that at least 50% of the outcomes
is larger than or equal to M and at least 50% of the outcomes is smaller than
or equal to M
For n odd: this is the middle value in order of magnitude
For n even: one takes the average of the two middle values
Part 2 – Explorative data-analysis 20
21. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Mean is very sensitive to outliers
Numbers of partners desired in the next 30 years
Miller and Fishkin, 1997
Part 2 – Explorative data-analysis 21
22. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Standard deviation of a series of observed values xi:
$SD(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
When the variable is approximately normally distributed, approximately 95% of
the data will lie between $\bar{x} - 1.96\,SD(x)$ and $\bar{x} + 1.96\,SD(x)$
The square of the SD is called the variance, Var(x)
Variation coefficient: $\frac{SD(x)}{\bar{x}} \times 100\%$
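A small sketch of these summaries in Python, using the repeated weighings from the earlier example (note that numpy's default is the 1/n formula shown above; ddof=1 would give the common 1/(n-1) sample version):

```python
import numpy as np

x = np.array([67.5, 67.4, 67.6])   # repeated weighings from the earlier example

mean = x.mean()
sd = x.std()                       # 1/n version, as in the formula above
var = x.var()                      # variance = SD squared
cv = 100 * sd / mean               # variation coefficient in %
print(mean, sd, var, cv)
```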
Part 2 – Explorative data-analysis 22
23. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Interquartile range (IQR): distance Q3 – Q1 with
Q1: a value such that at least 25% of the outcomes are ≤ Q1 and at
least 75% of the outcomes are ≥ Q1
Q3: a value such that at least 75% of the outcomes are ≤ Q3 and at
least 25% of the outcomes are ≥ Q3
If more than one value satisfies this criterion, the average is usually taken
Part 2 – Explorative data-analysis 23
24. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Five number summary: Min, Q1, Median, Q3, Max
Boxplot of birth weight: the box spans the quartiles Q1–Q3 (the IQR), with the median
marked inside; the whiskers reach to the most extreme observations within a distance
of 1.5 × IQR from the box
Part 2 – Explorative data-analysis 24
25. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Histogram: a bar diagram for continuous data, showing relative or absolute
frequencies (figure: percentage by birth weight)
Part 2 – Explorative data-analysis 25
26. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Normal distribution
Density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
μ is the population mean
σ² is the population variance
Notation X ~ N(μ, σ²)
If X ~ N(μ, σ²), then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$, the standard normal distribution
Part 2 – Explorative data-analysis 26
27. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Properties of the standard normal distribution N(0, 1)
unimodal: 1 maximum (at 0)
symmetric around 0
68-95-99.7 rule:
• 68% of the area under the curve (AUC) lies between -1 and 1, 68% of
the observations fall within 1 SD of the mean μ
• 95% of the AUC lies between -2 and 2, 95% of the observations fall
within 2 SD of the mean μ
• 99.7% of the AUC lies between -3 and 3, 99.7% of the observations fall
within 3 SD of the mean μ
Part 2 – Explorative data-analysis 27
28. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Normal quantile plot
Compares two distributions by plotting their quantiles against each other
If the observed and the normal distribution are identical, points are expected to
lie on a straight line with intercept 0 and slope 1
Distributions with the same shape but simply rescaled or shifted still show up
on a straight line but with different intercept (shift) or slope (scale change)
Normal Q-Q plots: randomly generated N(0, 1) data versus randomly generated exponential data
Part 2 – Explorative data-analysis 28
29. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations – continuous data
Graphical: boxplots, (stacked) histograms, scatter plots
Correlation coefficient (r):
Takes values between -1 and 1
Pearson correlation coefficient
expresses a degree of linear dependence:
$r = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{SD(x)} \times \frac{y_i - \bar{y}}{SD(y)}$
! A summary statistic cannot replace individual examination of the data:
all four data sets of Anscombe's Quartet have r = 0.816 (source: Wikipedia)
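A quick check of both coefficients in Python; the data are the first set of Anscombe's quartet mentioned above:

```python
import numpy as np
from scipy import stats

# First data set of Anscombe's quartet (all four share r ≈ 0.816)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_s = stats.spearmanr(x, y)     # monotone (rank-based) association
print(round(r, 3), round(rho, 3))    # r ≈ 0.816
```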
Part 2 – Explorative data-analysis 29
30. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Measures monotone association (the extent to which, as one variable
increases, the other variable tends to increase or decrease)
No assumption on linearity
Ordinal variables
Source: Answers.com
Part 2 – Explorative data-analysis 30
31. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Example: corneal irregular astigmatism after laser in situ keratomileusis for myopia
(Br J Ophthalmol 2001;85:534-536): Spearman rank correlation rs = 0.440, p < 0.0001
A worked example of the calculation: http://geographyfieldwork.com/SpearmansRank.htm
Part 2 – Explorative data-analysis 31
32. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
2x2 associations – categorical data: comparing two proportions
Many studies are designed to compare two groups (X) on a binary response
variable (Y)

            Y: success   Y: failure
X: group 1  π1           1 - π1
X: group 2  π2           1 - π2

π: probability of success, 1 - π: probability of failure
Example: is there an association between antiviral drug use (X) and pneumonia (Y)?

Counts:          Pneumonia yes   Pneumonia no   Total
Antiviral drug   579             45172          45751
Control          648             45103          45751

Proportions:     Pneumonia yes   Pneumonia no   Total
Antiviral drug   0.013           0.987          1
Control          0.014           0.986          1
Part 2 – Explorative data-analysis 32
33. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Risk difference: is there a difference between the group taking the antiviral drug and
the control group?
π1 – π2 = 0.013 – 0.014 = -0.001
Properties
-1 ≤ (π1 - π2) ≤ 1
if response is independent of group, then (π1 - π2) = 0
A difference may be more important when both success probabilities are close
to 0 or 1 than when both are close to 0.5
Example: (p1 - p2) = 0.09 for both 0.10 - 0.01 = 0.09 and 0.50 - 0.41 = 0.09
In the first case p1 is 10 times larger than p2, while in the second case p1 is
only 1.2 times larger than p2.
Part 2 – Explorative data-analysis 33
34. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Relative risk: ratio of the success probabilities of the 2 groups
Properties
0 ≤ (π1/π2) < ∞
if response is independent of group, then (π1/π2) = 1
Antiviral drug example
(p1/p2) = (.013/.014) = 0.894 with 95% CI: 0.799, 0.999
The sample proportion of pneumonia cases was 10.6% lower for the group
prescribed the antiviral drug. The CI of the relative risk indicates that the risk
of pneumonia is at least 0.1% lower for the group prescribed the antiviral drug.
Part 2 – Explorative data-analysis 34
35. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio
For a probability π of success, the odds are defined as Ω = π / (1 - π)
Odds Ω ≥ 0, with values > 1 when a success is more likely than a failure. For
example, if π = .75, then the odds of success = .75/.25 = 3.0: a success is
three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as
a success.
The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio
Properties odds ratio
0 ≤ θ < ∞
When X and Y are independent, then θ = 1
the odds ratio does not change value when the orientation of the table
reverses (rows become columns, columns become rows)
Part 2 – Explorative data-analysis 35
36. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio - continued
Properties
if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and
thus subjects in row 1 are more likely to have success than are subjects in
row 2
θ = 4 does not mean that the probability π1 is four times π2 (that would be
the interpretation of relative risk)
the odds ratio does not change when both cell counts within any row (or
column, but not both) are multiplied by a nonzero constant; this implies
that the odds ratio does not depend on the marginal counts within a
row/column
Part 2 – Explorative data-analysis 36
37. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio - Example
The sample odds ratio is computed from the 2×2 table:

                 Pneumonia yes   Pneumonia no   Total
Antiviral drug   579             45172          45751
Control          648             45103          45751

For the patients prescribed the antiviral drug, the estimated odds of pneumonia
are 579/45172 = 0.013: there were 1.3 pneumonia cases for every 100 cases
with no pneumonia.
The sample odds ratio = (579 × 45103) / (648 × 45172) = 0.892 (95% CI: 0.797,
0.999). The estimated odds for patients prescribed the antiviral drug equal 0.892
times the estimated odds for patients in the control group. The estimated odds
were 10.8% lower for the antiviral drug group.
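A small Python sketch that reproduces the relative risk and odds ratio above, with large-sample (Wald) 95% confidence intervals on the log scale:

```python
import math

# 2x2 counts from the antiviral example above
a, n1 = 579, 45751   # antiviral drug: pneumonia yes / group total
c, n2 = 648, 45751   # control: pneumonia yes / group total
b, d = n1 - a, n2 - c

rr = (a / n1) / (c / n2)          # relative risk
odds_ratio = (a * d) / (b * c)    # odds ratio

# Wald 95% CIs on the log scale
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)

def ci(est, se):
    return tuple(math.exp(math.log(est) + s * 1.96 * se) for s in (-1, 1))

print(rr, ci(rr, se_log_rr))                  # ≈ 0.894 (0.799, 0.999)
print(odds_ratio, ci(odds_ratio, se_log_or))  # ≈ 0.892 (0.797, 0.999)
```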
Part 2 – Explorative data-analysis 37
38. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Relation between odds ratio and relative risk
When the proportion of successes is close to 0 in both groups, the sample
odds ratio approximates the sample relative risk. In such a case, an odds ratio of
0.89 does mean that the probability of success for the patients prescribed the
antiviral drug is about 0.89 times the probability of success for the patients in
the control group
Relative risk = 0.894 (95% CI: 0.799, 0.999)
Odds ratio = 0.892 (95% CI: 0.797, 0.999)
Part 2 – Explorative data-analysis 38
39. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
What should be used: risk difference, relative risk or odds ratio?
The odds ratio is the preferred estimate
In a case-control study it is usually not possible to estimate the probability of
an outcome given X (π1), and therefore it is also not possible to estimate the
difference of proportions or the relative risk for that outcome
In a retrospective study, 709 patients with lung cancer (cases) were queried
about their smoking behavior (X). Each case was matched with a control
patient: same age, same gender, same hospital, but no lung cancer

             Lung cancer cases   Controls
Smoker       688                 650
Non-smoker   21                  59
Total        709                 709

Odds ratio = 2.97: the estimated odds of lung cancer for smokers were
2.97 times the estimated odds for non-smokers
Part 2 – Explorative data-analysis 39
40. Part 3
Statistical inference
Part 3 – Statistical inference 40
41. Distributions
Bias and variance
Hypothesis testing
Statistical inference: by using the laws of probability, we infer conclusions
about a population from data collected in a random sample
From the population (N elements) we draw a random sample (n elements) and
collect data; from the sample statistics (x̄, SD(x)) we make inferences about the
population parameters (μ, σ)
A parameter (μ, σ) is a number that describes the population. A
parameter is a fixed number, but its value is unknown in practice.
A statistic (x̄, SD(x)) is a number that describes the sample. Its value is
known once we have collected a sample, but it changes from sample to
sample.
Part 3 – Statistical inference 41
42. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
The sampling distribution of a statistic is the distribution of values taken by
the statistic in all possible samples of the same size from the same
population.
Binomial distribution
Poisson distribution
Normal distribution
Part 3 – Statistical inference 42
43. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Binomial distribution
Fixed number of n independent observations
Each observation falls in one of two categories (success/failure)
The probability of success ‘p’ is the same for each observation
→ denote by X the number of successes among the n observations, which
can take values 0, 1, …, n; then X ~ B(n, p)
Properties
$\mu_X = np$
$\sigma_X^2 = np(1-p)$
Probability mass function: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, k = 0, 1, …, n
Part 3 – Statistical inference 43
44. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Poisson distribution: expresses the number Y of events in a given unit of
time, space, volume, or any other dimension
Example → modeling a phenomenon in which we are waiting for an
occurrence (waiting for customers to arrive in a bank)
Basic assumption: for small time intervals, the probability of an occurrence
is proportional to the length of waiting time
Single parameter λ > 0, the average number of events per unit of
measurement.
k = number of occurrences of an event
λ = expected number of occurrences during the given interval
$P(Y = k) = \frac{e^{-\lambda} \lambda^k}{k!}$
$\mu_Y = \lambda$
$\sigma_Y^2 = \lambda$
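A short sketch of both distributions with scipy; the parameter values n = 20, p = 0.3 and λ = 4 are arbitrary illustrations, not from the slides:

```python
from scipy import stats

binom = stats.binom(n=20, p=0.3)   # illustrative n and p
print(binom.mean(), binom.var())   # np = 6.0, np(1-p) = 4.2
print(binom.pmf(5))                # P(X = 5)

pois = stats.poisson(mu=4)         # illustrative lambda = 4
print(pois.mean(), pois.var())     # both equal lambda
print(pois.pmf(2))                 # P(Y = 2) = e**-4 * 4**2 / 2!
```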
Part 3 – Statistical inference 44
45. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Normal distribution
density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
If X1, X2, …, Xn is a simple random sample from a population with mean μ and
variance σ², and Xi ~ N(μ, σ²), then $\bar{X}$ ~ N(μ, σ²/n)
Central limit theorem
Draw a simple random sample (X1,… , Xn) of size n from a population with
mean μ and finite variance σ². When n is large, the sample average then
follows approximately a normal distribution regardless of the data distribution.
$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
Part 3 – Statistical inference 45
46. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Law of large numbers: the population mean μ of X is unknown. The mean x̄ of a
simple random sample → estimate of μ.
$\bar{X}$ is a random variable that varies in repeated sampling
The law guarantees that, as the sample size of a simple random sample increases,
the sample mean x̄ gets closer to the population mean μ
Unbiased statistic: a statistic used to estimate an unknown parameter is
unbiased if the mean of its sampling distribution is equal to the true value of
the parameter being estimated.
Variability of a statistic is described by the spread of its sampling
distribution.
Spread determined by sampling design and sample size. Larger samples
have smaller spread.
Part 3 – Statistical inference 46
47. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
How precise is our estimate?
Sample Population
Generalize findings for general population
Estimate must approximate the population value
Representative sample
→ prevents the results for the sample from being biased
→ results are still subject to sampling variability: different samples from
the same population will yield different results
Generalizing results from the sample to the study population then requires
that we acknowledge sampling variability
Part 3 – Statistical inference 47
48. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Standard deviation ≠ standard error
The standard error measures the uncertainty in an estimate; the standard error of
the mean (SEM) is $\frac{\sigma}{\sqrt{n}}$, the standard deviation of the sampling distribution of the
sample means $\bar{X}$
Standard deviation (SD) of the observations → measures the variability in
the observations
both are standard deviations, but the standard error shrinks with increasing
sample size, in contrast to the standard deviation of the observations
The mean and SD are the preferred summary statistics for (normally
distributed) data, and the mean and 95% confidence interval are preferred for
reporting an estimate and its measure of precision.
Part 3 – Statistical inference 48
49. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Confidence intervals
When we estimate a parameter by calculating a sample statistic, there is a
degree of uncertainty in our estimation
We can construct an interval around the sample mean $\bar{X}$ within which we
expect the true population mean μ with known probability (e.g. 95% chance)
A (1-α)100% confidence interval for the mean contains the population
mean with (1-α)100% chance. The confidence level or coverage probability is
(1-α)

σ known: $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$          σ unknown: $\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$
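A minimal sketch of the σ-unknown case in Python (the five measurements are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([67.5, 67.4, 67.6, 67.2, 67.8])   # hypothetical measurements
mean, se = x.mean(), stats.sem(x)              # sem = s / sqrt(n)

# sigma unknown -> t-based 95% CI with n - 1 degrees of freedom
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=se)
print(mean, (lo, hi))
```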
Part 3 – Statistical inference 49
50. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Hypothesis testing
The null hypothesis (Ho) assumes ‘no difference’ or ‘no effect’
The average … is equal in both treatment groups
The alternative hypothesis (HA) is claiming the opposite
The average … differs by treatment
Type of decision      H0 true                  HA true
Accept H0 (p > α)     Correct decision (1-α)   Type II error (β)
Reject H0 (p < α)     Type I error (α)         Correct decision (1-β) = power
Part 3 – Statistical inference 50
51. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
We assume H0 is true unless we can demonstrate, based on sample data at
the desired level of confidence, that HA is true.
→ level of confidence related to 2 potential types of statistical errors
• example: in a clinical trial we want to study the effect of an experimental drug (T)
and compare it to a placebo (P)
H0 : effect of drug T = effect of P
HA : effect of drug T ≠ effect of P
Type I error (false positive): concern of the regulators, the drug is not
working but it will go to the market
Type II error (false negative): concern of pharmaceutical companies, could
not prove that the new drug is working
Part 3 – Statistical inference 51
52. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Sensitivity and specificity
                          Gold standard: positive (ill)        Gold standard: negative (not ill)
Test outcome → positive   True positive (TP)                   False positive (FP), Type I error
Test outcome → negative   False negative (FN), Type II error   True negative (TN)

Sensitivity: proportion of ill people identified as being ill
Specificity: proportion of non-ill people identified as non-ill
Part 3 – Statistical inference 52
53. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
When are hypotheses needed?
Hypotheses are not needed in descriptive studies
If any of the following terms appears in the research question (study not
simply descriptive), a hypothesis should be formulated: greater than, less than,
causes, leads to, compared with, more likely than, associated with, related to,
similar to, correlated with.
The hypothesis should be clearly stated in advance.
Part 3 – Statistical inference 53
54. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Principle of statistical testing
calculate a test statistic, which measures the 'distance' from the observed sample
to the null hypothesis and whose distribution is known under the null hypothesis
Reject Ho
test statistic t exceeds a chosen cut-off c (critical value) in magnitude
p-value stays below a chosen cut-off α in magnitude
safety principle: cut-off is chosen such that the risk of making a Type I error is
controlled at a prespecified significance level α
Usually α = 0.05 (test performed at the 5% significance level)
the power of the test (the probability to avoid Type II errors, 1-β) is not controlled
→ choose adequate designs and sufficiently large sample sizes
Part 3 – Statistical inference 54
55. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c
in magnitude
p-value: probability to find a result for the test statistic at least as extreme as
the observed result (in the direction of the alternative hypothesis), if the null
hypothesis holds
Figure: for a two-sided test at α = 0.05, the distribution of the test statistic is
divided into an acceptance region and two rejection regions, each of area α/2,
bounded by the critical values cL and cR
Part 3 – Statistical inference 55
56. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA)
For many testing problems H0 is formulated very precisely, but there are
usually an infinite number of distributions consistent with HA.
Standardized effect size: $\frac{\mu_1 - \mu_0}{\sigma}$
With what probability must the statistical test detect this smallest relevant
difference? (figure: ~91% chance of finding an association of that size or greater)
Part 3 – Statistical inference 56
57. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
One-sided versus two-sided testing
(figure: rejection regions for two-sided versus one-sided testing)
Decided prior to data analysis and avoid one-sided tests unless there are
really good reasons for using them (only one direction of the association is
clinically or biologically relevant)
it is never wrong to use a two-sided test where a one-sided test is applicable;
the cost is at most a slight loss of power
Part 3 – Statistical inference 57
58. Distributions
Bias and variance
Hypothesis testing
Multiple and post hoc hypotheses – the multiple testing problem
Inflated rate of false positive conclusions (Type I error)
Assume we perform 3 independent comparisons between 2 groups, each
conducted with α = 0.05
The probability that each of the tests → conclude H0 is correct in each case
= (0.95)³ = 0.857
→ the chance of finding at least one false positive statistically significant test
increases to 14.3% (1 - 0.857 = 0.143, not 0.05); see the sketch below
Adjusting for multiple hypotheses is especially important when the
consequences of making a false positive error are large, e.g. mistakenly
concluding that an ineffective treatment is beneficial
Adjustments can be made → false discovery rate (FDR) control
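A two-line check of the inflation argument above:

```python
alpha, k = 0.05, 3

# Chance of at least one false positive over k independent tests
familywise = 1 - (1 - alpha) ** k
print(familywise)                  # 0.142625 ≈ 14.3%

# Bonferroni: test each at alpha/k to keep the family-wise rate near alpha
print(1 - (1 - alpha / k) ** k)    # ≈ 0.0492
```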
Part 3 – Statistical inference 58
59. Part 4
Statistical tests
Part 4 – Statistical tests 59
60. Continuous/Categorical data Parametric statistics
Non-parametric statistics
Categorical data – Proportions
Continuous data
Parametric statistics
Non-parametric statistics
Categorical data
Ordinal versus nominal
Types of testing
One-sample tests
Two dependent groups
Two independent groups
More than two groups
Controlling for covariates
Part 4 – Statistical tests 60
61. Continuous/Categorical data Parametric statistics
Non-parametric statistics
Categorical data – Proportions
Dependent versus independent
Dependent (paired)                                Independent
Subject       Weight under A   Weight under B     Subject        Treatment   Weight
Volunteer 1   x1A              x1B                Volunteer 1    A           x1A
Volunteer 2   x2A              x2B                Volunteer 2    A           x2A
Volunteer 3   x3A              x3B                Volunteer 3    A           x3A
Volunteer 4   x4A              x4B                Volunteer 4    A           x4A
Volunteer 5   x5A              x5B                Volunteer 5    A           x5A
                                                  Volunteer 6    B           x6B
                                                  Volunteer 7    B           x7B
                                                  Volunteer 8    B           x8B
                                                  Volunteer 9    B           x9B
                                                  Volunteer 10   B           x10B
Part 4 – Statistical tests 61
62. Continuous data Parametric statistics
Categorical data – Proportions Non-parametric statistics
Parametric statistics
assume that the data come from a given type of probability distribution and make
inferences about the parameters of that distribution
require assumptions (e.g. a normal distribution); if these are correct, parametric
tests produce more accurate and precise estimates and generally have more
statistical power
e.g. Independent sample t-test
Assumptions
• Independent observations
• Population 1 → X1i ~ N(μ1, σ²)
Population 2 → X2i ~ N(μ2, σ²)
H0 : μ1 = μ2 → under these assumptions H0 states that the two distributions are equal
Part 4 – Statistical tests 62
63. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Non-parametric statistics – rank tests
no specific assumption about the population distribution required
Example: statistics based on Rank tests
Let X1, …, Xn denote a sample of n observations, the rank of observation Xj is
defined as
Rj = R(Xj) = number of observations in the sample ≤ Xj
$R_j = \sum_{i=1}^{n} I(X_i \le X_j)$
The smallest observation gets rank 1, the second smallest rank 2, …, the
largest observation gets rank n.
In case of ties (a tie is a pair of equal observations), the ranks of the tied
observations are defined as the average of their ranks according to the
definition just given. These are called mid-ranks.
Part 4 – Statistical tests 63
64. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Example:
Observations   Ranks
2              1
8              2
12             (3+4)/2 = 3.5
12             (3+4)/2 = 3.5
15             5
39             6
Properties of rank-transformed observations
they only depend on the ordering of the observations
they are insensitive to outliers (robust)
the distribution of the ranks does not depend on the distribution of the
observations
Part 4 – Statistical tests 64
65. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Non-parametric statistics – permutation tests
reference distribution of a characteristic of interest is obtained by calculating
all possible values of the test statistic under rearrangements of the labels on
the observed data points.
Example: a company has a new training program and wishes to evaluate whether the
new method is better than the traditional one. To assess the effect of the new
method, they set up an experiment with 7 new employees. Four of them are
randomly assigned to the new training method, and the other three received the
old training method.

Observed data            One possible rearrangement
New   Traditional        New   Traditional
37    23                 37    23
49    31                 49    55
55    46                 57    31
57                       46

Number of rearrangements: $\binom{7}{4} = \frac{7!}{4!\,3!} = 35$
Part 4 – Statistical tests 65
66. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Permutation tests
to verify whether there is a difference in means of a continuous measurement
in 2 independent populations
Permutation null distribution
H0 : F1(x) = F2(x) for all x
HA : μ1 > μ2
Test statistic $T = \bar{X}_1 - \bar{X}_2$
Example: we have 35 possible permutations (each having a t*-value); the
collection of all the t*-values is the permutation null distribution
Part 4 – Statistical tests 66
67. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Permutation test - example
Test statistic $T = \bar{X}_1 - \bar{X}_2$ → t = 49.5 − 33.3 = 16.2
Permutation null distribution: over the 35 possible permutations, all t*-values
are equally likely under the null hypothesis
H0 will be rejected for large T (T > c, the critical value); c controls the Type I
error rate at α: P(T > c | H0) ≤ α
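A self-contained sketch of this permutation test in Python, enumerating all 35 relabelings of the training-program data above:

```python
from itertools import combinations

new = [37, 49, 55, 57]       # new training method
trad = [23, 31, 46]          # traditional method
pooled = new + trad

def t_stat(g1, g2):
    # difference in group means, T = mean(group 1) - mean(group 2)
    return sum(g1) / len(g1) - sum(g2) / len(g2)

t_obs = t_stat(new, trad)    # ≈ 16.17 (slide rounds to 16.2)

# Enumerate all 35 relabelings: choose which 4 of the 7 values are "new"
null_dist = []
for idx in combinations(range(len(pooled)), len(new)):
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    null_dist.append(t_stat(g1, g2))

# One-sided p-value for HA: mu1 > mu2
p = sum(t >= t_obs for t in null_dist) / len(null_dist)
print(len(null_dist), round(t_obs, 2), p)   # 35, 16.17, 2/35 ≈ 0.057
```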
Part 4 – Statistical tests 67
68. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Parametric versus non-parametric tests
Parametric tests: the data are sampled from a population with a normal
distribution, OR the sample size is large (CLT)
Smaller sample sizes: outliers or a skewed distribution can be problematic →
transformation or non-parametric tests (permutation or rank tests)
Permutation tests: very flexible
Non-parametric rank tests: in case of no meaningful measurement scale (pain
score, Apgar score, …)
Careful with formulation of H0 and interpretation of the analysis
Less power
Part 4 – Statistical tests 68
69. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Categorical / discrete data: the set of all possible values can be enumerated
Ordinal data: ordered categories
Age group, pain assessment from no to severe, Likert scales (agree
strongly, agree, neutral, disagree, disagree strongly)
Nominal data: categories have no natural order, sometimes called
qualitative data (gender, race, hair color)
Counts: variables are represented by frequencies
Proportions / percentages
Ratio of counts, e.g. binary or dichotomous data: exactly two possible
outcomes (success / failure); we count the number of successes in the
number of trials
Part 4 – Statistical tests 69
70. One-sample tests Parametric statistics One-sample t-test
Non-parametric statistics
Categorical data - Proportions
One-sample t-test
to verify whether the mean of a continuous measurement deviates from a
given value μ0
H0 : μ = μ0
HA : μ ≠ μ0
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, t-distributed with n-1 degrees of freedom (df) under H0
Assumptions
Independent observations
Normally distributed observations or large sample
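A minimal sketch with scipy (the measurements and the null value μ0 are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.3])   # hypothetical measurements
mu0 = 5.0                                      # value under H0

t, p = stats.ttest_1samp(x, popmean=mu0)       # t has n - 1 = 5 df
print(t, p)
```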
Part 4 – Statistical tests 70
71. One-sample tests Parametric statistics 1-way contingency tables
Non-parametric statistics
Categorical data – Proportions
One categorical variable with J ≥ 2 categories
Example: number of students in each of the three main subjects in the 1st
master psychology (2003-2004); suppose that in the population the true
proportions πj are known (table not reproduced here)
Part 4 – Statistical tests 71
72. One-sample tests Parametric statistics 1-way contingency tables
Non-parametric statistics
Categorical data – Proportions
X² test One categorical variable with J ≥ 2 categories
H0 : pj = πj for all j (or, in terms of expected frequencies, nj = μj = nπj)
HA : pj ≠ πj for at least one j
Statistic: $X^2 = \sum_{j=1}^{J} \frac{(n_j - \mu_j)^2}{\mu_j}$, approximately χ²-distributed with J - 1 df under H0
Example: df = J − 1 = 2 and P < .0001, strongly suggesting that the null
hypothesis should be rejected.
Part 4 – Statistical tests 72
73. Two dependent samples Parametric statistics Paired sample t-test
Non-parametric statistics
Categorical data - Proportions
Paired sample t-test
to verify whether 2 continuous measurements, obtained from paired subjects,
are the same on average
H0 : μ1 = μ2
HA : μ1 ≠ μ2
→ calculate differences Y = X1 – X2 and use the one-sample t-test to verify
whether H0 : μ = 0 versus HA : μ ≠ 0, where μ is the average of Y
Assumptions
Independent differences
Normally distributed differences or large sample (n ≥ 40)
n ≥ 15: the t-test is fine unless the distribution is very skewed or there are outliers
n < 15: the data should be approximately normally distributed; a very skewed
distribution or outliers are problematic
Part 4 – Statistical tests Source assumptions ‘Introduction to the practice of statistics, Moore & McCabe’ 73
74. Two dependent samples Parametric statistics Wilcoxon signed rank test
Non-parametric statistics
Categorical data - Proportions
Wilcoxon signed rank test
Compare 2 dependent samples → the difference variable Y = X1 - X2
With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi−
the observations on the negative differences (i = 1, …, n−), then
H0 : P(Y− < Y+) = ½
HA : P(Y− < Y+) > ½
Statistic: V = the sum of the ranks of |Yi| belonging to the positive differences
Part 4 – Statistical tests 74
75. Two dependent samples Parametric statistics Wilcoxon signed rank test
Non-parametric statistics
Categorical data - Proportions
Wilcoxon signed rank test - Example
Two stories were narrated to children with reading disorders; story 1 was not
illustrated whereas story 2 was illustrated

Child            1      2       3      4       5
Story 1          0.40   0.72    0.00   0.36    0.55
Story 2          0.77   0.49    0.66   0.28    0.38
Difference Yi    0.37   -0.23   0.66   -0.08   -0.17
Ranks of |Yi|    4      3       5      1       2
Signed ranks     4      -3      5      -1      -2

V = 9, n = 5, p = 0.406
From this small sample we could not conclude that children with reading disorders
can tell a story better when the story was illustrated.
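A sketch of this test with scipy; for a one-sided alternative scipy's statistic should be the sum of positive ranks, matching V above:

```python
from scipy import stats

# Differences Y = story 2 - story 1 from the table above
y = [0.37, -0.23, 0.66, -0.08, -0.17]

# One-sided alternative: positive differences dominate
res = stats.wilcoxon(y, alternative="greater")
print(res.statistic, res.pvalue)   # expected: V = 9, p ≈ 0.406
```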
Part 4 – Statistical tests 75
76. Two dependent samples Parametric statistics Models for matched pairs
Non-parametric statistics
Categorical data - Proportions
Models for matched pairs
For comparing categorical responses for 2 samples when each sample has
the same subject or when a natural pairing exists between each subject in one
sample and a subject from the other sample.
The McNemar test compares proportions in paired studies; a standard form of
its test statistic (for reference) is $X^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$
H0 : π1+ = π+1
HA : π1+ ≠ π+1

              After: yes   After: no   Total
Before: yes   n11          n12         n1+
Before: no    n21          n22         n2+
Total         n+1          n+2         n
Part 4 – Statistical tests 76
77. Two independent samples Parametric statistics Independent sample t-test
Non-parametric statistics
Categorical data - Proportions
Independent sample t-test
to verify whether the mean of a continuous measurement is the same in 2
independent populations
H0 : μ1 = μ2 versus HA : μ1 ≠ μ2
Test statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$
Measurement variance equal in the 2 groups → pooled-variance t-test
Measurement variance unequal in the 2 groups → Welch test statistic t*
Assumptions
Independent observations
Normally distributed observations or large sample in each group
With small but equal sample sizes (n1 = n2 = 5) and comparable shapes of the
distributions → we can still rely on t-test procedures
Part 4 – Statistical tests 77
78. Two independent samples Parametric statistics Independent sample t-test
Non-parametric statistics
Categorical data - Proportions
Independent sample t-test – continued
Measurement variance equal in the 2 groups: the SE of the mean difference can be
estimated as $SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
with the pooled variance $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Measurement variance unequal in the 2 groups: the SE of the mean difference can be
estimated as $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
(1-α)100% confidence interval for μ1 - μ2: $(\bar{x}_1 - \bar{x}_2) \pm t \times SE$, with the pooled or the
unequal-variance SE, respectively
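Both variants in one scipy sketch (the two samples are hypothetical):

```python
import numpy as np
from scipy import stats

g1 = np.array([12.1, 11.4, 12.8, 13.0, 11.9])   # hypothetical group 1
g2 = np.array([10.9, 11.2, 10.4, 11.8, 10.6])   # hypothetical group 2

pooled = stats.ttest_ind(g1, g2, equal_var=True)   # pooled-variance t-test
welch = stats.ttest_ind(g1, g2, equal_var=False)   # Welch t* (unequal variances)
print(pooled)
print(welch)
```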
Part 4 – Statistical tests 78
79. Two independent samples Parametric statistics Rank tests
Non-parametric statistics Mann-Whitney U, Wilcoxon Rank Sum
Categorical data - Proportions
Mann-Whitney (U) test, Wilcoxon rank-sum test
Compare 2 independent samples
H0 : F1(x) = F2(x) for all x
HA : P(X1 < X2) ≠ ½
where X1 and X2 have distributions F1 and F2, respectively.
If X1 and X2 are continuous random variables, the test may be thought of
as testing the null hypothesis that the probability of an observation from
one population exceeding an observation from the second population is
0.5, this implies
P(X1 < X2) = P(X1 > X2) = ½
→ test statistics based on this principle
Part 4 – Statistical tests 79
80. Two independent samples Parametric statistics Rank tests
Non-parametric statistics Mann-Whitney U, Wilcoxon Rank Sum
Categorical data - Proportions
Is the Wilcoxon rank-sum test the nonparametric alternative for the
independent-sample t-test?
Remember
H0 : F1(x) = F2(x) for all x (2 distributions are equal)
HA : P(X1 < X2) ≠ ½
→ the ranks cannot be used to estimate the mean!
Independent sample t-test
H0 : μ1 = μ2
HA : μ1 ≠ μ2
Part 4 – Statistical tests 80
81. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
2x2 contingency tables
Example: patient characteristics at the onset of first-line treatment with
gefitinib or chemotherapy

Frequencies:
             ECOG PS <2   ECOG PS ≥2   Total
Gefitinib    70           17           87
Chemo        57           4            61
Total        127          21           148

Conditional distribution of ECOG PS status given treatment:
             ECOG PS <2   ECOG PS ≥2   Total
Gefitinib    0.805        0.195        1.00
Chemo        0.934        0.066        1.00
Two variables are said to be statistically independent if the conditional
distributions of Y (Eastern Cooperative Oncology Group performance status) are
identical at each level of X (treatment)
Part 4 – Statistical tests 81
82. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Testing independence - Pearson chi-square test
H0 : πij = πi+ π+j for all i and j (or, in terms of expected frequencies, nij = μij = n πi+ π+j)
HA : πij ≠ πi+ π+j for at least one cell
Statistic: $X^2 = \sum_{i,j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$
Example
X² = 4.964, df = 1: ECOG PS status and treatment are significantly associated.
The proportion of patients with a poor ECOG performance status (≥ 2) was
higher in the first-line gefitinib group (20%) than in the first-line chemotherapy
group (7%; P = 0.026).
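This result can be reproduced with scipy (correction=False gives the uncorrected Pearson statistic reported above):

```python
import numpy as np
from scipy import stats

table = np.array([[70, 17],    # gefitinib: ECOG PS <2, >=2
                  [57, 4]])    # chemotherapy

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(chi2, df, p)             # ≈ 4.96, df = 1, p ≈ 0.026
```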
Part 4 – Statistical tests 82
83. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Testing independence – Fisher’s exact test
For small samples, use Fisher’s exact test: it assumes that the row and column
margin totals are fixed (hypergeometric distribution). When this assumption is
not met (most cases), Fisher’s exact test is rather conservative, resulting in a
Type I error rate below 0.05.
H0 : θ = 1
HA : θ ≠ 1

            Adeno   Non-adeno   Total
Gefitinib   85      2           87
Chemo       58      3           61
Total       143     5           148

Two-sided p-values: Fisher’s exact test p = 0.403; chi-square test p = 0.385
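The same table in scipy (the reported two-sided p should agree with the slide's p = 0.403 up to rounding):

```python
from scipy import stats

table = [[85, 2],    # gefitinib: adeno, non-adeno
         [58, 3]]    # chemotherapy

odds_ratio, p = stats.fisher_exact(table, alternative="two-sided")
print(odds_ratio, p)   # two-sided p ≈ 0.40 (slide reports 0.403)
```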
Part 4 – Statistical tests 83
84. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Large samples
In case of very large sample sizes, the Pearson chi-square test will reject almost any
null hypothesis, even if the deviation of the observed from the expected counts
is of little practical importance → use the Gini index (its value equals the proportion of
observations that would have to be moved from one cell to another in order for
the observed counts to equal the expected counts)
Small samples
Inferences based on the chi-square distribution become questionable when the
expected counts in some cells become too small (below 5), even when the
total sample size is large → use exact solutions (Fisher’s exact test)
Part 4 – Statistical tests 84
85. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
One-way analysis of variance (ANOVA)
to verify whether the mean of a continuous measurement is the same in 2 or
more independent populations
H0 : μ1 = μ2 = … = μk versus
HA : at least 1 of the population means differs
Test statistic: $F = \frac{MSE_{between}}{MSE_{within}} \overset{H_0}{\sim} F_{k-1,\,n-k}$
Assumptions
Independent observations
Normally distributed observations or large sample within each group (Q-Q
plots)
Equal variance in each group (boxplots or Levene’s test)
Part 4 – Statistical tests 85
86. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
ANOVA principle
Is the variation between groups large compared to the variation within groups?
Consider k groups, the ith group having ni observations, with Yij the jth observation
in the ith group
$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$
Total Sum of Squares = within SS + between SS
Part 4 – Statistical tests 86
87. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
ANOVA Table

Source    Sum of Squares (SS)                                       df    Mean Squared Error (MSE)     F
Between   $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$   k-1   $MSE_B = \frac{SS_B}{k-1}$   $\frac{MSE_B}{MSE_W}$
Within    $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$    n-k   $MSE_W = \frac{SS_W}{n-k}$
Total     $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$
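A minimal one-way ANOVA in scipy, on three hypothetical groups; Levene's test checks the equal-variance assumption listed above:

```python
from scipy import stats

# Three hypothetical groups (not from the slides)
g1 = [24.1, 25.3, 23.8, 24.9]
g2 = [26.0, 27.2, 25.8, 26.5]
g3 = [24.5, 25.0, 24.2, 25.6]

f, p = stats.f_oneway(g1, g2, g3)     # F = MSE_between / MSE_within
print(f, p)

# Levene's test for the equal-variance assumption
w, p_lev = stats.levene(g1, g2, g3)
print(w, p_lev)
```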
Part 4 – Statistical tests 87
88. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Deviations from the assumptions
one-way analysis of variance is robust against lack of normality
→ in case of important deviations from a normal distribution : use
nonparametric Kruskal-Wallis test or transformations
ANOVA is not very sensitive to the assumption of homogeneity of variances
(perform Levene’s test at the 1% significance level)
→ heterogeneity of variances
• little impact when the group level sample sizes ≈ equal: Type I error rate
is slightly increased
• with important heterogeneity and markedly ≠ group level sample sizes,
weighted least squares regression may be used, weighting each
observation by the inverse group level standard deviation
Part 4 – Statistical tests 88
89. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Post-hoc analysis
if ANOVA detects no difference, we conclude that there is insufficient evidence
of a difference in means
if ANOVA detects a difference → post hoc analysis to investigate where the
difference is
DO NOT perform all pairwise comparisons using independent samples t-tests
→ multiple testing problem
Assume we perform 3 different t-tests, each conducted with α = 0.05
The probability that each of the tests → conclude H0 is correct in each
case = (0.95)³ = 0.857 (assuming independence of the tests)
→ the level of significance at which at least one of the three tests leads to
conclusion HA, while H0 holds in each case, would be 1 - 0.857 = 0.143 (not 0.05).
The level of significance and power for a family of tests ≠ individual test
Part 4 – Statistical tests 89
90. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Family-wise error rate - αE
The probability of making at least 1 false discovery (type I errors) among all
the hypotheses when performing multiple pairwise tests
→ We should correct for the risk of false detections
most procedures for multiple testing are designed to control the risk of at least
1 false detection at αE, assuming that all k null hypotheses are true
when the k tests are independent, each with significance level α, then
αE = P(at least 1 Type I error) = 1 − (1 − α)^k ≈ kα
family-wise error rate increases with the number of tests
Part 4 – Statistical tests 90
91. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Multiple comparison procedures that control family-wise error rate
Bonferroni procedure
Conservative test: makes fewer Type I errors than allowed for (and thus
more Type II errors)
Only applicable when the effects to be investigated are identified in
advance of the data analysis
Tukey procedure
Preferred method when only pairwise comparisons are to be made
Scheffé procedure
Preferred method when the family of interest is a set of all possible
contrasts among the factor level means
Part 4 – Statistical tests 91
92. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Rules of thumb
never interpret a large p-value as indicating absence of association
never interpret a small p-value as indicating an important association
report p-values in combination with an effect estimate and confidence interval!
This allows for judging whether the effect is practically significant.
in some cases, it may be advisable to determine equivalence intervals prior to
data analysis
Part 4 – Statistical tests 92
93. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
k-sample problem, compare more than 2 independent samples
H0 : F1(x) = F2(x) = … = Fk(x) for all x
HA : the observations in some populations are systematically larger than in
other populations (e.g. P(X1 < X2) ≠ ½ for some pair of groups)
Assumptions
the observations in each group come from populations with the same
shape of distribution
Part 4 – Statistical tests 93
94. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
the rank test statistic is basically an MSE-between based on the ranks
rank all observations in the combined sample
let Rij denote the rank of Xij (i = 1, …, k, j = 1, …, ni) and let $\bar{R}_i$ be the
average of the ranks Rij (j = 1, …, ni) in the ith group
Kruskal-Wallis test statistic (standard form): $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left(\bar{R}_i - \frac{n+1}{2}\right)^2$
Part 4 – Statistical tests 94
95. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
when H0 is rejected → at least 2 groups differ systematically → pairwise comparisons
Wilcoxon rank sum statistic or Mann-Whitney statistic: alternative hypothesis
in terms of probabilities: HA : P(X1 > X2) …
Family-wise error rate αE → we should correct for the risk of false detections.
Bonferroni correction: when m tests must be performed simultaneously, each
of the tests must be performed at α = αE / m
equivalently: multiply each p-value by m before interpreting it
Part 4 – Statistical tests 95
96. ≥ two independent samples Parametric statistics Analysis of Covariance (ANCOVA)
controlling for covariate Non-parametric statistics
Categorical data – Proportions
Analysis of Covariance - ANCOVA
Adjustment for a confounder (e.g. age)
Just like in ANOVA we have a treatment effect (consider, for example, 3
treatments)
We add the variable age to our model → adjustment for the confounder; a minimal
sketch follows below
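A hypothetical ANCOVA sketch using the statsmodels formula interface (the data frame, variable names and values are invented for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented example data: response, 3-level treatment, and age as covariate
df = pd.DataFrame({
    "response":  [5.2, 6.1, 5.8, 7.0, 6.5, 7.4, 4.9, 6.8, 7.1],
    "treatment": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "age":       [34, 41, 29, 38, 45, 31, 27, 52, 36],
})

# ANCOVA: treatment effect adjusted for the confounder age
model = smf.ols("response ~ C(treatment) + age", data=df).fit()
print(model.summary())
```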
Part 4 – Statistical tests 96
97. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Three-way contingency tables
In studying the effect of an explanatory variable X on a response variable Y,
one should control covariates that can influence that relationship
Example: peginterferon alfa for hepatitis C

                           Virologic response: yes   Virologic response: no
Genotype 1   Treatment A   138                       160     → conditional odds ratio θ1
             Treatment B   103                       182
Genotype 2   Treatment A   106                       34      → conditional odds ratio θ2
             Treatment B   88                        57
Total        Treatment A   244                       194     → marginal odds ratio
             Treatment B   191                       239
Part 4 – Statistical tests 97
98. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Breslow-Day test for testing homogeneity of odds ratios
tests whether the odds ratio between X and Y is the same in the different Z
categories; it is a test of homogeneous association.
Part 4 – Statistical tests 98
99. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Cochran-Mantel-Haenszel Test of conditional independence
Tests conditional XY independence given Z in a 2 × 2 × K table: the response is
conditionally independent of the treatment in any given stratum
Inappropriate when the association varies dramatically among the partial
tables
Part 4 – Statistical tests 99
100. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Cochran-Mantel-Haenszel Test of conditional independence
Example, colon cancer (Bokemeyer et al., 2008): ECOG PS-adjusted OR = 1.52
(95% CI 0.98-2.36, p = 0.064, CMH test), indicating no significant association
between response and treatment within the ECOG PS strata.

                              Response: yes   Response: no
ECOG PS 0   Cet. + FOLFOX-4                                 → conditional odds ratio θ1
            FOLFOX-4
ECOG PS 1   Cet. + FOLFOX-4                                 → conditional odds ratio θ2
            FOLFOX-4
ECOG PS 2   Cet. + FOLFOX-4                                 → conditional odds ratio θ3
            FOLFOX-4
Total       Cet. + FOLFOX-4   77              92            → marginal odds ratio = 1.51
            FOLFOX-4          60              108

(per-stratum cell counts not reproduced here)
Part 4 – Statistical tests 100