This is the presentation of the BITS training session on "Essential statistics".
View more material on http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190
2. Overview
Outline
Formulate a relevant research question
Study design
Gather the data according to the plan
Analyze the data
Explorative data-analyses (descriptives, graphically)
Drawing inference (answer our research question with a certain confidence)
Report the results
Overview 2
3. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Part 1
Design of a study
Part 1 – Design of a study 3
4. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Experimental study
Factor levels (treatments) randomly assigned over the different
experimental units (control over explanatory variable)
→ information about the cause-and-effect relationship between the
explanatory factors and a response variable
Example: Effect of vitamin C on prevention of colds in 800 children. Half of the children
were selected at random and received vitamin C (treatment group); the remaining children
received a placebo (control group)
Qualitative explanatory factor with two levels and children as experimental units
Part 1 – Design of a study 4
5. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Observational study
Data obtained from a non-experimental study: the explanatory variables are not
controlled, and randomization of treatments over experimental units does not
occur
→ establish associations between the explanatory factors and a response
variable
Example: Company officials wished to study the relation between the age of an employee
and the number of days of illness in a year.
Explanatory variable not controlled → age is merely observed
Associations can be established, but no cause-and-effect: a positive relation between age
and number of days of illness need not imply that the number of days of illness is a direct
result of age → if, for example, younger employees work indoors while older employees
usually work outdoors, then work location rather than age may be responsible for the
number of days of illness
Part 1 – Design of a study 5
6. Experimental versus observational studies Experimental study
Design of an experimental study Observational study
Overview study designs Mixed experimental and observational studies
Mixed studies
Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug
on lowering blood cholesterol was investigated. Within each hospital center, volunteers
were randomly assigned to one of the two treatments (drug / placebo)
Experimental factor: treatment (drug versus placebo)
Observational factor: hospital center, not randomly assigned since each
volunteer was assigned to the nearest hospital center
Part 1 – Design of a study 6
7. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Structure of the experiment
                      Factor B
                      Level 1       Level 2       Level 3
Factor A   Level 1    treatment 1   treatment 2   treatment 3
           Level 2    treatment 4   treatment 5   treatment 6
(each treatment is applied to one or more experimental units)
Replicates = treatment repeated → estimate experimental error
2 levels of factor A x 3 levels of factor B = 6 treatments
experimental unit: the smallest unit of experimental material to which a treatment can
be assigned; the experimental unit is determined by the method of randomization
Part 1 – Design of a study 7
8. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Number of factors: in the initial stages of an investigation → include many factors
(more than can possibly be studied in a single experiment)
Cause-and-effect diagrams are often used to identify factors that could
affect the outcome → reduce number of factors
Example: 4 factors, each with 2 levels → 2⁴ = 16 treatment combinations
Number of levels of each factor:
Qualitative factors
Quantitative factors: the number of levels reflects the type of trend expected by the
experimenter
• 2 levels ~ linear change in response: min – max of the specified range
• 3 levels ~ quadratic trend
• ≥ 4 levels ~ a detailed examination of the shape of the response curve is desired
Range of factor is one of the most important design decisions
Part 1 – Design of a study 8
9. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Measurements: precision versus accuracy
Precision of a variable: the degree to which a variable has nearly the same
value when measured several times. It is a function of random error (chance)
and is assessed as the reproducibility of repeated measurements.
Example: weigh the same person 3 times on an electronic balance and
obtain slightly different measurements – 67.5 kg, 67.4 kg and 67.6 kg
The more precise a measurement, the greater the statistical power at a given
sample size to estimate mean values and to test hypotheses
Variability may be due to operator, instrument and subject
Minimize random error and improve precision
Operating manuals, training the operator, refining / automating instruments
Repeat the measurement and average over a larger number of
observations (but! added cost, practical difficulties)
Part 1 – Design of a study 9
10. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Accuracy of a variable: the degree to which a variable actually represents
what it is supposed to represent. It is a function of systematic error (bias)
which is often difficult to detect and has important influence on the validity of
the result.
Example 1: incorrect calibration of an instrument
Example 2: gastric freezing as a treatment for ulcers in the upper part of the
intestine
Improve accuracy and minimize bias
Operating manuals, training the operator, refining / automating instruments
Periodic calibration using a gold standard (example 1)
Blinding: in a double-blind study, neither the experimental subject nor the evaluator
knows which treatment is received or given, so any inaccuracy in measuring the
outcome will be the same in the 2 groups (example 2)
Part 1 – Design of a study 10
11. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Bias and variance in shooting arrows at a target. Bias means that the archer
systematically misses in the same direction. Variance means that the arrows
are scattered (Moore and McCabe 2002)
Part 1 – Design of a study 11
12. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Sampling from a population
Simple random sample
Population (N elements) → random draws with equal probability → sample (n elements)
Part 1 – Design of a study 12
13. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Randomization → treatments are randomly assigned to experimental units
Tends to eliminate the influence of extraneous factors not under the direct
control of the experimenter
Blocking → increase precision by taking into account other factors
Randomization within blocks:
Heterogeneous subjects are first divided into homogeneous blocks (males / females);
within each block, randomization assigns subjects to group 1 → treatment 1,
group 2 → treatment 2, or group 3 → treatment 3
Part 1 – Design of a study 13
14. Experimental - observational studies Factors and treatments Measurements
Design of an experimental study Randomization
Overview study designs Sampling from a population
Stratified Sampling
Suppose we want to know the attitudes of male and female students in the
engineering school
Is a simple random sample from that school a good idea?
No: too few women (10%)
Stratify the sample, pick a random sample from
Stratum 1: female engineers
Stratum 2: male engineers
Estimates are measured with comparable precision. Learn from the distribution in
each stratum; do NOT pool the data naively
e.g. if the average weight is 60 kg for the women and 80 kg for the men,
the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg
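A minimal sketch of this weighted combination in Python (the strata shares and means are the example values above):

```python
# Strata shares and mean weights from the example above
strata = {"female engineers": (0.10, 60.0), "male engineers": (0.90, 80.0)}

# Weighted combination of the stratum means
overall = sum(share * mean for share, mean in strata.values())
print(overall)   # 78.0 kg
```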
Part 1 – Design of a study 14
15. Types of variables
Univariate descriptives
Bivariate descriptives
Part 2
Explorative data-analysis
Part 2 – Explorative data-analysis 15
16. Types of variables
Univariate descriptives
Bivariate descriptives
Descriptive statistics
Allows the researcher to describe or summarize the data. This is typically
done at the beginning of a results section: the researcher gives an idea of the
sample size and the characteristics under study (e.g. baseline characteristics in a
clinical trial)
Example: A total of 235 students participated in this study, 163 women (69.4%)
versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a
slightly higher score on exam 2 in comparison to the male students (80.7 ±
18.1).
Part 2 – Explorative data-analysis 16
17. Types of variables
Univariate descriptives
Bivariate descriptives
We typically start with univariate explorations (one variable at a time). Next, we
describe joint distributions (two variables = bivariate; more variables = multivariate)
Graphical summary to inspect the shape of the distribution: symmetry,
modality, heaviness of tails
Numerical summary: classical measures of location and spread
Mean and standard deviation
Median and interquartile range
Mode: value that occurs most often (useful for nominal data)
Part 2 – Explorative data-analysis 17
18. Types of variables
Univariate descriptives
Bivariate descriptives
Notes on notation
A random variable X is a variable whose value is a numerical outcome of a
random phenomenon (nonnumerical outcomes are numerically encoded)
Random variables are usually denoted by capital letters such as X, Y, …
Fixed constants or observed values are usually denoted by small letters
e.g. x, y. Special constants (to be specified) will be written as Greek letters
α, β, μ, σ
indices i will subscript random or observed outcomes for individual
observations in the data set: Yi , yi
Part 2 – Explorative data-analysis 18
19. Types of variables
Univariate descriptives
Bivariate descriptives
Type           Characteristic                        Example              Descriptive statistic   Information content
Categorical    the set of all possible values
               can be enumerated
• Nominal      unordered categories                  gender, race         counts, proportions     lower
• Ordinal      ordered categories                    degree of pain       median                  intermediate
Continuous     can take all possible values within   weight, number of    mean, standard          higher
or ordered     some interval of real numbers         cigarettes per day   deviation
discrete       (continuous) or limited to
               integers (discrete)
20. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Mean of a series of observations xi, i = 1, 2, …, n: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Properties, given that X and Y are random variables and a, b are scalars:
$\mu_{aX+b} = a\mu_X + b$
$\mu_{X+Y} = \mu_X + \mu_Y$
Median (M): middle of the distribution such that at least 50% of the outcomes
is larger than or equal to M and at least 50% of the outcomes is smaller than
or equal to M
For n odd: this is the middle value in order of magnitude
For n even: one takes the average of the two middle values
Part 2 – Explorative data-analysis 20
21. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Mean is very sensitive to outliers
Numbers of partners desired in the next 30 years
Miller and Fishkin, 1997
Part 2 – Explorative data-analysis 21
22. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Standard deviation of a series of observed values xi:
$SD(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
When the variable is approximately normally distributed, approximately 95% of
the data will lie between $\bar{x} - 1.96\,SD(x)$ and $\bar{x} + 1.96\,SD(x)$
The square of the SD is called the variance, Var(x)
Variation coefficient: $\frac{SD(x)}{\bar{x}} \times 100\%$
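A small sketch of these summaries in Python, using the repeated weighings from the earlier example (note that numpy's default is the 1/n formula shown above; ddof=1 would give the common 1/(n-1) sample version):

```python
import numpy as np

x = np.array([67.5, 67.4, 67.6])   # repeated weighings from the earlier example

mean = x.mean()
sd = x.std()                       # 1/n version, as in the formula above
var = x.var()                      # variance = SD squared
cv = 100 * sd / mean               # variation coefficient in %
print(mean, sd, var, cv)
```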
Part 2 – Explorative data-analysis 22
23. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Interquartile range (IQR): distance Q3 – Q1 with
Q1: a value such that at least 25% of the outcomes are ≤ Q1 and at
least 75% of the outcomes are ≥ Q1
Q3: a value such that at least 75% of the outcomes are ≤ Q3 and at
least 25% of the outcomes are ≥ Q3
If more than one value satisfies this criterion, the average is usually taken
Part 2 – Explorative data-analysis 23
24. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Five number summary: Min, Q1, Median, Q3, Max
Boxplot of birth weight: the box spans the quartiles Q1–Q3 (the IQR), with the median
marked inside; the whiskers reach to the most extreme observations within a distance
of 1.5 × IQR from the box
Part 2 – Explorative data-analysis 24
25. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Histogram: a bar diagram for continuous data, showing relative or absolute
frequencies (figure: percentage by birth weight)
Part 2 – Explorative data-analysis 25
26. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Normal distribution
Density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
μ is the population mean
σ² is the population variance
Notation X ~ N(μ, σ²)
If X ~ N(μ, σ²), then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$, the standard normal distribution
Part 2 – Explorative data-analysis 26
27. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Properties of the standard normal distribution N(0, 1)
unimodal: 1 maximum (at 0)
symmetric around 0
68-95-99.7 rule:
• 68% of the area under the curve (AUC) lies between -1 and 1, 68% of
the observations fall within 1 SD of the mean μ
• 95% of the AUC lies between -2 and 2, 95% of the observations fall
within 2 SD of the mean μ
• 99.7% of the AUC lies between -3 and 3, 99.7% of the observations fall
within 3 SD of the mean μ
Part 2 – Explorative data-analysis 27
28. Types of variables Histogram – Boxplot Normal curve
Univariate descriptives Measures for location center
Bivariate descriptives Measures of spread
Normal quantile plot
Compares two distributions by plotting their quantiles against each other
If the observed and the normal distribution are identical, points are expected to
lie on a straight line with intercept 0 and slope 1
Distributions with the same shape but simply rescaled or shifted still show up
on a straight line but with different intercept (shift) or slope (scale change)
Normal Q-Q plots: randomly generated N(0, 1) data versus randomly generated exponential data
Part 2 – Explorative data-analysis 28
29. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations – continuous data
Graphical: boxplots, (stacked) histograms, scatter plots
Correlation coefficient (r):
Takes values between -1 and 1
Pearson correlation coefficient
expresses a degree of linear dependence:
$r = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{SD(x)} \times \frac{y_i - \bar{y}}{SD(y)}$
! A summary statistic cannot replace individual examination of the data:
all four data sets of Anscombe's Quartet have r = 0.816 (source: Wikipedia)
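A quick check of both coefficients in Python; the data are the first set of Anscombe's quartet mentioned above:

```python
import numpy as np
from scipy import stats

# First data set of Anscombe's quartet (all four share r ≈ 0.816)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_s = stats.spearmanr(x, y)     # monotone (rank-based) association
print(round(r, 3), round(rho, 3))    # r ≈ 0.816
```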
Part 2 – Explorative data-analysis 29
30. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Measures monotone association (the extent to which, as one variable
increases, the other variable tends to increase or decrease)
No assumption on linearity
Ordinal variables
Source: Answers.com
Part 2 – Explorative data-analysis 30
31. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Example: corneal irregular astigmatism after laser in situ keratomileusis for myopia
(Br J Ophthalmol 2001;85:534-536): Spearman rank correlation rs = 0.440, p < 0.0001
A worked example of the calculation: http://geographyfieldwork.com/SpearmansRank.htm
Part 2 – Explorative data-analysis 31
32. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
2x2 associations – categorical data: comparing two proportions
Many studies are designed to compare two groups (X) on a binary response
variable (Y)

            Y: success   Y: failure
X: group 1  π1           1 - π1
X: group 2  π2           1 - π2

π: probability of success, 1 - π: probability of failure
Example: is there an association between antiviral drug use (X) and pneumonia (Y)?

Counts:          Pneumonia yes   Pneumonia no   Total
Antiviral drug   579             45172          45751
Control          648             45103          45751

Proportions:     Pneumonia yes   Pneumonia no   Total
Antiviral drug   0.013           0.987          1
Control          0.014           0.986          1
Part 2 – Explorative data-analysis 32
33. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Risk difference: is there a difference between the group taking the antiviral drug and
the control group?
π1 – π2 = 0.013 – 0.014 = -0.001
Properties
-1 ≤ (π1 - π2) ≤ 1
if response is independent of group, then (π1 - π2) = 0
A difference may be more important when both success probabilities are close
to 0 or 1 than when both are close to 0.5
Example: (p1 - p2) = 0.09 for both 0.10 - 0.01 = 0.09 and 0.50 - 0.41 = 0.09
In the first case p1 is 10 times larger than p2, while in the second case p1 is
only 1.2 times larger than p2.
Part 2 – Explorative data-analysis 33
34. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Relative risk: ratio of the success probabilities of the 2 groups
Properties
0 ≤ (π1/π2) < ∞
if response is independent of group, then (π1/π2) = 1
Antiviral drug example
(p1/p2) = (.013/.014) = 0.894 with 95% CI: 0.799, 0.999
The sample proportion of pneumonia cases was 10.6% lower for the group
prescribed the antiviral drug. The CI of the relative risk indicates that the risk
of pneumonia is at least 0.1% lower for the group prescribed the antiviral drug.
Part 2 – Explorative data-analysis 34
35. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio
For a probability π of success, the odds are defined as Ω = π / (1 - π)
Odds Ω ≥ 0, with values > 1 when a success is more likely than a failure. For
example, if π = .75, then the odds of success = .75/.25 = 3.0: a success is
three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as
a success.
The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio
Properties odds ratio
0 ≤ θ < ∞
When X and Y are independent, then θ = 1
the odds ratio does not change value when the orientation of the table
reverses (rows become columns, columns become rows)
Part 2 – Explorative data-analysis 35
36. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio - continued
Properties
if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and
thus subjects in row 1 are more likely to have success than are subjects in
row 2
θ = 4 does not mean that the probability π1 is four times π2 (that would be
the interpretation of relative risk)
the odds ratio does not change when both cell counts within any row (or
column, but not both) are multiplied by a nonzero constant; this implies
that the odds ratio does not depend on the marginal counts within a
row/column
Part 2 – Explorative data-analysis 36
37. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Odds ratio - Example
The sample odds ratio is computed from the 2×2 table:

                 Pneumonia yes   Pneumonia no   Total
Antiviral drug   579             45172          45751
Control          648             45103          45751

For the patients prescribed the antiviral drug, the estimated odds of pneumonia
are 579/45172 = 0.013: there were 1.3 pneumonia cases for every 100 cases
with no pneumonia.
The sample odds ratio = (579 × 45103) / (648 × 45172) = 0.892 (95% CI: 0.797,
0.999). The estimated odds for patients prescribed the antiviral drug equal 0.892
times the estimated odds for patients in the control group. The estimated odds
were 10.8% lower for the antiviral drug group.
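A small Python sketch that reproduces the relative risk and odds ratio above, with large-sample (Wald) 95% confidence intervals on the log scale:

```python
import math

# 2x2 counts from the antiviral example above
a, n1 = 579, 45751   # antiviral drug: pneumonia yes / group total
c, n2 = 648, 45751   # control: pneumonia yes / group total
b, d = n1 - a, n2 - c

rr = (a / n1) / (c / n2)          # relative risk
odds_ratio = (a * d) / (b * c)    # odds ratio

# Wald 95% CIs on the log scale
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)

def ci(est, se):
    return tuple(math.exp(math.log(est) + s * 1.96 * se) for s in (-1, 1))

print(rr, ci(rr, se_log_rr))                  # ≈ 0.894 (0.799, 0.999)
print(odds_ratio, ci(odds_ratio, se_log_or))  # ≈ 0.892 (0.797, 0.999)
```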
Part 2 – Explorative data-analysis 37
38. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
Relation between odds ratio and relative risk
When the proportion of successes is close to 0 in both groups, the sample
odds ratio approximates the sample relative risk. In such a case, an odds ratio of
0.89 does mean that the probability of success for the patients prescribed the
antiviral drug is about 0.89 times the probability of success for the patients in
the control group
Relative risk = 0.894 (95% CI: 0.799, 0.999)
Odds ratio = 0.892 (95% CI: 0.797, 0.999)
Part 2 – Explorative data-analysis 38
39. Types of variables Continuous data
Univariate descriptives Categorical data
Bivariate descriptives
What should be used: risk difference, relative risk or odds ratio?
The odds ratio is the preferred estimate
In a case-control study it is usually not possible to estimate the probability of
an outcome given X (π1), and therefore it is also not possible to estimate the
difference of proportions or the relative risk for that outcome
In a retrospective study, 709 patients with lung cancer (cases) were queried
about their smoking behavior (X). Each case was matched with a control
patient: same age, same gender, same hospital, but no lung cancer

             Lung cancer cases   Controls
Smoker       688                 650
Non-smoker   21                  59
Total        709                 709

Odds ratio = 2.97: the estimated odds of lung cancer for smokers were
2.97 times the estimated odds for non-smokers
Part 2 – Explorative data-analysis 39
40. Part 3
Statistical inference
Part 3 – Statistical inference 40
41. Distributions
Bias and variance
Hypothesis testing
Statistical inference: by using the laws of probability, we infer conclusions
about a population from data collected in a random sample
From the population (N elements) we draw a random sample (n elements) and
collect data; from the sample statistics (x̄, SD(x)) we make inferences about the
population parameters (μ, σ)
A parameter (μ, σ) is a number that describes the population. A
parameter is a fixed number, but its value is unknown in practice.
A statistic (x̄, SD(x)) is a number that describes the sample. Its value is
known once we have collected a sample, but it changes from sample to
sample.
Part 3 – Statistical inference 41
42. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
The sampling distribution of a statistic is the distribution of values taken by
the statistic in all possible samples of the same size from the same
population.
Binomial distribution
Poisson distribution
Normal distribution
Part 3 – Statistical inference 42
43. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Binomial distribution
Fixed number of n independent observations
Each observation falls in one of two categories (success/failure)
The probability of success ‘p’ is the same for each observation
→ denote by X the number of successes among the n observations, which
can take values 0, 1, …, n; then X ~ B(n, p)
Properties
$\mu_X = np$
$\sigma_X^2 = np(1-p)$
Probability mass function: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, k = 0, 1, …, n
Part 3 – Statistical inference 43
44. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Poisson distribution: expresses the number Y of events in a given unit of
time, space, volume, or any other dimension
Example → modeling a phenomenon in which we are waiting for an
occurrence (waiting for customers to arrive in a bank)
Basic assumption: for small time intervals, the probability of an occurrence
is proportional to the length of waiting time
Single parameter λ > 0, the average number of events per unit of
measurement.
k = number of occurrences of an event
λ = expected number of occurrences during the given interval
$P(Y = k) = \frac{e^{-\lambda} \lambda^k}{k!}$
$\mu_Y = \lambda$
$\sigma_Y^2 = \lambda$
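A short sketch of both distributions with scipy; the parameter values n = 20, p = 0.3 and λ = 4 are arbitrary illustrations, not from the slides:

```python
from scipy import stats

binom = stats.binom(n=20, p=0.3)   # illustrative n and p
print(binom.mean(), binom.var())   # np = 6.0, np(1-p) = 4.2
print(binom.pmf(5))                # P(X = 5)

pois = stats.poisson(mu=4)         # illustrative lambda = 4
print(pois.mean(), pois.var())     # both equal lambda
print(pois.pmf(2))                 # P(Y = 2) = e**-4 * 4**2 / 2!
```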
Part 3 – Statistical inference 44
45. Distributions Binomial distribution
Bias and variance Poisson distribution
Hypothesis testing Normal distribution
Normal distribution
density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
If X1, X2, …, Xn is a simple random sample from a population with mean μ and
variance σ², and Xi ~ N(μ, σ²), then $\bar{X}$ ~ N(μ, σ²/n)
Central limit theorem
Draw a simple random sample (X1,… , Xn) of size n from a population with
mean μ and finite variance σ². When n is large, the sample average then
follows approximately a normal distribution regardless of the data distribution.
$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
Part 3 – Statistical inference 45
46. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Law of large numbers: the population mean μ of X is unknown. The mean x̄ of a
simple random sample → estimate of μ.
$\bar{X}$ is a random variable that varies in repeated sampling
The law guarantees that, as the sample size of a simple random sample increases,
the sample mean x̄ gets closer to the population mean μ
Unbiased statistic: a statistic used to estimate an unknown parameter is
unbiased if the mean of its sampling distribution is equal to the true value of
the parameter being estimated.
Variability of a statistic is described by the spread of its sampling
distribution.
Spread determined by sampling design and sample size. Larger samples
have smaller spread.
Part 3 – Statistical inference 46
47. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
How precise is our estimate?
Sample Population
Generalize findings for general population
Estimate must approximate the population value
Representative sample
→ prevents the results for the sample from being biased
→ results are still subject to sampling variability: different samples from
the same population will yield different results
Generalizing results from the sample to the study population then requires
that we acknowledge sampling variability
Part 3 – Statistical inference 47
48. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Standard deviation ≠ standard error
The standard error measures the uncertainty in an estimate; the standard error of
the mean (SEM) is $\frac{\sigma}{\sqrt{n}}$, the standard deviation of the sampling distribution of the
sample means $\bar{X}$
Standard deviation (SD) of the observations → measures the variability in
the observations
both are standard deviations, but the standard error shrinks with increasing
sample size, in contrast to the standard deviation of the observations
The mean and SD are the preferred summary statistics for (normally
distributed) data, and the mean and 95% confidence interval are preferred for
reporting an estimate and its measure of precision.
Part 3 – Statistical inference 48
49. Distributions Sampling variability
Bias and variance Standard deviation vs standard error
Hypothesis testing Confidence interval
Confidence intervals
When we estimate a parameter by calculating a sample statistic, there is a
degree of uncertainty in our estimation
We can construct an interval around the sample mean $\bar{X}$ within which we
expect the true population mean μ with known probability (e.g. 95% chance)
A (1-α)100% confidence interval for the mean contains the population
mean with (1-α)100% chance. The confidence level or coverage probability is
(1-α)

σ known: $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$          σ unknown: $\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$
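A minimal sketch of the σ-unknown case in Python (the five measurements are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([67.5, 67.4, 67.6, 67.2, 67.8])   # hypothetical measurements
mean, se = x.mean(), stats.sem(x)              # sem = s / sqrt(n)

# sigma unknown -> t-based 95% CI with n - 1 degrees of freedom
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=se)
print(mean, (lo, hi))
```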
Part 3 – Statistical inference 49
50. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Hypothesis testing
The null hypothesis (Ho) assumes ‘no difference’ or ‘no effect’
The average … is equal in both treatment groups
The alternative hypothesis (HA) is claiming the opposite
The average … differs by treatment
Type of decision      H0 true                  HA true
Accept H0 (p > α)     Correct decision (1-α)   Type II error (β)
Reject H0 (p < α)     Type I error (α)         Correct decision (1-β) = power
Part 3 – Statistical inference 50
51. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
We assume H0 is true unless we can demonstrate, based on sample data at
the desired level of confidence, that HA is true.
→ level of confidence related to 2 potential types of statistical errors
• example: in a clinical trial we want to study the effect of an experimental drug (T)
and compare it to a placebo (P)
H0 : effect of drug T = effect of P
HA : effect of drug T ≠ effect of P
Type I error (false positive): concern of the regulators, the drug is not
working but it will go to the market
Type II error (false negative): concern of pharmaceutical companies, could
not prove that the new drug is working
Part 3 – Statistical inference 51
52. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Sensitivity and specificity
                          Gold standard: positive (ill)        Gold standard: negative (not ill)
Test outcome → positive   True positive (TP)                   False positive (FP), Type I error
Test outcome → negative   False negative (FN), Type II error   True negative (TN)

Sensitivity: proportion of ill people identified as being ill
Specificity: proportion of non-ill people identified as non-ill
Part 3 – Statistical inference 52
53. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
When are hypotheses needed?
Hypotheses are not needed in descriptive studies
If any of the following terms appears in the research question (study not
simply descriptive), a hypothesis should be formulated: greater than, less than,
causes, leads to, compared with, more likely than, associated with, related to,
similar to, correlated with.
The hypothesis should be clearly stated in advance.
Part 3 – Statistical inference 53
54. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Principle of statistical testing
calculate a test statistic, which measures the 'distance' from the observed sample
to the null hypothesis and whose distribution is known under the null hypothesis
Reject Ho
test statistic t exceeds a chosen cut-off c (critical value) in magnitude
p-value stays below a chosen cut-off α in magnitude
safety principle: cut-off is chosen such that the risk of making a Type I error is
controlled at a prespecified significance level α
Usually α = 0.05 (test performed at the 5% significance level)
the power of the test (the probability to avoid Type II errors, 1-β) is not controlled
→ choose adequate designs and sufficiently large sample sizes
Part 3 – Statistical inference 54
55. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c
in magnitude
p-value: probability to find a result for the test statistic at least as extreme as
the observed result (in the direction of the alternative hypothesis), if the null
hypothesis holds
Figure: for a two-sided test at α = 0.05, the distribution of the test statistic is
divided into an acceptance region and two rejection regions, each of area α/2,
bounded by the critical values cL and cR
Part 3 – Statistical inference 55
56. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA)
For many testing problems H0 is formulated very precisely, but there are
usually an infinite number of distributions consistent with HA.
Standardized effect size: $\frac{\mu_1 - \mu_0}{\sigma}$
With what probability must the statistical test detect this smallest relevant
difference? (figure: ~91% chance of finding an association of that size or greater)
Part 3 – Statistical inference 56
57. Distributions Principle of statistical tests
Bias and variance p-value and power
Hypothesis testing one-sided versus two-sided testing
One-sided versus two-sided testing
(figure: rejection regions for two-sided versus one-sided testing)
Decided prior to data analysis and avoid one-sided tests unless there are
really good reasons for using them (only one direction of the association is
clinically or biologically relevant)
it is never wrong to use a two-sided test where a one-sided test is applicable;
the cost is at most a slight loss of power
Part 3 – Statistical inference 57
58. Distributions
Bias and variance
Hypothesis testing
Multiple and post hoc hypotheses – the multiple testing problem
Inflated rate of false positive conclusions (Type I error)
Assume we perform 3 independent comparisons between 2 groups, each
conducted with α = 0.05
The probability that each of the tests → conclude H0 is correct in each case
= (0.95)³ = 0.857
→ the chance of finding at least one false positive statistically significant test
increases to 14.3% (1 - 0.857 = 0.143, not 0.05); see the sketch below
Adjusting for multiple hypotheses is especially important when the
consequences of making a false positive error are large, e.g. mistakenly
concluding that an ineffective treatment is beneficial
Adjustments can be made → false discovery rate (FDR) control
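A two-line check of the inflation argument above:

```python
alpha, k = 0.05, 3

# Chance of at least one false positive over k independent tests
familywise = 1 - (1 - alpha) ** k
print(familywise)                  # 0.142625 ≈ 14.3%

# Bonferroni: test each at alpha/k to keep the family-wise rate near alpha
print(1 - (1 - alpha / k) ** k)    # ≈ 0.0492
```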
Part 3 – Statistical inference 58
59. Part 4
Statistical tests
Part 4 – Statistical tests 59
60. Continuous/Categorical data Parametric statistics
Non-parametric statistics
Categorical data – Proportions
Continuous data
Parametric statistics
Non-parametric statistics
Categorical data
Ordinal versus nominal
Types of testing
One-sample tests
Two dependent groups
Two independent groups
More than two groups
Controlling for covariates
Part 4 – Statistical tests 60
61. Continuous/Categorical data Parametric statistics
Non-parametric statistics
Categorical data – Proportions
Dependent versus independent
Dependent (paired)                                Independent
Subject       Weight under A   Weight under B     Subject        Treatment   Weight
Volunteer 1   x1A              x1B                Volunteer 1    A           x1A
Volunteer 2   x2A              x2B                Volunteer 2    A           x2A
Volunteer 3   x3A              x3B                Volunteer 3    A           x3A
Volunteer 4   x4A              x4B                Volunteer 4    A           x4A
Volunteer 5   x5A              x5B                Volunteer 5    A           x5A
                                                  Volunteer 6    B           x6B
                                                  Volunteer 7    B           x7B
                                                  Volunteer 8    B           x8B
                                                  Volunteer 9    B           x9B
                                                  Volunteer 10   B           x10B
Part 4 – Statistical tests 61
62. Continuous data Parametric statistics
Categorical data – Proportions Non-parametric statistics
Parametric statistics
assume that the data come from a given type of probability distribution and make
inferences about the parameters of that distribution
require assumptions (e.g. a normal distribution); if these are correct, parametric
tests produce more accurate and precise estimates and generally have more
statistical power
e.g. Independent sample t-test
Assumptions
• Independent observations
• Population 1 → X1i ~ N(μ1, σ²)
Population 2 → X2i ~ N(μ2, σ²)
H0 : μ1 = μ2 → under these assumptions H0 states that the two distributions are equal
Part 4 – Statistical tests 62
63. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Non-parametric statistics – rank tests
no specific assumption about the population distribution required
Example: statistics based on Rank tests
Let X1, …, Xn denote a sample of n observations, the rank of observation Xj is
defined as
Rj = R(Xj) = number of observations in the sample ≤ Xj
$R_j = \sum_{i=1}^{n} I(X_i \le X_j)$
The smallest observation gets rank 1, the second smallest rank 2, …, the
largest observation gets rank n.
In case of ties (a tie is a pair of equal observations), the ranks of the tied
observations are defined as the average of their ranks according to the
definition just given. These are called mid-ranks.
Part 4 – Statistical tests 63
64. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Example:
Observations   Ranks
2              1
8              2
12             (3+4)/2 = 3.5
12             (3+4)/2 = 3.5
15             5
39             6
Properties of rank-transformed observations
they only depend on the ordering of the observations
they are insensitive to outliers (robust)
the distribution of the ranks does not depend on the distribution of the
observations
Part 4 – Statistical tests 64
65. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Non-parametric statistics – permutation tests
reference distribution of a characteristic of interest is obtained by calculating
all possible values of the test statistic under rearrangements of the labels on
the observed data points.
Example: a company has a new training program and wishes to evaluate whether the
new method is better than the traditional one. To assess the effect of the new
method, they set up an experiment with 7 new employees. Four of them are
randomly assigned to the new training method, and the other three received the
old training method.

Observed data            One possible rearrangement
New   Traditional        New   Traditional
37    23                 37    23
49    31                 49    55
55    46                 57    31
57                       46

Number of rearrangements: $\binom{7}{4} = \frac{7!}{4!\,3!} = 35$
Part 4 – Statistical tests 65
66. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Permutation tests
to verify whether there is a difference in means of a continuous measurement
in 2 independent populations
Permutation null distribution
H0 : F1(x) = F2(x) for all x
HA : μ1 > μ2
Test statistic $T = \bar{X}_1 - \bar{X}_2$
Example: we have 35 possible permutations (each having a t*-value); the
collection of all the t*-values is the permutation null distribution
Part 4 – Statistical tests 66
67. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Permutation test - example
Test statistic $T = \bar{X}_1 - \bar{X}_2$ → t = 49.5 − 33.3 = 16.2
Permutation null distribution: over the 35 possible permutations, all t*-values
are equally likely under the null hypothesis
H0 will be rejected for large T (T > c, the critical value); c controls the Type I
error rate at α: P(T > c | H0) ≤ α
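A self-contained sketch of this permutation test in Python, enumerating all 35 relabelings of the training-program data above:

```python
from itertools import combinations

new = [37, 49, 55, 57]       # new training method
trad = [23, 31, 46]          # traditional method
pooled = new + trad

def t_stat(g1, g2):
    # difference in group means, T = mean(group 1) - mean(group 2)
    return sum(g1) / len(g1) - sum(g2) / len(g2)

t_obs = t_stat(new, trad)    # ≈ 16.17 (slide rounds to 16.2)

# Enumerate all 35 relabelings: choose which 4 of the 7 values are "new"
null_dist = []
for idx in combinations(range(len(pooled)), len(new)):
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    null_dist.append(t_stat(g1, g2))

# One-sided p-value for HA: mu1 > mu2
p = sum(t >= t_obs for t in null_dist) / len(null_dist)
print(len(null_dist), round(t_obs, 2), p)   # 35, 16.17, 2/35 ≈ 0.057
```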
Part 4 – Statistical tests 67
68. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Parametric versus non-parametric tests
Parametric tests: the data are sampled from a population with a normal
distribution, OR the sample size is large (CLT)
Smaller sample sizes: outliers or a skewed distribution can be problematic →
transformation or non-parametric tests (permutation or rank tests)
Permutation tests: very flexible
Non-parametric rank tests: in case of no meaningful measurement scale (pain
score, Apgar score, …)
Careful with formulation of H0 and interpretation of the analysis
Less power
Part 4 – Statistical tests 68
69. Continuous data Parametric statistics Rank tests
Categorical data – Proportions Non-parametric statistics Permutation tests
Categorical / discrete data: the set of all possible values can be enumerated
Ordinal data: ordered categories
Age group, pain assessment from no to severe, Likert scales (agree
strongly, agree, neutral, disagree, disagree strongly)
Nominal data: categories have no natural order, sometimes called
qualitative data (gender, race, hair color)
Counts: variables are represented by frequencies
Proportions / percentages
Ratio of counts, e.g. binary or dichotomous data: exactly two possible
outcomes (success / failure); we count the number of successes in the
number of trials
Part 4 – Statistical tests 69
70. One-sample tests Parametric statistics One-sample t-test
Non-parametric statistics
Categorical data - Proportions
One-sample t-test
to verify whether the mean of a continuous measurement deviates from a
given value μ0
H0 : μ = μ0
HA : μ ≠ μ0
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, t-distributed with n-1 degrees of freedom (df) under H0
Assumptions
Independent observations
Normally distributed observations or large sample
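A minimal sketch with scipy (the measurements and the null value μ0 are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.3])   # hypothetical measurements
mu0 = 5.0                                      # value under H0

t, p = stats.ttest_1samp(x, popmean=mu0)       # t has n - 1 = 5 df
print(t, p)
```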
Part 4 – Statistical tests 70
71. One-sample tests Parametric statistics 1-way contingency tables
Non-parametric statistics
Categorical data – Proportions
One categorical variable with J ≥ 2 categories
Example: number of students in each of the three main subjects in the 1st
master psychology (2003-2004); suppose that in the population the true
proportions πj are known (table not reproduced here)
Part 4 – Statistical tests 71
72. One-sample tests Parametric statistics 1-way contingency tables
Non-parametric statistics
Categorical data – Proportions
X² test One categorical variable with J ≥ 2 categories
H0 : pj = πj for all j (or, in terms of expected frequencies, nj = μj = nπj)
HA : pj ≠ πj for at least one j
Statistic: $X^2 = \sum_{j=1}^{J} \frac{(n_j - \mu_j)^2}{\mu_j}$, approximately χ²-distributed with J - 1 df under H0
Example: df = J − 1 = 2 and P < .0001, strongly suggesting that the null
hypothesis should be rejected.
Part 4 – Statistical tests 72
73. Two dependent samples Parametric statistics Paired sample t-test
Non-parametric statistics
Categorical data - Proportions
Paired sample t-test
to verify whether 2 continuous measurements, obtained from paired subjects,
are the same on average
H0 : μ1 = μ2
HA : μ1 ≠ μ2
→ calculate differences Y = X1 – X2 and use the one-sample t-test to verify
whether H0 : μ = 0 versus HA : μ ≠ 0, where μ is the average of Y
Assumptions
Independent differences
Normally distributed differences or large sample (n ≥ 40)
n ≥ 15: the t-test is fine unless the distribution is very skewed or there are outliers
n < 15: the data should be approximately normally distributed; a very skewed
distribution or outliers are problematic
Part 4 – Statistical tests Source assumptions ‘Introduction to the practice of statistics, Moore & McCabe’ 73
74. Two dependent samples Parametric statistics Wilcoxon signed rank test
Non-parametric statistics
Categorical data - Proportions
Wilcoxon signed rank test
Compare 2 dependent samples → the difference variable Y = X1 - X2
With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi−
the observations on the negative differences (i = 1, …, n−), then
H0 : P(Y− < Y+) = ½
HA : P(Y− < Y+) > ½
Statistic: V = the sum of the ranks of |Yi| belonging to the positive differences
Part 4 – Statistical tests 74
75. Two dependent samples Parametric statistics Wilcoxon signed rank test
Non-parametric statistics
Categorical data - Proportions
Wilcoxon signed rank test - Example
Two stories were narrated to children with reading disorders; story 1 was not
illustrated whereas story 2 was illustrated

Child            1      2       3      4       5
Story 1          0.40   0.72    0.00   0.36    0.55
Story 2          0.77   0.49    0.66   0.28    0.38
Difference Yi    0.37   -0.23   0.66   -0.08   -0.17
Ranks of |Yi|    4      3       5      1       2
Signed ranks     4      -3      5      -1      -2

V = 9, n = 5, p = 0.406
From this small sample we could not conclude that children with reading disorders
can tell a story better when the story was illustrated.
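A sketch of this test with scipy; for a one-sided alternative scipy's statistic should be the sum of positive ranks, matching V above:

```python
from scipy import stats

# Differences Y = story 2 - story 1 from the table above
y = [0.37, -0.23, 0.66, -0.08, -0.17]

# One-sided alternative: positive differences dominate
res = stats.wilcoxon(y, alternative="greater")
print(res.statistic, res.pvalue)   # expected: V = 9, p ≈ 0.406
```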
Part 4 – Statistical tests 75
76. Two dependent samples Parametric statistics Models for matched pairs
Non-parametric statistics
Categorical data - Proportions
Models for matched pairs
For comparing categorical responses for 2 samples when each sample has
the same subject or when a natural pairing exists between each subject in one
sample and a subject from the other sample.
The McNemar test compares proportions in paired studies; a standard form of
its test statistic (for reference) is $X^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$
H0 : π1+ = π+1
HA : π1+ ≠ π+1

              After: yes   After: no   Total
Before: yes   n11          n12         n1+
Before: no    n21          n22         n2+
Total         n+1          n+2         n
Part 4 – Statistical tests 76
77. Two independent samples Parametric statistics Independent sample t-test
Non-parametric statistics
Categorical data - Proportions
Independent sample t-test
to verify whether the mean of a continuous measurement is the same in 2
independent populations
H0 : μ1 = μ2 versus HA : μ1 ≠ μ2
Test statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$
Measurement variance equal in the 2 groups → pooled-variance t-test
Measurement variance unequal in the 2 groups → Welch test statistic t*
Assumptions
Independent observations
Normally distributed observations or large sample in each group
With small but equal sample sizes (n1 = n2 = 5) and comparable shapes of the
distributions → we can still rely on t-test procedures
Part 4 – Statistical tests 77
78. Two independent samples Parametric statistics Independent sample t-test
Non-parametric statistics
Categorical data - Proportions
Independent sample t-test – continued
Measurement variance equal in the 2 groups: the SE of the mean difference can be
estimated as $SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
with the pooled variance $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Measurement variance unequal in the 2 groups: the SE of the mean difference can be
estimated as $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
(1-α)100% confidence interval for μ1 - μ2: $(\bar{x}_1 - \bar{x}_2) \pm t \times SE$, with the pooled or the
unequal-variance SE, respectively
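Both variants in one scipy sketch (the two samples are hypothetical):

```python
import numpy as np
from scipy import stats

g1 = np.array([12.1, 11.4, 12.8, 13.0, 11.9])   # hypothetical group 1
g2 = np.array([10.9, 11.2, 10.4, 11.8, 10.6])   # hypothetical group 2

pooled = stats.ttest_ind(g1, g2, equal_var=True)   # pooled-variance t-test
welch = stats.ttest_ind(g1, g2, equal_var=False)   # Welch t* (unequal variances)
print(pooled)
print(welch)
```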
Part 4 – Statistical tests 78
79. Two independent samples Parametric statistics Rank tests
Non-parametric statistics Mann-Whitney U, Wilcoxon Rank Sum
Categorical data - Proportions
Mann-Whitney (U) test, Wilcoxon rank-sum test
Compare 2 independent samples
H0 : F1(x) = F2(x) for all x
HA : P(X1 < X2) ≠ ½
where X1 and X2 have distributions F1 and F2, respectively.
If X1 and X2 are continuous random variables, the test may be thought of
as testing the null hypothesis that the probability of an observation from
one population exceeding an observation from the second population is
0.5, this implies
P(X1 < X2) = P(X1 > X2) = ½
→ test statistics based on this principle
Part 4 – Statistical tests 79
80. Two independent samples Parametric statistics Rank tests
Non-parametric statistics Mann-Whitney U, Wilcoxon Rank Sum
Categorical data - Proportions
Is the Wilcoxon rank-sum test the nonparametric alternative for the
independent-sample t-test?
Remember
H0 : F1(x) = F2(x) for all x (2 distributions are equal)
HA : P(X1 < X2) ≠ ½
→ the ranks cannot be used to estimate the mean!
Independent sample t-test
H0 : μ1 = μ2
HA : μ1 ≠ μ2
Part 4 – Statistical tests 80
81. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
2x2 contingency tables
Example: patient characteristics at the onset of first-line treatment with
gefitinib or chemotherapy

Frequencies:
             ECOG PS <2   ECOG PS ≥2   Total
Gefitinib    70           17           87
Chemo        57           4            61
Total        127          21           148

Conditional distribution of ECOG PS status given treatment:
             ECOG PS <2   ECOG PS ≥2   Total
Gefitinib    0.805        0.195        1.00
Chemo        0.934        0.066        1.00
Two variables are said to be statistically independent if the conditional
distributions of Y (Eastern Cooperative Oncology Group performance status) are
identical at each level of X (treatment)
Part 4 – Statistical tests 81
82. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Testing independence - Pearson chi-square test
H0 : πij = πi+ π+j for all i and j (or, in terms of expected frequencies, nij = μij = n πi+ π+j)
HA : πij ≠ πi+ π+j for at least one cell
Statistic: $X^2 = \sum_{i,j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$
Example
X² = 4.964, df = 1: ECOG PS status and treatment are significantly associated.
The proportion of patients with a poor ECOG performance status (≥ 2) was
higher in the first-line gefitinib group (20%) than in the first-line chemotherapy
group (7%; P = 0.026).
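This result can be reproduced with scipy (correction=False gives the uncorrected Pearson statistic reported above):

```python
import numpy as np
from scipy import stats

table = np.array([[70, 17],    # gefitinib: ECOG PS <2, >=2
                  [57, 4]])    # chemotherapy

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(chi2, df, p)             # ≈ 4.96, df = 1, p ≈ 0.026
```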
Part 4 – Statistical tests 82
83. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Testing independence – Fisher’s exact test
For small samples, use Fisher’s exact test: it assumes that the row and column
margin totals are fixed (hypergeometric distribution). When this assumption is
not met (most cases), Fisher’s exact test is rather conservative, resulting in a
Type I error rate below 0.05.
H0 : θ = 1
HA : θ ≠ 1

            Adeno   Non-adeno   Total
Gefitinib   85      2           87
Chemo       58      3           61
Total       143     5           148

Two-sided p-values: Fisher’s exact test p = 0.403; chi-square test p = 0.385
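The same table in scipy (the reported two-sided p should agree with the slide's p = 0.403 up to rounding):

```python
from scipy import stats

table = [[85, 2],    # gefitinib: adeno, non-adeno
         [58, 3]]    # chemotherapy

odds_ratio, p = stats.fisher_exact(table, alternative="two-sided")
print(odds_ratio, p)   # two-sided p ≈ 0.40 (slide reports 0.403)
```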
Part 4 – Statistical tests 83
84. Two independent samples Parametric statistics 2X2 contingency tables
Non-parametric statistics
Categorical data – Proportions
Large samples
In case of very large sample sizes, the Pearson chi-square test will reject almost any
null hypothesis, even if the deviation of the observed from the expected counts
is of little practical importance → use the Gini index (its value equals the proportion of
observations that would have to be moved from one cell to another in order for
the observed counts to equal the expected counts)
Small samples
Inferences based on the chi-square distribution become questionable when the
expected counts in some cells become too small (below 5), even when the
total sample size is large → use exact solutions (Fisher’s exact test)
Part 4 – Statistical tests 84
85. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
One-way analysis of variance (ANOVA)
to verify whether the mean of a continuous measurement is the same in 2 or
more independent populations
H0 : μ1 = μ2 = … = μk versus
HA : at least 1 of the population means differs
Test statistic: $F = \frac{MSE_{between}}{MSE_{within}} \overset{H_0}{\sim} F_{k-1,\,n-k}$
Assumptions
Independent observations
Normally distributed observations or large sample within each group (Q-Q
plots)
Equal variance in each group (boxplots or Levene’s test)
Part 4 – Statistical tests 85
86. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
ANOVA principle
Is the variation between groups large compared to the variation within groups?
Consider k groups, the ith group having ni observations, with Yij the jth observation
in the ith group
$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$
Total Sum of Squares = within SS + between SS
Part 4 – Statistical tests 86
87. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
ANOVA Table

Source    Sum of Squares (SS)                                       df    Mean Squared Error (MSE)     F
Between   $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$   k-1   $MSE_B = \frac{SS_B}{k-1}$   $\frac{MSE_B}{MSE_W}$
Within    $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$    n-k   $MSE_W = \frac{SS_W}{n-k}$
Total     $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$
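A minimal one-way ANOVA in scipy, on three hypothetical groups; Levene's test checks the equal-variance assumption listed above:

```python
from scipy import stats

# Three hypothetical groups (not from the slides)
g1 = [24.1, 25.3, 23.8, 24.9]
g2 = [26.0, 27.2, 25.8, 26.5]
g3 = [24.5, 25.0, 24.2, 25.6]

f, p = stats.f_oneway(g1, g2, g3)     # F = MSE_between / MSE_within
print(f, p)

# Levene's test for the equal-variance assumption
w, p_lev = stats.levene(g1, g2, g3)
print(w, p_lev)
```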
Part 4 – Statistical tests 87
88. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Deviations from the assumptions
one-way analysis of variance is robust against lack of normality
→ in case of important deviations from a normal distribution : use
nonparametric Kruskal-Wallis test or transformations
ANOVA is not very sensitive to the assumption of homogeneity of variances
(perform Levene’s test at the 1% significance level)
→ heterogeneity of variances
• little impact when the group level sample sizes ≈ equal: Type I error rate
is slightly increased
• with important heterogeneity and markedly ≠ group level sample sizes,
weighted least squares regression may be used, weighting each
observation by the inverse group level standard deviation
Part 4 – Statistical tests 88
89. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Post-hoc analysis
if ANOVA detects no difference, we conclude that there is insufficient evidence
of a difference in means
if ANOVA detects a difference → post hoc analysis to investigate where the
difference is
DO NOT perform all pairwise comparisons using independent samples t-tests
→ multiple testing problem
Assume we perform 3 different t-tests, each conducted with α = 0.05
The probability that each of the tests → conclude H0 is correct in each
case = (0.95)³ = 0.857 (assuming independence of the tests)
→ the level of significance at which at least one of the three tests leads to
conclusion HA, while H0 holds in each case, would be 1 - 0.857 = 0.143 (not 0.05).
The level of significance and power for a family of tests ≠ individual test
Part 4 – Statistical tests 89
90. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Family-wise error rate - αE
The probability of making at least 1 false discovery (type I errors) among all
the hypotheses when performing multiple pairwise tests
→ We should correct for the risk of false detections
most procedures for multiple testing are designed to control the risk of at least
1 false detection at αE, assuming that all k null hypotheses are true
when the k tests are independent, each with significance level α, then
αE = P(at least 1 Type I error) = 1 − (1 − α)^k ≈ kα
family-wise error rate increases with the number of tests
Part 4 – Statistical tests 90
91. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Multiple comparison procedures that control family-wise error rate
Bonferroni procedure
Conservative test: makes fewer Type I errors than allowed for (and thus
more Type II errors)
Only applicable when the effects to be investigated are identified in
advance of the data analysis
Tukey procedure
Preferred method when only pairwise comparisons are to be made
Scheffé procedure
Preferred method when the family of interest is a set of all possible
contrasts among the factor level means
Part 4 – Statistical tests 91
92. ≥ two independent samples Parametric statistics Analysis of Variance
Non-parametric statistics
Categorical data – Proportions
Rules of thumb
never interpret a large p-value as indicating absence of association
never interpret a small p-value as indicating an important association
report p-values in combination with an effect estimate and confidence interval!
This allows for judging whether the effect is practically significant.
in some cases, it may be advisable to determine equivalence intervals prior to
data analysis
Part 4 – Statistical tests 92
93. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
k-sample problem, compare more than 2 independent samples
H0 : F1(x) = F2(x) = … = Fk(x) for all x
HA : the observations in some populations are systematically larger than in
other populations (e.g. P(X1 < X2) ≠ ½ for some pair of groups)
Assumptions
the observations in each group come from populations with the same
shape of distribution
Part 4 – Statistical tests 93
94. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
the rank test statistic is basically an MSE-between based on the ranks
rank all observations in the combined sample
let Rij denote the rank of Xij (i = 1, …, k, j = 1, …, ni) and let $\bar{R}_i$ be the
average of the ranks Rij (j = 1, …, ni) in the ith group
Kruskal-Wallis test statistic (standard form): $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left(\bar{R}_i - \frac{n+1}{2}\right)^2$
Part 4 – Statistical tests 94
95. > two independent samples Parametric statistics Kruskal-Wallis test
Non-parametric statistics
Categorical data – Proportions
Kruskal-Wallis rank test
when H0 is rejected → at least 2 groups differ systematically → pairwise comparisons
Wilcoxon rank sum statistic or Mann-Whitney statistic: alternative hypothesis
in terms of probabilities: HA : P(X1 > X2) …
Family-wise error rate αE → we should correct for the risk of false detections.
Bonferroni correction: when m tests must be performed simultaneously, each
of the tests must be performed at α = αE / m
equivalently: multiply each p-value by m before interpreting it
Part 4 – Statistical tests 95
96. ≥ two independent samples Parametric statistics Analysis of Covariance (ANCOVA)
controlling for covariate Non-parametric statistics
Categorical data – Proportions
Analysis of Covariance - ANCOVA
Adjustment for a confounder (e.g. age)
Just like in ANOVA we have a treatment effect (consider, for example, 3
treatments)
We add the variable age to our model → adjustment for the confounder; a minimal
sketch follows below
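A hypothetical ANCOVA sketch using the statsmodels formula interface (the data frame, variable names and values are invented for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented example data: response, 3-level treatment, and age as covariate
df = pd.DataFrame({
    "response":  [5.2, 6.1, 5.8, 7.0, 6.5, 7.4, 4.9, 6.8, 7.1],
    "treatment": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "age":       [34, 41, 29, 38, 45, 31, 27, 52, 36],
})

# ANCOVA: treatment effect adjusted for the confounder age
model = smf.ols("response ~ C(treatment) + age", data=df).fit()
print(model.summary())
```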
Part 4 – Statistical tests 96
97. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Three-way contingency tables
In studying the effect of an explanatory variable X on a response variable Y,
one should control covariates that can influence that relationship
Example: peginterferon alfa for hepatitis C

                           Virologic response: yes   Virologic response: no
Genotype 1   Treatment A   138                       160     → conditional odds ratio θ1
             Treatment B   103                       182
Genotype 2   Treatment A   106                       34      → conditional odds ratio θ2
             Treatment B   88                        57
Total        Treatment A   244                       194     → marginal odds ratio
             Treatment B   191                       239
Part 4 – Statistical tests 97
98. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Breslow-Day test for testing homogeneity of odds ratios
tests whether the odds ratio between X and Y is the same in the different Z
categories; it is a test of homogeneous association.
Part 4 – Statistical tests 98
99. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Cochran-Mantel-Haenszel Test of conditional independence
Tests conditional XY independence given Z in a 2 × 2 × K table: the response is
conditionally independent of the treatment in any given stratum
Inappropriate when the association varies dramatically among the partial
tables
Part 4 – Statistical tests 99
100. ≥ two independent samples Parametric statistics Breslow-Day test
controlling for covariate Non-parametric statistics Cochran-Mantel-Haenszel test
Categorical data – Proportions
Cochran-Mantel-Haenszel Test of conditional independence
Example, colon cancer (Bokemeyer et al., 2008): ECOG PS-adjusted OR = 1.52
(95% CI 0.98-2.36, p = 0.064, CMH test), indicating no significant association
between response and treatment within the ECOG PS strata.

                              Response: yes   Response: no
ECOG PS 0   Cet. + FOLFOX-4                                 → conditional odds ratio θ1
            FOLFOX-4
ECOG PS 1   Cet. + FOLFOX-4                                 → conditional odds ratio θ2
            FOLFOX-4
ECOG PS 2   Cet. + FOLFOX-4                                 → conditional odds ratio θ3
            FOLFOX-4
Total       Cet. + FOLFOX-4   77              92            → marginal odds ratio = 1.51
            FOLFOX-4          60              108

(per-stratum cell counts not reproduced here)
Part 4 – Statistical tests 100