This document discusses sample size calculations for different types of cohort studies. It provides examples of calculating sample sizes for studies measuring one variable, differences between two means, rates, or proportions. The key factors considered are the confidence interval, power, estimated outcomes in exposed and unexposed groups, and standard deviation or error terms. Sample size formulas are provided for prospective and retrospective cohort studies comparing outcomes within or between groups.
2. DEFINITION
Cohort studies are a type of medical research used to
investigate the causes of disease and to establish links
between risk factors and health outcomes.
The word cohort means a group of people. These types of
studies look at groups of people. They can be forward-
looking (prospective) or backward-looking
(retrospective).
3. • "Prospective" studies are planned in
advance and carried out over a future
period of time.
• These long-term studies are sometimes
called longitudinal studies.
4. The four values required for a sample size calculation are
• Confidence level: most individuals would
choose a 95% confidence interval, but a different
confidence level could be entered.
• Power: most individuals choose a power value of 80%
or 90%; however, any power level can be entered.
• Ratio of Unexposed to Exposed in sample: enter the
desired ratio of unexposed individuals to exposed
individuals.
If there are to be an equal number of unexposed and
exposed, then enter the value of 1.0; if there are to be
twice as many unexposed as exposed, enter the value
of 2.0. Any other ratio can be entered.
5. • Percent of Unexposed with Outcome: enter
an estimate of the percentage of unexposed
individuals that will develop (or have) the
outcome of interest.
For example, in a randomized controlled trial, you
would estimate the percentage of those in
the comparison group that will develop the
outcome of interest during the trial. In a
cohort study, enter the percentage of
unexposed individuals who will develop the
outcome of interest during the study.
6. SAMPLE SIZE CALCULATION FOR MEASURING
ONE VARIABLE
n = s^2 / e^2
The lowercase letters in the formula above
represent the following:
• n: sample size
• e: required size of standard error
• s: standard deviation
7. Single mean
Example 10: Calculate the sample size for
a study to determine the
mean weight of newborn babies. The mean
weight is expected to be 2800 grams.
Weights are approximately normally
distributed and 95% of the birth weights
are probably between 2000 and 3600 grams;
therefore the standard deviation would be
400 grams. The desired 95% confidence
interval is 2760 to 2840 grams (the mean plus or
minus two standard errors), so the
standard error would be 20 grams.
n = s^2 / e^2 = 400^2 / 20^2 = 160000 / 400
= 400 newborn babies.
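The arithmetic in Example 10 can be checked with a short Python sketch. The function name `sample_size_single_mean` is an illustrative choice, not something from the slides, and the sketch assumes (as the example does) that 95% of values fall within about two standard deviations of the mean.

```python
import math


def sample_size_single_mean(sd, se):
    """Return n = s^2 / e^2, rounded up to a whole subject.

    sd: expected standard deviation (s)
    se: required standard error of the estimated mean (e)
    """
    return math.ceil(sd ** 2 / se ** 2)


# 95% of birth weights are expected between 2000 and 3600 g.
# Assumption: that range spans about 4 standard deviations (mean +/- 2 SD).
sd = (3600 - 2000) / 4   # 400 g
se = 20                  # required standard error in grams, as stated in the example

n = sample_size_single_mean(sd, se)
print(n)  # 400
```

Halving the required standard error quadruples the sample size, because e enters the formula squared.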
8. Single rate
Example 11: The maternal mortality rate in a
country is expected to be 70 per 10,000 live
births. A survey is planned to determine the
maternal mortality rate with a 95%
confidence interval of 60 to 80 per 10,000
live births. The standard error would
therefore be 5/10,000. The required sample
size would be:
n = r / e^2 = (70/10,000) / (5/10,000)^2 = 28,000 live
births
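As a rough check of Example 11, the sketch below evaluates n = r / e^2 in Python. It uses `fractions.Fraction` so that small rates such as 5/10,000 are represented exactly; this is a choice made for the sketch, and the names `sample_size_single_rate`, `lower` and `upper` are hypothetical. The standard error is taken as a quarter of the confidence interval width, which is the approximation the example appears to use.

```python
from fractions import Fraction
import math


def sample_size_single_rate(rate, se):
    """Return n = r / e^2, rounded up to a whole number of live births."""
    return math.ceil(rate / se ** 2)


rate = Fraction(70, 10_000)    # expected maternal mortality rate
lower = Fraction(60, 10_000)   # lower limit of the desired 95% confidence interval
upper = Fraction(80, 10_000)   # upper limit of the desired 95% confidence interval

# Assumption: a 95% confidence interval is about 4 standard errors wide.
se = (upper - lower) / 4       # 5 per 10,000

n = sample_size_single_rate(rate, se)
print(n)  # 28000
```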
9. Single proportion
Example 12: The proportion of nurses leaving
the health services within three years of
graduation is estimated to be 30%. A study
that aims to find causes for this also aims
to determine the percentage leaving the
service with a confidence interval of 25%
to 35%. The standard error would
therefore be 2.5%. The required sample
size would be:
• n = P(100-P) / e^2 = 30*70 / 2.5^2 = 336 nurses
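The single-proportion formula, n = P(100-P) / e^2 with P and e in percent, is easy to evaluate for several candidate standard errors. The sketch below is only an illustration; the function name and the list of alternative standard errors are assumptions added here, not values from the slides. With the example's inputs, 30*70 / 2.5^2 evaluates to 336.

```python
import math


def sample_size_single_proportion(p, se):
    """Return n = P(100 - P) / e^2, with P and e expressed in percent."""
    return math.ceil(p * (100 - p) / se ** 2)


p = 30  # expected percentage of nurses leaving the service

# Hypothetical alternative precisions, to show how n grows as e shrinks.
for se in (5, 2.5, 1):
    print(se, sample_size_single_proportion(p, se))

n = sample_size_single_proportion(p, 2.5)  # the example's standard error
```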
10. Difference between two means (sample size
in each group)
Example 13: A study is being planned to find out
the difference in mean birth weight between
districts A and B. In district A the mean is expected
to be 3000 grams with a standard deviation of 500
grams; the same standard deviation of 500 grams is
used for district B in the calculation. The difference
in mean birth weight between districts A and B is expected
to be 200 grams. The desired 95% confidence
interval of this difference is 100 to 300 grams,
giving a standard error of the difference of 50
grams. The required sample size would be:
n = (s1^2 + s2^2) / e^2 = (500^2 + 500^2) / 50^2
= 200 newborns in each district
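A minimal Python sketch of the two-means calculation follows. It assumes, as the formula in Example 13 does, a standard deviation of 500 grams in both districts, and it derives the standard error of the difference from the desired interval by treating the interval as four standard errors wide. The function and variable names are illustrative.

```python
import math


def sample_size_diff_means(sd1, sd2, se_diff):
    """Return n per group = (s1^2 + s2^2) / e^2, rounded up."""
    return math.ceil((sd1 ** 2 + sd2 ** 2) / se_diff ** 2)


# Desired 95% confidence interval for the difference: 100 to 300 g.
# Assumption: the interval is about 4 standard errors wide.
se_diff = (300 - 100) / 4   # 50 g

n_per_district = sample_size_diff_means(500, 500, se_diff)
print(n_per_district)  # 200
```

The result is a per-group figure, so the study as a whole would need twice as many newborns.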
11. Difference between two rates (sample size
in each group)
Example 14: The difference in maternal mortality
rates between urban and rural areas will be
determined. In the rural areas the maternal
mortality rate is expected to be 100 per 10,000
and in the urban areas 50 per 10,000 live
births. The difference is therefore 50 per
10,000 live births. The desired 95%
confidence interval is 30 to 70 per 10,000 live
births, giving a standard error of the difference
of 10/10,000. The required sample size would
be:
12. n = (r1 + r2) / e^2
= (100/10,000 + 50/10,000) / (10/10,000)^2
= 15,000 live births in each area
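One way to evaluate the two-rates formula without handling very small decimals is to keep every rate in "per 10,000" units and rescale once at the end. The sketch below takes that approach; the `per` parameter and the function name are assumptions introduced for illustration. Algebraically, (r1/per + r2/per) / (e/per)^2 simplifies to (r1 + r2) * per / e^2.

```python
import math


def sample_size_diff_rates(r1, r2, se_diff, per=10_000):
    """Return n per group for rates expressed as events per `per` births.

    Equivalent to n = (r1 + r2) / e^2 with all three quantities divided by `per`.
    """
    return math.ceil((r1 + r2) * per / se_diff ** 2)


# Rural 100 and urban 50 deaths per 10,000 live births;
# required standard error of the difference: 10 per 10,000.
n_per_area = sample_size_diff_rates(100, 50, 10)
print(n_per_area)  # 15000
```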
13. Difference between two proportions (sample
size in each group)
• Example 15: The difference in the proportion of nurses
leaving the service is to be determined between two regions. In
one region 30% of the nurses are estimated to leave the service
within three years of graduation, in the other region 15%, giving
a difference of 15%. The desired 95% confidence interval for
this difference is 5% to 25%, giving a standard error of
5%. The sample size in each group would be:
• n = {P1(100-P1) + P2(100-P2)} / e^2
= (30*70 + 15*85) / 5^2
= 135 nurses in each region
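The calculation in Example 15 can be reproduced with the following sketch, which works in percentages throughout. The helper `binomial_term` and the other names are hypothetical; they simply name the P(100-P) piece that appears once for each region.

```python
import math


def binomial_term(p):
    """Return P(100 - P) for a percentage P."""
    return p * (100 - p)


def sample_size_diff_proportions(p1, p2, se_diff):
    """Return n per group = {P1(100-P1) + P2(100-P2)} / e^2, rounded up."""
    return math.ceil((binomial_term(p1) + binomial_term(p2)) / se_diff ** 2)


n_per_region = sample_size_diff_proportions(30, 15, se_diff=5)
print(n_per_region)  # 135
```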
14. Comparison of two rates (sample size
in each group)
Example 17: The difference in maternal mortality rates between urban
and rural areas will be determined. In the rural areas the maternal
mortality rate is expected to be 100 per 10,000 and in the urban areas
50 per 10,000 live births.
The required sample size to show (with a likelihood of 90%) a
significant difference between the maternal mortality in the urban
and rural areas would be:
n = (u + v)^2 (r1 + r2) / (r1 - r2)^2
= (1.28 + 1.96)^2 (100/10,000 + 50/10,000) / (100/10,000 - 50/10,000)^2
= 6299 live births in each area
Here u = 1.28 corresponds to a power of 90% and v = 1.96 to a
5% (two-sided) significance level.
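The values 1.28 and 1.96 in Example 17 are standard normal percentage points for 90% power and a 5% two-sided significance level. The sketch below, which assumes Python 3.8 or later for `statistics.NormalDist`, derives them from the chosen power and significance level and then evaluates the formula. The function names are illustrative and not taken from the slides.

```python
import math
from statistics import NormalDist


def z_values(power=0.90, alpha=0.05):
    """Return (u, v): one-sided point for the power, two-sided point for alpha."""
    u = NormalDist().inv_cdf(power)
    v = NormalDist().inv_cdf(1 - alpha / 2)
    return u, v


def n_compare_rates(r1, r2, u, v):
    """Return the unrounded n per group = (u + v)^2 (r1 + r2) / (r1 - r2)^2."""
    return (u + v) ** 2 * (r1 + r2) / (r1 - r2) ** 2


r1, r2 = 100 / 10_000, 50 / 10_000

# With the rounded percentage points used in the example.
n_slide = n_compare_rates(r1, r2, u=1.28, v=1.96)
print(math.ceil(n_slide))  # 6299

# With unrounded percentage points the answer is slightly larger.
u, v = z_values()
print(round(u, 2), round(v, 2))  # 1.28 1.96
n_exact = n_compare_rates(r1, r2, u, v)
print(math.ceil(n_exact))
```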
15. Comparison of two proportions
(sample size in each group)
Example 18: The proportion of nurses leaving the health service
is compared between two regions. In one region 30% of
nurses are estimated to leave the service within three years
of graduation; in the other region it is probably 15%.
The required sample size to show, with a 90% likelihood, that the
percentage of nurses leaving is different in these two regions would
be:
n = (u + v)^2 {P1(100-P1) + P2(100-P2)} / (P1 - P2)^2
= (1.28 + 1.96)^2 (30*70 + 15*85) / (30 - 15)^2
= 157 nurses in each group
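The sketch below evaluates the Example 18 formula and returns the unrounded value, so the effect of the rounding convention is visible: rounding to the nearest whole number gives the 157 quoted in the example, while rounding up (a more conservative convention) gives 158. The 80% power case uses u = 0.84, a value assumed here and not given in the slides; the function name is also illustrative.

```python
import math


def n_compare_proportions(p1, p2, u=1.28, v=1.96):
    """Return the unrounded n per group for percentages p1 and p2.

    n = (u + v)^2 {P1(100-P1) + P2(100-P2)} / (P1 - P2)^2
    Defaults: u = 1.28 (90% power), v = 1.96 (5% two-sided significance).
    """
    numerator = (u + v) ** 2 * (p1 * (100 - p1) + p2 * (100 - p2))
    return numerator / (p1 - p2) ** 2


n_raw = n_compare_proportions(30, 15)
print(round(n_raw))      # 157, rounded to the nearest whole nurse
print(math.ceil(n_raw))  # 158, rounded up

# Assumption: u = 0.84 for 80% power; a lower power needs fewer subjects.
n_80 = n_compare_proportions(30, 15, u=0.84)
print(math.ceil(n_80))
```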