Biostatistics

PRESENTED BY,
Dr. Sushi Kadanakuppe
II year PG student
Dept of Preventive & Community Dentistry
Oxford Dental College & Hospital

BIOSTATISTICS
• INTRODUCTION
• BASIC CONCEPTS
Data
Distributions
• DESCRIPTIVE STATISTICS
Displaying data
Frequency distribution tables.
Graphs or pictorial presentation of data.
Tables.
Numerical summary of data
Measures of central tendency
Measures of dispersion.

• ANALYTICAL OR INFERENTIAL STATISTICS
• The nature and purpose of statistical inference
• The process of testing hypotheses
a. False-positive & false-negative errors.
b. The null hypothesis & alternative hypothesis
c. The alpha level & p value
d. Variation in individual observations and in
multiple samples.

• Tests of statistical significance
• Choosing an appropriate statistical test
• Making inferences from continuous (parametric) data.
• Making inferences from ordinal data.
• Making inferences from dichotomous and nominal (nonparametric) data.
• CONCLUSION

The worker with human material will find the
statistical method of great value and will have
even more need for it than will the laboratory
worker.

Claude Bernard (1927) [1],
a French physiologist of
the nineteenth century and a pioneer in laboratory
research, writes: “We compile statistics only when
we cannot possibly help it. Statistics yield
probability, never certainty, and can bring forth
only conjectural sciences.”

The worker with human material, however, can
seldom control the environment, nor can he bring about
drastic changes in his subjects quickly,
particularly if he is studying chronic disease.

The variability of human material, plus the fact
that time allows the introduction of many
additional factors which may contribute to a
disease process, leaves the worker with
quantitative data affected by a multiplicity of
factors.

Statistical methods become necessary, probability
becomes of great interest, and conjecture based
upon statistical probability may show a way to
break the chain of causation of a disease even
before all factors entering into the production of
the disease are clearly understood.

Yule (1950) [2] has defined statistics as “methods
specially adapted to the elucidation of quantitative
data affected by a multiplicity of causes”.

Fully half the work in biostatistics involves
common sense in the selection and interpretation
of data. The magic of numbers is no substitute.
Bernard points with derision at a German author
who measured the salivary output of one
submaxillary and one parotid gland in a dog for one
hour [1].

This author then proceeded to deduce the output of
all salivary glands, right and left, and finally the
output of saliva of a man per kilogram per day.
The result, of course, was a very top-heavy
structure built upon a set of observations entirely
too small for the purpose.
Work of this sort explains the jibes which so often
ricochet upon better statisticians. Such mistakes
can be avoided.

Statisticians also suffer because they are so often
content merely to collect and analyze data as an
end in itself without the purpose or hope of
producing new knowledge or a new concept.
Conant (1947), in his book On Understanding
Science, makes it very clear that new concepts
must alternate with the collection of data if an
advance in our knowledge is to occur [3].

DEFINITION
Statistics is a scientific field that deals with the
collection, classification, description, analysis,
interpretation, and presentation of data [4].
• Descriptive statistics
• Analytical statistics
• Vital statistics

a. Descriptive statistics concerns the summary
measures of data for a sample of a population.
b. Analytical statistics concerns the use of data
from a sample of a population to make
inferences about the population.
c. Vital statistics is the ongoing collection by
government agencies of data relating to events
such as births, deaths, marriages, divorces and
health- and disease-related conditions deemed
reportable by local health authorities.

USES
Biostatistics is a powerful ally in the quest for
the truth that infuses a set of data and waits to be
told.
a. Statistics is a scientific method that uses theory
and probability to aid in the evaluation and
interpretation of measurements and data
obtained by other methods.

b. Statistics provides a powerful reinforcement for
other determinants of scientific causality.
c. Statistical reasoning, albeit unintentional or
subconscious, is involved in all scientific clinical
judgments, especially with preventive
medicine/dentistry and clinical
medicine/dentistry becoming increasingly
quantitative.

DATA
Definition: Data are the basic building blocks of
statistics and refer to the individual values
presented, measured, or observed.

a. Population vs sample. Data can be derived
from a total population or a sample.
1. A population is the universe of units or values
being studied. It can consist of individuals,
objects, events, observations, or any other
grouping.
2. A sample is a selected part of a population.

The following are some of the common types of
samples:
a) Simple random sample
b) Systematically selected sample
c) Stratified sample
d) Cluster sample
e) Nonrandomly selected, or convenience,
sample.
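The four random designs above differ only in the rule used to pick units. The Python sketch below is a minimal illustration, assuming a hypothetical sampling frame of 100 patient IDs, two made-up age strata, and ten made-up "school" clusters of ten pupils each; the function names are illustrative and do not come from any library. A convenience sample is not shown because it follows no random rule.

```python
import random

rng = random.Random(2024)                 # fixed seed so the example is reproducible
patients = list(range(1, 101))            # hypothetical sampling frame: patient IDs 1..100

def simple_random_sample(frame, n):
    # every unit has the same chance of being selected
    return rng.sample(frame, n)

def systematic_sample(frame, n):
    # every k-th unit after a random start within the first interval
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(strata, n_per_stratum):
    # a separate simple random sample drawn from each stratum
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters):
    # randomly choose whole clusters, then keep every unit inside them
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical strata (two age groups) and clusters (ten schools of ten pupils)
strata = {"under_40": patients[:60], "40_and_over": patients[60:]}
clusters = {f"school_{i}": patients[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(patients, 10)
sys_s = systematic_sample(patients, 10)
strat = stratified_sample(strata, 5)
clus = cluster_sample(clusters, 2)
print(srs, sys_s, strat, clus, sep="\n")
```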

b. Ungrouped vs grouped
1. Ungrouped data are presented or observed individually.
An example of ungrouped data is the following list of
weights (in pounds) for six men: 140, 150, 150, 150,
160, and 160.
2. Grouped data are presented in groups consisting of
identical data by frequency.
An example of grouped data is the following list of
weights for the six men noted above: 140 lb (one man),
150 lb (three men), and 160 lb (two men).
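Going from ungrouped to grouped data amounts to counting how often each identical value occurs. A minimal Python sketch using the six weights above (the variable names are illustrative):

```python
from collections import Counter

# Ungrouped data: each man's weight (lb), listed individually
weights = [140, 150, 150, 150, 160, 160]

# Grouped data: each distinct weight paired with its frequency
grouped = Counter(weights)

for weight, count in sorted(grouped.items()):
    label = "man" if count == 1 else "men"
    print(f"{weight} lb ({count} {label})")
# 140 lb (1 man)
# 150 lb (3 men)
# 160 lb (2 men)
```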

c. Quantitative vs qualitative
1. Quantitative data are numerical, or based on
numbers.
An example of quantitative data is height measured
in inches.
2. Qualitative data are nonnumerical, or based on
a categorical scale.
An example of qualitative data is height measured
in terms of short, medium, and tall.

d. Discrete vs continuous
1. Discrete data or categorical data are data for
which distinct categories and a limited number of
possible values exist.
An example of discrete data is the number of
children in a family, that is, two or three children,
but not 2.5 children.
All qualitative data are discrete.

Categorical data are further classified into
two types:
• nominal scale
• ordinal scale.

Nominal scale:
A variable measured on a nominal scale is
characterized by named categories having no
particular order.
For example,
• patient gender (male/female),
• reason for dental visit (checkup, routine
treatment, emergency), and
• use of fluoridated water (yes/no) are all
categorical variables measured on a nominal scale.

For each of these variables, an individual subject
may belong to only one level, and one level does
not mean something greater than any other level.

Ordinal scale
Ordinal scale data are variables whose categories
possess a meaningful order.
For example,
• Severity of periodontal disease (0 = none, 1 = mild,
2 = moderate, 3 = severe) and
• Length of time spent in a dental office waiting
room (1 = less than 15 min, 2 = 15 to less than 30
minutes, 3 = 30 minutes or more) are variables
measured on ordinal scales.

2. Continuous data or measurement data are data
for which there are an unlimited number of
possible values.
An example of continuous data is an individual’s
weight, which may actually be 159.232872…lb
but is reported as 159 lb.

• Measurement data can be measured on an
interval scale or a ratio scale.
• If the continuous scale has a true 0 point, the
variables derived from it can be called ratio
variables. The Kelvin temperature scale is a ratio
scale, because 0 degrees on this scale is absolute 0.

• The centigrade temperature scale is a continuous
scale but not a ratio scale, because 0 degrees on
this scale does not mean the absence of heat. So
this becomes an example of an interval scale, as
zero is only a reference point.
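A quick numerical check shows why ratios need a true zero. In this illustrative Python sketch (the two temperatures are hypothetical), the difference between the readings is the same in Celsius and Kelvin, because both scales have equal intervals, but the apparent "twice as warm" ratio in Celsius shrinks to about 1.035 once absolute zero is used as the origin.

```python
def celsius_to_kelvin(celsius):
    # Kelvin has a true zero (absolute zero); Celsius puts 0 at an arbitrary reference point
    return celsius + 273.15

warm, cool = 20.0, 10.0   # hypothetical readings in degrees Celsius

celsius_ratio = warm / cool                                        # 2.0, but not meaningful
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)   # meaningful ratio
celsius_diff = warm - cool                                         # 10 degrees
kelvin_diff = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)    # also 10 kelvins

print(celsius_ratio, round(kelvin_ratio, 3), celsius_diff, round(kelvin_diff, 6))
```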

e. The quality of measured data is defined in
terms of the data’s accuracy, validity, precision,
and reliability.
1. Accuracy refers to the extent that the
measurement measures the true value of what is
under study.
2. Validity refers to the extent that the measurement
measures what it is supposed to measure.

3. Precision refers to the extent that the
measurement is detailed.
4. Reliability refers to the extent that the
measurement is stable and dependable.

Dental health professionals have a variety of
uses for data [5]:
• For designing a health care program or facility
• For evaluating the effectiveness of an oral hygiene
education program
• For determining the treatment needs of a specific
population
• For proper interpretation of the scientific
literature.

DISTRIBUTIONS
Definition. A distribution is a complete summary of
frequencies or proportions of a characteristic for a series of
data from a sample or population.
Types of distributions
• Binomial distribution
• Uniform distribution
• Skewed distribution
• Normal distribution
• Log-normal distribution
• Poisson distribution

a. Binomial distribution is a distribution of
possible outcomes from a series of data
characterized by two mutually exclusive
categories.
b. Uniform distribution, also called rectangular
distribution, is a distribution in which all events
occur with equal frequency.

c. Skewed distribution is a distribution that is
asymmetric.
1. A skewed distribution whose tail extends toward
the lower values is skewed to the left, or
negatively skewed.
2. A skewed distribution whose tail extends toward
the higher values is skewed to the right, or
positively skewed.

d. Normal distribution, also called Gaussian
distribution, is a continuous, symmetric, bell-shaped
distribution that is completely defined by two
measures: its mean and its standard deviation.
e. Log-normal distribution is a skewed distribution
when graphed using an arithmetic scale but a
normal distribution when graphed using a
logarithmic scale.
f. Poisson distribution is used to describe the
occurrence of rare events in a large population.
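The binomial and Poisson distributions can be written as simple probability formulas. The Python sketch below implements their standard probability mass functions; the example parameters (10 patients, each with a 0.3 chance of an event, and a rare condition averaging 2 cases per period) are hypothetical. The last line illustrates the well-known fact that a binomial distribution with many trials and a small probability is closely approximated by a Poisson distribution with the same mean.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(exactly k of n independent trials fall in the first of two categories),
    # each trial having probability p of doing so
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # P(exactly k events) when the expected number of events is lam
    return lam**k * exp(-lam) / factorial(k)

print(round(binomial_pmf(2, 10, 0.3), 4))   # exactly 2 of 10 patients have the event
print(round(poisson_pmf(0, 2.0), 4))        # no cases when 2 are expected
# Large n, small p: binomial and Poisson probabilities nearly agree
print(round(binomial_pmf(3, 1000, 0.002), 4), round(poisson_pmf(3, 2.0), 4))
```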

[Figure: Normal distribution; Skewed distribution]

Descriptive statistical techniques enable the
researchers to numerically describe and
summarize a set of data.

Descriptive statistics can be presented in two ways:
I. Displaying data
II. Numerical summary of data (measures of central
tendency and measures of dispersion).

I DISPLAYING DATA
Data can be displayed in the following ways:
• Frequency distribution tables
• Graphs or pictorial presentation of data
• Tables.

Frequency Distribution Tables
To better explain the data that have been collected,
the data values are often organized and presented
in a table termed a frequency distribution table.
This type of data display shows each value that
occurs in the data set and how often each value
occurs.

In addition to providing the sense of the shape of a
variable’s distribution, these displays provide the
researcher with an opportunity to screen the data
values for incorrect or impossible values, a first
step in the process known as “cleaning the data” [5].

• The data values are first arranged in order from
lowest to highest value (an array).
• The frequency with which each value occurs is
then tabulated.

• The frequency of occurrence for each data point is
expressed in four ways:
1. The actual count or frequency
2. The relative frequency (percent of the total
number of values).
3. Cumulative frequency (total number of
observations equal to or less than the value)
4. Cumulative relative frequency (the percent of
observations equal to or less than the value)
commonly referred to as percentile.
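The four ways of expressing frequency can be computed in a single pass over the array. The Python sketch below uses a hypothetical `frequency_table` helper and, because the raw exam scores are not listed here, the minutes-to-pain-relief values that appear later in this section.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, %, cumulative frequency, cumulative %)."""
    n = len(values)
    counts = Counter(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):              # the array: lowest to highest value
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, 100 * freq / n, cumulative, 100 * cumulative / n))
    return rows

minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]
print("Value  Freq      %  Cum freq  Cum %")
for value, freq, pct, cum, cum_pct in frequency_table(minutes):
    print(f"{value:>5}  {freq:>4}  {pct:5.1f}  {cum:>8}  {cum_pct:5.1f}")
```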

Exam score  Frequency  %    Cumulative frequency  Cumulative %
56          1          3.0  1                     3.0
57          1          3.0  2                     6.1
63          1          3.0  3                     9.1
65          2          6.1  5                     15.2
66          1          3.0  6                     18.2
68          2          6.1  8                     24.2
69          2          6.1  10                    30.3
70          2          6.1  12                    36.4
71          1          3.0  13                    39.4
72          1          3.0  14                    42.4
74          2          6.1  16                    48.5
75          1          3.0  17                    51.5
76          3          9.1  20                    60.6
77          2          6.1  22                    66.7
78          1          3.0  23                    69.7
79          1          3.0  24                    72.7
80          2          6.1  26                    78.8
81          3          9.1  29                    87.9
Frequency Distribution Table for exam scores

• Instead of displaying each individual value in a
data set, the frequency distribution for a variable
can group values of the variable into consecutive
intervals.
• Then the number of observations belonging to an
interval is counted.

Exam scores Number of students %
56-61 2 6
62-65 3 9
66-69 5 15
70-73 4 12
74-77 7 21
78-81 7 21
82-85 3 9
86-89 2 6
Grouped frequency distribution of exam scores

Although the data are condensed in a useful
fashion, some information is lost.
The frequency of occurrence of an individual data
point cannot be obtained from a grouped
frequency distribution.
For example, in the above presentation of data,
seven students scored between 74 and 77, but the
number of students who scored 75 is not shown
here.
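A grouped frequency distribution can be produced by counting observations in consecutive intervals of equal width. This Python sketch assumes whole-number data (so each interval runs from a lower limit to an inclusive upper limit) and again uses the minutes-to-pain-relief values from later in this section; the interval start and width are arbitrary choices for illustration. As the text notes, the output no longer shows how many patients reported any single value, such as exactly 10 minutes.

```python
def grouped_frequency(values, start, width):
    """Counts in consecutive whole-number intervals start..start+width-1, and so on.

    Assumes start <= min(values); smaller values would not be counted.
    """
    n = len(values)
    rows = []
    low = start
    while low <= max(values):
        high = low + width - 1                      # inclusive upper limit
        count = sum(low <= v <= high for v in values)
        rows.append((f"{low}-{high}", count, round(100 * count / n)))
        low += width
    return rows

minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]
for interval, count, pct in grouped_frequency(minutes, start=8, width=4):
    print(interval, count, pct)
# 8-11 5 45
# 12-15 4 36
# 16-19 2 18
```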

Graphic or pictorial presentation of data
Graphic or pictorial presentations of data are useful
in simplifying the presentation and enhancing the
comprehension of data.
All graphs, figures, and other pictures should have
clearly stated and informative titles, and all axes
and keys should be clearly labeled, including the
appropriate units of measurement.

Visual aids can take many forms; some basic
methods of presenting data are described below.
1. Pie chart
A pie chart is a pictorial representation of the
proportional divisions of a sample or population,
with the divisions represented as parts of a whole
circle.

[Figure: Pie chart, “Dental caries in xerostomia patients”; categories: cervical caries, occlusal caries, root caries; slice labels: 39%, 42%, 19%]

2. Venn diagram
A Venn diagram shows the degrees of overlap and
exclusivity for two or more characteristics or factors
within a sample or population (in which case each
characteristic is represented by a whole circle)
or
for a characteristic or factor among two or more samples
or populations (in which case each sample or population
is represented by a whole circle).

The sizes of the circles (or other symbols) need not
be equal and may represent the relative size for
each factor or population.

3. Bar diagram
• A bar diagram is a tool for comparing categories
of mutually exclusive discrete data.
• The different categories are indicated on one
axis, the frequency of data in each category is
indicated on the other axis, and the lengths of
the bars compare the categories.
• Because the data categories are discrete, the bars
can be arranged in any order with spaces
between them.

[Figure: Bar diagram, “Dental caries in xerostomia patients”; bars for cervical caries, occlusal caries, and root caries; y-axis scale 0 to 80]

4. Histogram
A histogram is a special form of bar diagram that
represents categories of continuous and ordered
data.
The data are adjacent to each other on the x-axis
(abscissa), and there is no intervening space. The
frequency of data in each category is depicted on
the y-axis (ordinate), and the width of the bar
represents the interval of each category.

[Figure: Histogram of age for xerostomia subjects; x-axis: age groups 5 to 10, 10 to 15, 15 to 20, 20 to 25, and 25 to 30 years; y-axis: number of subjects (0 to 50)]

5. Epidemic curve
An epidemic curve is a histogram that depicts the
time course of an illness, disease, abnormality,
or condition in a defined population and in a
specified location and time period.
The time intervals are indicated on the x-axis,
and the number of cases during each time
interval is indicated on the y-axis.

An epidemic curve can help an investigator
determine such outbreak characteristics as the
peak of disease occurrence (mode), a possible
incubation or latency period, and the type of
disease propagation.

6. Frequency polygon
A frequency polygon is a representation of the
distribution of categories of continuous and
ordered data and, in this respect, is similar to a
histogram.
The x-axis depicts the categories of data, and the
y-axis depicts the frequency of data in each
category.

In a frequency polygon, however, the frequency is
plotted against the midpoint of each category, and
a line is drawn through each of these plotted
points.
The frequency polygon can be more useful than the
histogram because several frequency distributions
can be plotted easily on one graph.

Frequency polygon showing cancer mortality by age group
and sex

7. Cumulative frequency graph
A cumulative frequency graph also is a
representation of the distribution of continuous
and ordered data.
In this case, however, the frequency of data in
each category represents the sum of the data
from that category and from the preceding
categories.

The x-axis depicts the categories of data, and the y-
axis is the cumulative frequency of data,
sometimes given as a percentage ranging from 0%
to 100%.
The cumulative frequency graph is useful in
calculating distribution by percentile, including
the median, which is the category of data that
occurs at the cumulative frequency of 50%.

Medical examiner reported (MER) in St. Louis for the years 1979,
1980, & 1981

8. Box plot
• A box plot is a representation of the quartiles
[25%, 50% (median), and 75%] and the range of
a continuous and ordered data set.
• The y-axis can be arithmetic or logarithmic.
• Box plots can be used to compare the different
distributions of data values.

Distribution of weights of patients from hospital A and hospital B

9. Spot map
A spot map, also called a geographic coordinate
chart, is a map of an area with the location of
each case of an illness, disease, abnormality, or
condition identified by a spot or other symbol on
the map.
A spot map often is used in an outbreak setting
and can help an investigator determine the
distribution of cases and characterize an
outbreak if the population at risk is distributed
evenly over the area.

Distribution of Lyme disease cases in Canada from 1977 to 1989

TABLES
In addition to graphs, data are often summarized in
tables. When material is presented in tabular form,
the table should be able to stand alone; that is,
correctly presented material in tabular form should
be understandable even if the written discussion of
the data is not read.

A major concern in the presentation of both figures
and tables is readability.
Tables and figures must be clearly understood and
clearly labeled so that the reader is aided by the
information rather than confused.

Suggestions for the display of data in graphic or
tabular form [5]:
1. The contents of a table as a whole and the items
in each separate column should be clearly and
fully defined. The unit of measurement must be
included.
2. If the table includes rates, the basis on which
they are measured must be clearly stated: death
rate percent, per thousand, or per million, as the
case may be.

3. Rates or proportions should not be given alone
without any information as to the numbers of
observations on which they are based. By giving
only rates of observations and omitting the
actual number of observations, we are excluding
the basic data.

4. Where percentages are used, it must be clearly
indicated that these are not absolute numbers.
Rather than combine too many figures in one
table, it is often best to divide the material into
two or three small tables.
5. Full particulars of any exclusion of observations
from a collected series must be given. The
reasons for and the criteria of exclusions must be
clearly defined, perhaps in a footnote.

II NUMERICAL SUMMARY OF DATA
Although graphs and frequency distribution tables
can enhance our understanding of the nature of a
variable, rarely do these techniques alone suffice
to describe the variable. A more formal numerical
summary of the variable is usually required for the
full presentation of a data set.

To adequately describe a variable’s values, three
summary measures are needed:
1. The sample size.
2. A measure of central tendency
3. A measure of dispersion.

• The sample size is simply the total number of
observations in the group and is symbolized by the
letter N or n.
• A measure of central tendency or location
describes the middle (or typical) value in a data
set.
• A measure of dispersion or spread quantifies
the degree to which values in a group vary from
one another.

Whenever one wishes to evaluate the outcome of a
study, it is crucial that the attributes of the sample
that could have influenced it be described.
Three statistics, the mode, median, and mean,
provide a means of describing the “typical”
individual within a sample.
These statistics are frequently referred to as
“measures of central tendency”.

• Measures of central tendency are characteristics that
describe the middle or most commonly occurring values in
a series.
• They tell us the point about which items have a tendency
to cluster. Such a measure is considered the most
representative figure for the entire mass of data.
• They are used as summary measures for the series. The
series can consist of a sample of observations or a total
population, and the values can be grouped or ungrouped.
A measure of central tendency is also known as a statistical
average.

1. Mode
The mode of a data set is that value that occurs
with the greatest frequency.
A series may have no mode (i.e., no value occurs
more than once) or it may have several modes
(i.e., several values equally occur at a higher
frequency than the other values in the series).

Whenever there are two nonadjacent scores with
the same frequency and they are the highest in the
distribution, each score may be referred to as the
‘mode’ and the distribution is ‘bimodal’.
In a truly bimodal distribution, the population
contains two sub-groups, each of which has a
different distribution that peaks at a different
point.

More than one mode can also be produced
artificially by what is known as digit preference,
when observers tend to favor certain numbers over
others.
For example, persons who measure blood pressure
values tend to favor even numbers, particularly
those ending in 0 (e.g., 120 mm Hg).

Calculation: The mode is calculated by
determining which value or values occur most in a
series.

Example: consider the following data. Patients
who had received routine periodontal scaling were
given a common pain-relieving drug and were
asked to record the minutes to 100% pain relief.
Note that “minutes to pain relief” is a continuous
variable that is measured on the ratio scale. The
patients recorded the following data:

Minutes to 100% pain relief:
15 14 10 18 8 10 12 16 10 8 13
First, make an array, that is, arrange the values in
ascending order:
8 8 10 10 10 12 13 14 15 16 18
By inspection, we already know two descriptive
measures belonging to this data: N=11 and
mode=10.
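Finding the mode is a matter of tallying. The hypothetical `modes` helper below follows the conventions described above: it returns every value tied for the highest frequency (so a bimodal series yields two values) and an empty list when no value repeats.

```python
from collections import Counter

def modes(values):
    """Every value tied for the highest frequency; [] when no value occurs more than once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                        # no repeated value: the series has no mode
    return sorted(v for v, c in counts.items() if c == top)

minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]
print(sorted(minutes))                   # the array: [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
print(len(minutes), modes(minutes))      # 11 [10]
```

Python's `statistics.multimode` also tallies ties, but it returns every value when none repeats, so the empty-list rule here is a choice made to match the text.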

Application and characteristics
1. The primary value of the mode lies in its ease of
computation and in its convenience as a quick
indicator of a central value in a distribution.
2. The mode is useful in practical epidemiological
work, such as determining the peak of disease
occurrence in the investigation of a disease.

3. The mode is the most difficult measure of central
tendency to manipulate mathematically, that is, it
is not amenable to algebraic treatment; no
analytic concepts are based on the mode.

4. It is also the least reliable because with successive
samplings from the same population the
magnitude of the mode fluctuates significantly
more than the median or mean.
It is possible, for example, that a change in just
one score can substantially change the value of the
modal score.

2. Median (P50)
The median is the value that divides the
distribution of data points into two equal parts,
that is, the value at which 50% of the data points
lie above it and 50% lie below it.
The median is the middle of the quartiles (the
values that divide the series into quarters) and the
middle of the percentiles (the values that divide
the series into defined percentages).

Calculation:
a) In a series with an odd number of values, the
values in the series are arranged from lowest to
highest, and the value that divides the series in
half is the median.
b) In a series with an even number of values, the two
values that divide the series in half are determined,
and the arithmetic mean of these values is the
median.
c) An alternative method for calculating the median
is to determine the 50% value on a cumulative
frequency curve.

Example: In the above example of data series of minutes to
100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
determine which value cuts the array into equal portions. In
this array, there are five data points below 12 and there are
five data points above 12. Thus the median is 12.
8 8 10 10 10 12 13 14 15 16 18
⇑
Median

If the number of observations is even, unlike the
preceding example, simply take the midpoint of
the two values that would straddle the center of
the data set.
Consider the following data set with N=10:
8 8 10 10 10 13 14 15 16 18
⇑
Median = (10 + 13) / 2 = 11.5
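Both calculation rules (odd and even N) fit in a few lines. The `median` function below is a hypothetical helper written for illustration; it is cross-checked against Python's standard `statistics.median` on the two series used above.

```python
import statistics

def median(values):
    """Middle value of the array; mean of the two middle values when N is even."""
    array = sorted(values)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                        # odd N: one value splits the series in half
    return (array[mid - 1] + array[mid]) / 2     # even N: midpoint of the two central values

odd_series = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]   # N = 11
even_series = [8, 8, 10, 10, 10, 13, 14, 15, 16, 18]      # N = 10

print(median(odd_series), median(even_series))            # 12 11.5
print(statistics.median(odd_series), statistics.median(even_series))
```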

Applications and characteristics:
1. The median is not sensitive to one or more extreme
values in a series; therefore, in a series with an
extreme value, the median is a more representative
measure of central tendency than the arithmetic
mean.

2. It is not frequently used in sampling statistics. In
terms of sampling fluctuation, the median is
superior to the mode but less stable than the mean.
For this reason, and because the median does not
possess convenient algebraic properties, it is not
used as often as the mean.
3. Median is a positional average; it is especially
useful for qualitative phenomena that can be ranked
but not measured directly, for example, intelligence.
Such phenomena are often encountered in sociological fields.

4. Median is not useful where items need to be
assigned relative importance and weights.
5. The median is used in cumulative frequency
graphs and in survival analysis.

3. Arithmetic Mean
The arithmetic mean, or simply, the mean, is the
sum of all values in a series divided by the actual
number of values in a series.
The symbol for the mean is a capital letter X with a
bar above it: X̄, or “X-bar”.

Calculation:
The arithmetic mean is determined as
X̄ = ∑X / N

Example:
Using the minutes to pain relief, N = 11 and
∑X = 134. Therefore
X̄ = 134 / 11 = 12.2 min
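The same arithmetic in a short Python sketch (variable names are illustrative):

```python
minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]

total = sum(minutes)    # ∑X = 134
n = len(minutes)        # N = 11
x_bar = total / n       # X-bar = ∑X / N
print(total, n, round(x_bar, 1))   # 134 11 12.2
```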

Properties of the Mean
1. The mean of a sample is an unbiased estimator
of the mean of the population from which it came.
2. The mean is the mathematical expectation. As
such, it is different from the mode, which is the
value observed most often.

3. The sum of the squared deviations of the
observations from the mean is smaller than the
sum of the squared deviations from any other
number.
4. The sum of the squared deviations from the mean
is fixed for a given set of observations. This
property is not unique to the mean, but it is a
necessary property of any good measure of central
tendency.
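Two of these ideas are easy to check numerically with the pain-relief data: the deviations from the mean sum to zero (the balance-point idea discussed below), and the sum of squared deviations is smallest at the mean (property 3). The Python sketch uses exact fractions so the zero sum is not blurred by rounding; the grid of candidate centers from 8.0 to 18.0 is an arbitrary choice for illustration.

```python
from fractions import Fraction

minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]
x_bar = Fraction(sum(minutes), len(minutes))     # exact mean, 134/11

def sum_sq_dev(values, center):
    # sum of squared deviations of the observations from a chosen center
    return sum((x - center) ** 2 for x in values)

deviation_sum = sum(x - x_bar for x in minutes)  # algebraic sum of deviations
candidates = [Fraction(c, 10) for c in range(80, 181)]   # 8.0, 8.1, ..., 18.0
best = min(candidates, key=lambda c: sum_sq_dev(minutes, c))

print(deviation_sum)        # 0
print(float(best))          # 12.2, the grid point nearest the mean
print(sum_sq_dev(minutes, x_bar) < sum_sq_dev(minutes, best))
```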

1. The arithmetic mean is useful when performing
analytic manipulation. With the exception of a
situation where extreme scores occur in the
distribution, the mean is generally the best
measure of central tendency.
• The values of the mean tend to fluctuate least from
sample to sample.

• It is amenable to algebraic treatment and it
possesses known mathematical relationships with
other statistics.
• Hence, it is used in further statistical
calculations. Thus, in most situations the mean is
more likely to be used than either the mode or the
median.

2. The mean can be conceptualized as a fulcrum
such that the distribution of scores around it is in
perfect balance. Since the scores above and below
the mean are in perfect balance, it follows that the
algebraic sum of the deviations of these scores
from the mean is 0.

3. Whereas the median counts each score, no matter
what its magnitude, as only one score, the mean
takes into account the absolute magnitude of the
score. The median, therefore, does not balance the
halves of the distribution except when the
distribution is exactly symmetrical, in which case
the mean and the median have identical values.

4. Another way of contrasting the median and the
mean is to compare their values when the
distribution of scores is not symmetrical.

Curve (a) is positively skewed;
that is, the curve tails off to the
right. In this case the mean is
larger than the median because of
the influence of the few very
high scores. Thus these high
scores are sufficient to balance
off the several lower scores. The
median does not balance the
distribution because the
magnitude of the scores is not
included in the computation.
[Figure: positively skewed curve (a), with the mean (X̄) and the median (P50) marked]

Curve (b) is negatively
skewed; that is, the
curve tails off to the left.
Now the mean is smaller
than the median because
of the effect of the few
very small scores.
[Figure: negatively skewed curve (b), with the mean (X̄) and the median (P50) marked]

5. It suffers from some limitations, viz., it is unduly
affected by extreme items; it may not coincide
with the actual value of any item in a series; and it may
lead to wrong impressions, particularly when the
individual item values are not given along with the average.

Consider again the minutes-to-pain-relief data,
but suppose that one patient recorded a value that
is extreme for this group (58 instead of 18):
8 8 10 10 10 12 13 14 15 16 58
The adjusted mean, noticeably larger than the
original mean of 12.2, is calculated as follows:
X̄ = 174 / 11 = 15.8 min

The calculation of the mean is correct, but is its use
appropriate for this data set?
By definition the mean should describe the middle of the
data set.
However, for this data set the mean of 15.8 is larger than
most (9 out of 11!) of the values in the group.
Not exactly a picture of the middle!
In this case the median (12 minutes) is the better choice for
the measure of central tendency and should be used.

However, mean is better than other averages,
especially in economic and social studies where
direct quantitative measurements are possible.
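The effect of the single extreme value can be reproduced directly. This sketch applies Python's standard `statistics` module to the two versions of the pain-relief data shown above.

```python
from statistics import mean, median

original = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
with_extreme = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]   # 18 replaced by 58

for label, data in (("original", original), ("with extreme value", with_extreme)):
    print(label, round(mean(data), 1), median(data))
# original 12.2 12
# with extreme value 15.8 12

# How many observations fall below the inflated mean?
below = sum(x < mean(with_extreme) for x in with_extreme)
print(below, "of", len(with_extreme))   # 9 of 11
```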

4. Geometric mean
The geometric mean is the nth root of the product
of the values in a series of n values.
Geometric mean (or G.M.) = ⁿ√(X1 × X2 × … × Xn) = (∏ Xi)^(1/n)
Where,
G.M. = geometric mean,
n = number of items,
∏ = conventional product notation
For instance, the geometric mean of the numbers 4,
6, and 9 is worked out as
G.M. = ∛(4 × 6 × 9) = ∛216
= 6
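The nth-root definition can be computed safely through logarithms, which is also why the geometric mean is restricted to positive values. In this Python sketch, `geo_mean` is a hypothetical helper cross-checked against the standard `statistics.geometric_mean` (Python 3.8+); the two growth factors used for the average-percent-change example are made up.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    # nth root of the product, computed as exp(mean of logs) to avoid overflow
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean is used only for positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geo_mean([4, 6, 9]))          # approximately 6 (floating point)
print(geometric_mean([4, 6, 9]))    # standard-library equivalent

# Hypothetical growth: +10% in one period, then +50% in the next
factors = [1.10, 1.50]
avg_factor = geo_mean(factors)      # about 1.285 per period
print(round((avg_factor - 1) * 100, 1))   # average percent change per period
```

Compounding the geometric-mean factor over two periods reproduces the total change (1.285 × 1.285 ≈ 1.65), whereas the arithmetic mean of the factors (1.30) would overstate it.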

Applications and characteristics
1. The geometric mean is more useful and
representative than the arithmetic mean when
describing a series of reciprocal or fractional
values. The most frequently used application of
this average is in the determination of average
percent of change i.e., it is often used in the
preparation of index numbers or when we deal with
ratios.

2. The geometric mean can be used only for
positive values.
3. It is more difficult to calculate than the
arithmetic mean.

5. Harmonic mean
Harmonic mean is defined as the reciprocal of the
average of reciprocals of the values of items of a
series. Symbolically, we can express it as under:
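Harmonic mean (H.M.) = Rec. [ (∑ Rec. Xi) / N ] = N / ∑(1 / Xi)
where Rec. stands for the reciprocal, Xi for the ith value, and N for the number of items.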

1. Harmonic mean has limited application; it is
used mainly in cases where time and rate are
involved.
2. The harmonic mean gives largest weight to the
smallest item and smallest weight to the largest
item.
3. As such it is used in cases like time and motion
study where time is variable and distance constant.
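The "distance constant, time variable" case can be illustrated with a hypothetical round trip. In this Python sketch, `harm_mean` is an illustrative helper; Python's `statistics.harmonic_mean` gives the same result.

```python
from statistics import harmonic_mean

def harm_mean(values):
    # reciprocal of the average of the reciprocals: N / sum(1/x)
    return len(values) / sum(1 / x for x in values)

# Hypothetical trip: 60 km out at 30 km/h, the same 60 km back at 60 km/h
speeds = [30, 60]
print(harm_mean(speeds))           # about 40 km/h (tiny floating-point error possible)
print(harmonic_mean(speeds))       # standard-library equivalent
print(sum(speeds) / len(speeds))   # arithmetic mean 45.0 overstates the average speed

# Check: total distance / total time = 120 km / (2 h + 1 h) = 40 km/h
```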

Measures of central tendency provide useful
information about the typical performance for a
group of data. To understand the data more
completely, it is necessary to know how the
members of the data set arrange themselves
about the central or typical value.

The following questions must be answered:
How spread out are the data points?
How stable are the values in the group?

The descriptive tools known as measures of
dispersion answer these questions by quantifying
the variability of the values within a group.
Hence, they are the characteristics that are used to
describe the spread, variation, and scatter of a
series of values.
The series can consist of observations or a total
population, and the values can be grouped or
ungrouped.

This can be done by calculating measures based
on percentiles or measures based on the mean [6].
Measures of dispersion based on percentiles
1. Percentiles
Percentiles, which are sometimes called quantiles,
are the points below which a stated percentage of
the observations lie when all of the observations
are ranked in ascending order.

The median, discussed above, is the 50th percentile.
The 75th percentile is the point below which 75%
of the observations lie, while the 25th percentile is
the point below which 25% of the observations lie.

2. Range
The range is the difference between the highest and
lowest values in a series.
Range = Maximum – Minimum.
More usual, however, is the interpretation of the
range as simply the statement of the minimum and
maximum values:
Range = (Minimum, Maximum)

For the sample of minutes to 100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
Range = (8, 18) or Range = 18-8 = 10 min

• The overall range reflects the distance between the highest and the lowest value in the data set. In this example it is 10 min.
• In the same example, the 75th and 25th percentiles are 15 and 10 respectively, and the distance between them is 5 min. This difference is called the interquartile range (sometimes abbreviated Q3 - Q1).
Because of central clumping, the interquartile
range is usually considerably smaller than half the
size of the overall range of values.
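The range and interquartile range of the pain-relief sample (whose largest value is 18 min, consistent with the stated range of (8, 18)) can be reproduced with Python's standard library. This is a sketch: percentiles can be computed by several slightly different conventions. We assume the default "exclusive" method of `statistics.quantiles` (Python 3.8+), which happens to give the 25th and 75th percentiles quoted above; other software may report slightly different quartiles.

```python
import statistics

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]  # minutes to 100% pain relief

low, high = min(minutes), max(minutes)
print("Range =", (low, high), "or", high - low, "min")  # (8, 18) or 10 min

# quartiles: 25th, 50th (median) and 75th percentiles
q1, median, q3 = statistics.quantiles(minutes, n=4)
print("Q1 =", q1, "median =", median, "Q3 =", q3)      # 10.0, 12.0, 15.0
print("IQR = Q3 - Q1 =", q3 - q1, "min")               # 5.0 min
```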

The advantage of using percentiles is that they can
be applied to any set of continuous data, even if
the data do not form any known distribution.

1. The range is used to measure data spread.
2. The range presents the exact lower and upper
boundaries of a set of data points and thus quickly
lends perspective regarding the variable’s
distribution.

3. The range is usually reported along with the
sample median (not the mean).
4. The range provides no information concerning the
scatter within the series.
5. The range can be deemed unstable because it is
affected by one extremely high score or one
extremely low value. Also, only two values are
considered, and these happen to be the extreme
scores of the distribution. The measure of spread
known as standard deviation addresses this
disadvantage of the range.

Measures of dispersion based on the mean
Mean deviation, variance, and standard deviation
are three measures of dispersion based on the
mean.
Although mean deviation is seldom used, a
discussion of it provides a better understanding of
the concept of dispersion.

1. Mean deviation
Because the mean has several advantages, it might
seem logical to measure dispersion by taking the
“average deviation” from the mean. That proves to
be useless, because the sum of the deviations from
the mean is 0.

However, this inconvenience can easily be solved
by computing the mean deviation, which is the
average of the absolute value of the deviations
from the mean, as shown in the following formula:
Mean deviation = ∑ (X - X)

N

Because the mean deviation does not have
mathematical properties that enable many
statistical tests to be based on it, the formula has
not come into popular use.
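Using the pain-relief sample, the sketch below shows both points made above: the raw deviations from the mean sum to zero (apart from floating-point rounding), whereas the mean of their absolute values does not. The variable names are our own.

```python
minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
n = len(minutes)
mean = sum(minutes) / n                 # about 12.18 min

deviations = [x - mean for x in minutes]
print(sum(deviations))                  # essentially 0 (tiny rounding error)

mean_deviation = sum(abs(d) for d in deviations) / n
print(round(mean_deviation, 2))         # about 2.74 min
```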
Instead, the variance has become the fundamental
measure of dispersion in statistics that are based
on the normal distribution.

2. Variance
The variance is the sum of the squared deviations
from the mean divided by the number of values in
the series minus 1.
Variance is symbolized by s² or V.
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
The numerator, $\sum (X - \bar{X})^2$, is called the sum of squares.

In the above formula, the squaring solves the
problem that the deviations from the mean add up
to 0.
Dividing by N-1 (called degrees of freedom),
instead of dividing by N, is necessary for the
sample variance to be an unbiased estimator of the
population variance.
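A small simulation can make the role of N - 1 concrete. The sketch below assumes a normal population with a variance of 4 (an arbitrary choice), draws many samples of 5 values, and averages the two candidate variance formulas. Dividing by N - 1 gives an average close to 4, while dividing by N systematically underestimates it, on average by a factor of (N - 1)/N, here 0.8. The seed and sample sizes are illustrative assumptions.

```python
import random
import statistics

random.seed(42)          # fixed seed so the run is reproducible
POP_SD = 2.0             # population variance = 4.0 (assumed for illustration)
N, TRIALS = 5, 10_000

with_n_minus_1, with_n = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, POP_SD) for _ in range(N)]
    with_n_minus_1.append(statistics.variance(sample))  # divides SS by N - 1
    with_n.append(statistics.pvariance(sample))         # divides SS by N

print(statistics.mean(with_n_minus_1))  # close to 4.0
print(statistics.mean(with_n))          # close to 3.2, i.e. 4.0 * (N - 1) / N
```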

The numerator of the variance (i.e., the sum of the
squared deviations of the observations from the
mean) is an extremely important entity in
statistics. It is usually called either the sum of
squares (abbreviated SS) or the total sum of
squares (TSS).
The TSS measures the total amount of variation in
a set of observations.

Properties of the variance
1. When the denominator of the equation for
variance is expressed as the number of
observations minus 1 (N-1), the variance of a
random sample is an unbiased estimator of the
variance of the population from which it was
taken.

2. The variance of the sum of two independently
sampled variables is equal to the sum of the
variances.
3. The variance of the difference between two
independently sampled variables is equal to the
sum of their individual variances as well.
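Properties 2 and 3 can be checked numerically. In the sketch below, which assumes two independent normal variables with variances 4 and 9 (arbitrary illustrative values), the variance of both the sum and the difference comes out close to 4 + 9 = 13.

```python
import random
import statistics

random.seed(7)
SIZE = 50_000
x = [random.gauss(0.0, 2.0) for _ in range(SIZE)]  # variance 4
y = [random.gauss(0.0, 3.0) for _ in range(SIZE)]  # variance 9, drawn independently of x

var_sum = statistics.variance([a + b for a, b in zip(x, y)])
var_diff = statistics.variance([a - b for a, b in zip(x, y)])
print(round(var_sum, 1), round(var_diff, 1))  # both close to 13
```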

1. The principal use of the variance is in calculating
the standard deviation.
2. The variance is mathematically unwieldy: it is expressed in squared units, and its value often falls outside the range of observed values in a data set.
3. The variance is generally of greater importance
to statisticians than to researchers, students, and
clinicians trying to understand the fruits of data
collection.

We should note that the sample variance is a
squared term, not so easy to fathom in relation to
the sample mean.
Thus the square root of the variance, the standard
deviation, is desirable.

3. Standard deviation (s or SD)
The standard deviation is a measure of the
variability among the individual values within a
group.
Loosely defined, it is a description of the average
distance of individual observations from the group
mean.

Conceptualizing the s, or any of the measures of
variance, is more difficult than understanding the
concept of central tendency.
From one point of view, however, the s is similar to the mean; that is, it is the square root of (approximately) the mean of the squared deviations.

Taking the mean and the standard deviation
together, a sample can be described in terms of its
average score and in terms of its average variation.
If more samples were taken from the same
population it would be possible to predict with
some accuracy the average score of these samples
and also the amount of variation.

The mathematical derivation of the standard
deviation is presented here in some detail because
the intermediate steps in its calculation (1) create a
theme (called “sum of squares”) that is repeated
over and over in statistical arithmetic and (2)
create the quantity known as the sample variance.

Calculation:
Step | Mathematical term | Label
1. Calculate the mean of the group. | X̄ = ΣX / N | Sample mean
2. Subtract the mean from each value X. | (X - X̄) | Deviation from the mean
3. Square each deviation from the mean. | (X - X̄)² | Squared deviation from the mean
4. Add the squared deviations from the mean. | Σ(X - X̄)² | Sum of squares (SS)
5. Divide the sum of squares by (N - 1). | SS / (N - 1) | Variance (s²)
6. Find the square root of the variance. | √s² | Standard deviation (SD or s)
The above table presents the calculation of the standard
deviation for our sample of minutes to 100% pain relief.
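The six steps in the table translate line by line into the short Python sketch below, applied to the same pain-relief sample (largest value 18 min). It reproduces the mean of about 12.2 minutes and the SD of about 3.31 minutes reported for this sample; `statistics.stdev`, which also divides by N - 1, is used only as a cross-check.

```python
import math
import statistics

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
n = len(minutes)

mean = sum(minutes) / n                  # step 1: sample mean
deviations = [x - mean for x in minutes] # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]   # step 3: squared deviations
ss = sum(squared)                        # step 4: sum of squares (SS)
variance = ss / (n - 1)                  # step 5: variance, s^2
sd = math.sqrt(variance)                 # step 6: standard deviation, s

print(round(mean, 1), round(ss, 2), round(variance, 2), round(sd, 2))
# 12.2 109.64 10.96 3.31
print(round(statistics.stdev(minutes), 2))  # 3.31, same result
```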

We now have two sets of complete sample description for
our example.
Statistic | Sample description 1 | Sample description 2
Sample size | N = 11 | N = 11
Measure of central tendency | Median = 12 min | X̄ = 12.2 min
Measure of spread | Range = (8, 18) | SD = 3.31 min

The standard deviation is reported along with the
sample mean, usually in the following format:
mean ± SD.
This format serves as a pertinent reminder that the
SD measures the variability of values surrounding
the middle of the data set.

It also leads us to the practical application of the
concepts of mean and standard deviation shown in
the following rules of thumb:
X̄ ± 1 SD encompasses approximately 68% of the values in a group.
X̄ ± 2 SD encompasses approximately 95% of the values in a group.
X̄ ± 3 SD encompasses approximately 99.7% of the values in a group.

These rules of thumb are useful when deciding
whether to report the mean ± SD or the median
and range as the appropriate descriptive statistics
for a group of data points.
If roughly 95% of the values in a group are contained in the interval X̄ ± 2 SD, researchers tend to use mean ± SD. Otherwise, the median and the range are perhaps more appropriate.
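For the pain-relief sample, this check can be sketched as follows. The 95% cut-off is the rule of thumb from the text rather than a formal test, and the helper name `share_within` is hypothetical.

```python
import statistics

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
mean = statistics.mean(minutes)
sd = statistics.stdev(minutes)

def share_within(values, center, spread, k):
    """Fraction of values inside center ± k * spread."""
    return sum(center - k * spread <= v <= center + k * spread for v in values) / len(values)

share = share_within(minutes, mean, sd, 2)
print(share)  # 1.0: all 11 values lie within mean ± 2 SD (about 5.6 to 18.8 min)
print("report mean ± SD" if share >= 0.95 else "report median and range")
```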

1. The standard deviation is extremely important in sampling theory, in correlational analysis, in estimating the reliability of measures, and in determining the relative position of an individual within a distribution of scores and between distributions of scores.

2. The standard deviation is the most widely used
estimate of variation because of its known
algebraic properties and its amenability to use
with other statistics.

3. It also provides a better estimate of variation in
the population than the other indexes.
4. The numerical value of standard deviation is
likely to fluctuate less from sample to sample than
the other indexes.

5. In certain circumstances, quantitative probability
statements that characterize a series, a sample of
observations, or a total population can be derived
from the standard deviation of the series, sample,
or population.
6. When the standard deviation of a sample is small, the individual values lie close to the sample mean.

7. When standard deviation of a random sample is
small, the sample mean is likely to be close to the
mean of all the data in the population.
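8. The standard deviation does not systematically decrease as the sample size increases; rather, it becomes a more stable estimate of the population standard deviation. It is the standard error of the mean, discussed below, that decreases as the sample size increases.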

4. Coefficient of variation
The coefficient of variation is the ratio of the
standard deviation of a series to the arithmetic
mean of the series.
The coefficient of variation is unitless and is expressed as a percentage.

1. The coefficient of variation is used to compare the relative variation, or spread, of the distributions of different series, samples, or populations, or of the distributions of different characteristics of a single series.
2. The coefficient of variation can be used only for
characteristics that are based on a scale with a true zero
value.

Calculation:
The coefficient of variation (CV) is calculated as
CV (%) = SD / X × 100

For example,
In a typical medical school, the mean weight of
100 fourth-year medical students is 140 lb, with a
standard deviation of 28 lb.
CV (%) = 28 / 140 × 100 = 20%
The coefficient of variation for weight is 28 lb
divided by 140 lb, or 20%.
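The weight example reduces to a one-line function; the name `coefficient_of_variation` is our own, and the sketch assumes the caller has already checked that the measurement scale has a true zero.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100; only meaningful on a scale with a true zero."""
    return 100 * sd / mean

print(coefficient_of_variation(28, 140))  # 20.0 (%)
```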

THE NORMAL DISTRIBUTION
The majority of measurements of continuous data
in medicine and biology tend to approximate the
theoretical distribution that is known as the normal
distribution and is also called the gaussian
distribution (named after Johann Carl Friedrich Gauss, the person who best described it)6.

• The normal distribution is one of the most
frequently used distributions in biomedical and
dental research.
• The normal distribution is a population frequency
distribution.
• It is characterized by a bell-shaped curve that is
unimodal and is symmetric around the mean of the
distribution.

• The normal curve depends on only two
parameters: the population mean and the
population standard deviation.
• In order to discuss the area under the normal curve in terms of easily seen percentages of the population distribution, the normal distribution has been standardized to the standard normal distribution, in which the population mean is 0 and the population standard deviation is 1.

• The area under the normal curve can be
segmented starting with the mean in the center (on
the x axis) and moving by increments of 1 SD
above and below the mean.

Figure shows a standard normal distribution (mean = 0; SD= 1)
and the percentages of area under the curve at each increment of
SD.

• The total area beneath the normal curve is 1, or
100% of the observations in the population
represented by the curve.
• As indicated in the figure, the portion of the area
under the curve between the mean and 1 SD is
34.13% of the total area.
• The same area is found between the mean and 1 SD below the mean.

Moving to 2 SD above the mean cuts off an additional 13.59% of the area, and moving a total of 3 SD above the mean cuts off another 2.14%.

The theory of the standard normal distribution
leads us, therefore, to the following property of a
normally distributed variable:
68.26% of the observations lie within 1 SD of the mean, 95.44% lie within 2 SD of the mean, and 99.72% lie within 3 SD of the mean.
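These percentages can be recomputed from the cumulative distribution function of the standard normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library (Python 3.8+); it should print about 68.27%, 95.45% and 99.73% for the central areas and 34.13%, 13.59% and 2.14% for the 1-SD bands. Differences in the last decimal place from the figures quoted above are due to rounding.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

for k in (1, 2, 3):
    inside = z.cdf(k) - z.cdf(-k)  # area within mean ± k SD
    band = z.cdf(k) - z.cdf(k - 1) # area between k - 1 and k SD above the mean
    print(f"within ±{k} SD: {inside:.2%}   between {k - 1} and {k} SD: {band:.2%}")
```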

Virtually all of the observations are contained within 3 SD of the mean. This is the justification used by those who label values outside of the interval X̄ ± 3 SD as “outliers” or unlikely values.
Incidentally, the number of standard deviations by which a value lies away from the mean is called its Z score.
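As a sketch, Z scores for the pain-relief sample can be computed from the sample mean and SD; the ± 3 SD convention described above then flags any value with |Z| > 3. The variable names are our own.

```python
import statistics

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
mean = statistics.mean(minutes)
sd = statistics.stdev(minutes)

z_scores = [(x - mean) / sd for x in minutes]  # Z = (X - mean) / SD
print([round(z, 2) for z in z_scores])         # e.g. 18 min lies about 1.76 SD above the mean

outliers = [x for x, z in zip(minutes, z_scores) if abs(z) > 3]
print(outliers)                                # []: no value lies beyond mean ± 3 SD
```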

Problems In Analyzing A Frequency Distribution
In a normal distribution, the following holds true:
mean = median = mode.
In an observed data set, there may be skewness,
kurtosis, and extreme values, in which case the
measures of central tendency may not follow this
pattern.

Skewness and Kurtosis
1. Skewness.
A horizontal stretching of a frequency distribution
to one side or the other, so that one tail of
observations is longer and has more observations
than the other tail, is called skewness.

When a histogram or frequency polygon has a
longer tail on the left side of the diagram, the
distribution is said to be skewed to the left.
If a distribution is skewed, the mean moves farther
in the direction of the long tail than does the
median, because the mean is more heavily
influenced by extreme values.

A quick way to get an approximate idea of whether or not a frequency distribution is skewed is to compare the mean and the median. If these two measures are close to each other, the distribution is probably not skewed.
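This quick check can be sketched in Python. The first list is the pain-relief sample. The second is a purely hypothetical variant in which the longest time is 58 min instead of 18 min. It shows how a single long right tail pulls the mean away from the median while the median stays unchanged.

```python
import statistics

samples = {
    "pain-relief sample": [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18],
    "hypothetical, one long time": [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58],
}

for label, values in samples.items():
    mean, median = statistics.mean(values), statistics.median(values)
    print(f"{label}: mean = {mean:.1f}, median = {median}")
# the mean moves toward the long right tail (about 15.8) while the median stays at 12
```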

2. Kurtosis.
It is characterized by a vertical
stretching of the frequency
distribution.
It is the measure of the
peakedness of a probability
distribution.
As shown in the figure kurtotic
distribution could look more
peaked or could look more
flattened than the bell shaped
normal distribution.
A normal distribution has zero excess kurtosis (its kurtosis is 3, the reference value against which excess kurtosis is measured).

• Significant skewness or kurtosis can be detected
by statistical tests that reveal that the observed
data do not form a normal distribution. Many
statistical tests require that the data they analyze
be normally distributed, and the tests may not be
valid if they are used to compare very abnormal
distributions.
• Kurtosis is seldom discussed as a problem in the
medical literature, although skewness is frequently
observed and is treated as a problem.

3. Extreme values (Outliers)
One of the most perplexing problems for the
analysis of data is how to treat a value that is
abnormally far above or below the mean.
However, before analyzing the data set, the
investigator would want to be sure that this item of
data was legitimate and would check the original
source of data. Although the value is an outlier, it may well be correct.


THE NATURE AND PURPOSE OF
STATISTICAL INFERENCE
As stated earlier, it is often impossible to study
each member of a population. Instead, we select a
sample from the population and from that sample
attempt to generalize to the population as a whole.
The process of generalizing sample results to a
population is termed statistical inference and is
the end product of formal statistical hypothesis
testing.

Inference means the drawing of conclusions from
data.
Statistical inference can be defined as the drawing
of conclusions from quantitative or qualitative
information using the methods of statistics to
describe and arrange the data and to test suitable
hypotheses.

Differences Between Deductive Reasoning And
Inductive Reasoning
Because data do not come with their own
interpretation, the interpretation must be put into
the data by inductive reasoning (from Latin,
meaning “to lead into”). This approach to
reasoning is less familiar to most people than is
deductive reasoning (from Latin, meaning “to
lead out from”), which is learned from
mathematics, particularly from geometry.

Deductive reasoning proceeds from the general
(i.e., from assumptions, from propositions, and
from formulas considered true) to the specific (i.e.,
to specific members belonging to the general
category).
Consider, for example, the following two
propositions: (1) All Americans believe in
democracy. (2) This person is an American. If
both propositions are true, then the following
deduction must be true: This person believes in
democracy.

Deductive reasoning is of special use in science once hypotheses are formed. Using deductive reasoning, an investigator says, “If this hypothesis is true, then the following prediction or predictions also must be true.”

If the data are inconsistent with the predictions
from the hypothesis, they force a rejection or
modification of the hypothesis. If the data are
consistent with the hypothesis, they cannot prove
that the hypothesis is true, although they do lend
support to the hypothesis.
To reiterate, even if the data are consistent with
the hypothesis, they do not prove the hypothesis.

Physicians often proceed from formulas accepted
as true and from observed data to determine the
values that variables must have in a certain clinical
situation. For example, if the amount of a
medication that can be safely given per kilogram
of body weight (a constant) is known, then it is
simple to calculate how much of that medication
can be given to a patient weighing 50 kg.
This is deductive reasoning, because it proceeds
from the general (a constant and a formula) to the
specific (the patient).

Inductive reasoning, in contrast, seeks to find valid
generalizations and general principles from data.
Statistics, the quantitative aid to inductive
reasoning, proceeds from the specific (that is, from
data) to the general (that is, to formulas or
conclusions about the data).

For example, by sampling a population and
determining both the age and the blood pressure of
the persons in the sample (the specific data), an
investigator using statistical methods can
determine the general relationship between age
and blood pressure (e.g., that, on the average,
blood pressure increases with age).

Differences Between Mathematics And Statistics
The differences between mathematics and statistics
can be illustrated by showing that they form the
basis for very different approaches to the same
basic equation:
y = mx + b

This equation is the formula for a straight line in
analytic geometry. It is also the formula for simple
regression analysis in statistics, although the
letters used and their order customarily are
different.

In the mathematical formula above, the b is a
constant, and it stands for the y-intercept (i.e., the
value of y when the variable x equals 0). The value
m is also a constant, and it stands for the slope (the
amount of change in y for a unit increase in the
value of x).
The important thing to notice is that in
mathematics, one of the variables (either x or y) is
unknown (i.e., to be calculated), while the formula
and the constants are known.

In statistics, however, just the reverse is true: the
variables, x and y, are known for all observations,
and the investigator usually wishes to determine
whether or not there is a linear (straight line)
relationship between x and y, by estimating the
slope and the intercept. This can be done using the
form of analysis called linear regression, which is
discussed later.

As a general rule, what is known in statistics is
unknown in mathematics, and vice versa. In
statistics, the investigator starts from the specific
observations (data) to induce or estimate the
general relationships between variables.
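To make the contrast concrete, the sketch below goes in the statistical direction. It starts from known x and y values and estimates the slope m and intercept b by ordinary least squares. The data are hypothetical and were chosen to lie exactly on y = 2x + 1, so the estimates can be checked by eye; real data would scatter around the line.

```python
def least_squares_line(xs, ys):
    """Estimate slope m and intercept b of y = m*x + b by ordinary least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of squares of x
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

xs = [0, 1, 2, 3, 4]  # hypothetical observations
ys = [1, 3, 5, 7, 9]
print(least_squares_line(xs, ys))  # (2.0, 1.0)
```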

Probability
The probability of a specified event is the fraction,
or proportion, of all possible events of a specified
type in a sequence of almost unlimited random
trials under similar conditions.
The probability of an event is the likelihood the
event will occur; it can never be greater than 1
(100%) or less than 0 (0%).

1. The probability values in a population are
distributed in a definable manner that can be used
to analyze the population.
2. Probability values that do not follow a
distribution can be analyzed using nonparametric
methods.

Calculation
The probability of an event is determined as
P (A) = A / N
Where P (A) = the probability of event A
occurring; A = the number of times that event A
actually occurs; and N = the total number of
events during which event A can occur.

Example: A medical student performs
venipunctures on 1000 patients and is successful
on 800 in the first attempt. Assuming that all other
factors are equal (i.e., random selection of
patients), the probability that the next
venipuncture will be successful on the first
attempt is 80%.
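The venipuncture example is a direct application of P (A) = A / N. A minimal sketch, with a function name of our own choosing:

```python
def probability(successes, trials):
    """Relative-frequency estimate P(A) = A / N."""
    return successes / trials

p_first_attempt = probability(800, 1000)
print(f"{p_first_attempt:.0%}")  # 80%
```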

Rules
a. Additive rule
1. Definition. The additive rule applies when considering the probability of one of at least two mutually exclusive events occurring, which is calculated by adding together the probability value of each event.

2. Calculation. The probability of one of two mutually exclusive events occurring is determined as
P (A or B) = P (A) + P (B)
Where P (A or B) = the probability of event A or event B occurring.

3. Example. About 6.3% of all medical students are black, and 5.5% are Hispanic. The probability that a randomly selected medical student is either black or Hispanic is 6.3% plus 5.5%, or 11.8%.
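The additive rule is valid only when the events cannot occur together, so that assumption is worth stating next to the arithmetic. A minimal sketch using the figures from the example:

```python
p_black, p_hispanic = 0.063, 0.055  # treated as mutually exclusive categories, as in the example
p_either = p_black + p_hispanic     # additive rule: P(A or B) = P(A) + P(B)
print(f"{p_either:.1%}")            # 11.8%
```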

b. Multiplicative rule.
1. Definition. The multiplicative rule applies when considering the probability of at least two independent events occurring together, which is calculated by multiplying the probability values for the events.

2. Calculation. The probability of two independent events occurring together is determined as
P (A and B) = P (A) × P (B)
Where P (A and B) = the probability of both event A and event B occurring.

3. Example. About 6.3% of all medical students are black, and 36.1% of all students are women. Assuming race and sex are independent selection factors, the percentage of students who are black women should be about 6.3% multiplied by 36.1%, or 2.3%.
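The same example as a minimal sketch; the independence of race and sex is the assumption stated in the example, not something the code checks.

```python
p_black, p_woman = 0.063, 0.361  # assumed independent, as in the example
p_both = p_black * p_woman       # multiplicative rule: P(A and B) = P(A) x P(B)
print(f"{p_both:.1%}")           # 2.3%
```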

THE PROCESS OF TESTING
HYPOTHESES

Hypotheses are predictions about what the examination of
appropriate data will show. The following discussion
introduces the basic concepts underlying the usual tests of
statistical significance.
These tests estimate the probability that a finding at least as extreme as the one observed (such as a difference between means or proportions) would arise by chance if the expected model were true; the model is often a null hypothesis that there will be no difference between the means or proportions. A small probability suggests that the finding represents a true deviation from what was expected.

False Positive And False Negative Errors
Science is based on the following set of principles:
1. Previous experience serves as the basis for developing hypotheses.
2. Hypotheses serve as the basis for developing predictions.
3. Predictions must be subjected to experimental or observational testing.

In deciding whether data are consistent or
inconsistent with the hypotheses, investigators are
subject to two types of error.

They could assert that the data support a
hypothesis when in fact the hypothesis is false;
this would be a false-positive error, which is also
called an alpha error or a type I error.
Conversely, they could assert that the data do not
support the hypothesis when in fact the hypothesis
is true; this would be a false-negative error, which
is also called a beta error or a type II error.

Because scientists become attached to their own hypotheses, and because of the conviction that proof in science, as in the courts, must be “beyond a reasonable doubt”, investigators have historically been particularly careful to avoid the false-positive error.

This is probably appropriate for theoretical science.
In medicine, however, where a false-negative error
in a diagnostic test may mean missing a disease
until it is too late to institute therapy and where a
false-negative error in the study of a medical
intervention may mean overlooking an effective
treatment, investigators cannot feel comfortable
about false-negative errors either.

The Null Hypothesis And The Alternative Hypothesis
The process of significance testing involves three basic steps:
(1) Asserting the null hypothesis,
(2) Establishing the alpha level, and
(3) Rejecting or failing to reject the null hypothesis.

The first step consists of asserting the null
hypothesis, which is the hypothesis that there is no
real (true) difference between means or
proportions of the groups being compared or that
there is no real association between two
continuous variables. It may seem strange to begin
the process by asserting that something is not true,
but it is far easier to reject an assertion than to
prove something is true.

If the data are not consistent with the hypothesis,
the hypothesis can be rejected.
If the data are consistent with a hypothesis, this
still does not prove the hypothesis, because other
hypotheses may fit the data equally well.

The second step is to determine the probability of
being in error if the null hypothesis is rejected.
This step requires that the investigator establish an
alpha level, as described below.

If the p value is found to be greater than the alpha
level, the investigator fails to reject the null
hypothesis. If, however, the p value is found to be
less than or equal to the alpha level, the next step
is to reject the null hypothesis and to accept the
alternative hypothesis, which is the hypothesis
that there is in fact a real difference or association.
Although it may seem awkward, this process is
now standard in medical science and has yielded
considerable scientific benefits.

Statistical tests begin with the statement of the
hypothesis itself, but stated in the form of a null
hypothesis.
For example, consider again the group of patients
who tested the new pain-relieving drug, drug A,
and recorded their number of minutes to 100%
pain relief. Suppose that a similar sample of
patients tested another drug, drug B, in the same
way, and investigators wished to know if one
group of patients experienced total pain relief
more quickly than the other group.

In this case, the null hypothesis would be stated in
this way: “there is no difference in time to 100%
pain relief between the two pain-relieving drugs A
and B”. The null hypothesis is one of no
difference, no effect, no association, and serves as
a reference point for the statistical test.

In symbols, the null hypothesis is referred to as H₀.
In the comparison of the two drugs A and B, we can state H₀ in terms of there being no difference in the average number of minutes to pain relief between drugs A and B, or $H_0: \bar{X}_A = \bar{X}_B$.
The alternative is that the means of the two drugs are not equal. This is an expression of the alternative hypothesis H₁.

Null hypothesis $H_0: \bar{X}_A = \bar{X}_B$
Alternative hypothesis $H_1: \bar{X}_A \neq \bar{X}_B$

The Alpha Level And P Value
Before doing any calculations to test the null
hypothesis, the investigator must establish a
criterion called the alpha level, which is the
maximum probability of making a false-positive
error that the investigator is willing to accept.

By custom, alpha is usually set at 0.05. This says that the investigator is willing to run a 5% risk (but no more) of being in error when asserting that the treatment and control groups truly differ.
In choosing an alpha level, the investigator inserts
value judgment into the process. However, when
that is done before the data are collected, at least
the post hoc bias of being tempted to adjust the
alpha level to make the data show statistical
significance is avoided.

The p value obtained by a statistical test (such as the t-test) gives the probability of obtaining a difference at least as large as the observed difference by chance alone if the null hypothesis were true, given random variation and a single test of the null hypothesis.
Usually, if the observed p value is ≤ 0.05, members
of the scientific community who read about an
investigation will accept the difference as being
real.

Although setting alpha at 0.05 is somewhat arbitrary, that level has become so customary that
it is wise to provide explanations for choosing
another alpha level or for choosing not to perform
tests of significance at all, which may be the best
approach in some descriptive studies.

The p value is the final arithmetic answer that is calculated by a statistical test of a hypothesis. Its magnitude guides the researcher's decision about H₀, that is, whether to reject H₀ or to retain it.
The p value is crucial for drawing the proper conclusions about a set of data.

So what numerical value of p should be used as the dividing line for rejecting or retaining H₀? Here is the decision rule for the observed value of p and the decision regarding H₀:
If p ≤ 0.05, reject H₀.
If p > 0.05, do not reject (retain) H₀.
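The decision rule is simple enough to state as a function. In this sketch the alpha level is a parameter defaulting to 0.05, reflecting the point that alpha should be fixed before the data are examined; the function name `decide` is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha, otherwise do not reject it."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"

print(decide(0.01))  # reject H0
print(decide(0.20))  # do not reject H0
```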

If the observed probability is less than or equal to
0.05 (5%), the null hypothesis is rejected, that is,
the observed outcome is judged to be incompatible
with the notion of “no difference” or “no effect”,
and the alternative hypothesis is adopted.
In this case, the results are said to be “statistically
significant”.

If the observed probability is greater than 0.05 (5%), the decision is to retain (fail to reject) the null hypothesis, and the results are called “not statistically significant” or simply NS, the notation often used in tables.

Statistical Versus Clinical Significance
The distinction between statistical significance and
clinical or practical significance is worth
mentioning.
For example, in the statistical test of $H_0: \bar{X}_A = \bar{X}_B$ for two drug groups, let’s assume that the observed probability is p = 0.01, a value that is less than the dividing line of 0.05 or 5%.

This would lead the investigator to reject H₀ and to conclude that the results are “significant at p = 0.01”, that is, one drug caused total pain relief significantly faster, on average, than the other drug at p = 0.01.
But if the actual difference in the group means is
itself clinically meaningless or negligible, the
statistical significance may be considered real yet
not useful.

According to Dr. Horowitz,
Statistical significance, “is a mathematical
expression of the degree of confidence that an
observed difference between groups represents a
real difference – that a zero response would not
occur if the study were repeated, and that the
study is not merely due to chance”.

On the other hand, “clinical significance is a
judgment made by the researcher or reader that
differences in response to intervention observed
between groups are important for health”.
“It is a subjective evaluation of the test”, continues
Dr. Horowitz, based on clinical experience and
familiarity with the “disease or condition being
measured”.

Variation In Individual Observations And In Multiple Samples
Most tests of significance relate to a difference
between means or proportions.
They help investigators decide whether an
observed difference is real, which in statistical
terms is defined as whether the difference is
greater than would be expected by chance alone.

Inspecting the means to see if they were different is inadequate because it is not known whether the observed difference was unusual or whether a difference that large might have been found frequently if the experiment were repeated.

To generalize beyond the particular subjects in the single study, the investigators must know the extent to which the differences discovered in the study are reliable.
The estimate of reliability is given by the standard
error, which is not the same as the standard
deviation.

Standard Deviation And Standard Error
A normal distribution can be completely described by its mean and standard deviation. This
information is useful in describing individual
observations (raw data),
but it is not useful in determining how close a
sample mean from research data is to the mean
for the underlying population (which is also called
the true mean or the population mean). This
determination must be made on the basis of the
standard error.

The standard error is related to the standard
deviation, but it differs from the standard
deviation in important ways.
Basically, the standard error is the standard
deviation of a population of sample means, rather
than of individual observations.
Therefore, the standard error refers to the variability of means rather than of individual observations, so it provides an idea of how variable a single estimate of the mean from one set of research data is likely to be.

Suppose, for example, that 100 different samples were drawn from the same population and the mean of each was calculated. The frequency distribution of the 100 different means could be plotted, treating each mean as a single observation.
These sample means will tend to form an approximately normal (gaussian) frequency distribution, the mean of which would be very close to the true mean for the underlying population.

More important for this discussion, the standard
deviation of this distribution of sample means is
called the standard error of the mean (SE). It
describes how much sample means vary around the
true population mean, not how much individual
observations vary.
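The thought experiment of drawing 100 samples can be simulated. This is a minimal sketch, assuming a hypothetical normal population with mean 50 and SD 10 and samples of 25 subjects each; `sd_of_sample_means` is an illustrative name, not something from the source. The standard deviation of the 100 simulated means should come out close to the population SD divided by the square root of 25, that is, about 2.

```python
import random
import statistics

def sd_of_sample_means(pop_mean, pop_sd, n, n_samples, seed=1):
    """Draw n_samples random samples of size n from a normal population
    and return the standard deviation of their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.mean(sample))  # treat each mean as one observation
    return statistics.stdev(means)

# Hypothetical population: mean 50, SD 10; 100 samples of 25 subjects each.
observed = sd_of_sample_means(50, 10, n=25, n_samples=100)
theoretical = 10 / 25 ** 0.5  # population SD / sqrt(N)
print(f"SD of 100 sample means: {observed:.2f} (theory: {theoretical:.2f})")
```

The simulated value will not match 2 exactly, because 100 means are themselves only a sample of all possible means.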

The standard error is a parameter that enables the
investigator to do two things that are central to the
function of statistics.
• One is to estimate the probable amount of error
around a quantitative assertion.
• The other is to perform tests of statistical
significance.

If only the standard deviation and sample size of
one research sample are known, however, the
standard deviation can be converted to a standard
error so that these functions can be pursued.

An unbiased estimate of the standard error can be
obtained from the standard deviation of a single
research sample if the standard deviation was
originally calculated using the degrees of freedom
(N - 1) in the denominator.
The formula for converting a standard deviation
(SD) to a standard error (SE) is as follows:
$$\text{Standard error} = SE = \frac{SD}{\sqrt{N}}$$
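The conversion can be written as a short function. This is a minimal sketch, assuming the observations are held in a Python list; `standard_error` is a hypothetical helper name and the data are invented. Python's `statistics.stdev` already divides the sum of squares by N - 1, which matches the requirement stated above for an unbiased estimate.

```python
import math
import statistics

def standard_error(values):
    """SE of the mean = SD / sqrt(N), with the SD computed using N - 1."""
    sd = statistics.stdev(values)  # stdev() divides the sum of squares by N - 1
    return sd / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(round(statistics.stdev(data), 3), round(standard_error(data), 3))
```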

The larger the sample size (N), the smaller the
standard error, and the better the estimate of the
population mean.
At any given point on the x-axis, the height of the
bell-shaped curve of the sample means represents
the relative probability that a single sample mean
would fall at that point.
Most of the time, the sample mean would be near
the true mean. Less often, it would be farther
away.

In the medical literature, means or proportions are
often reported either as the mean plus or minus 1
SD or as the mean plus or minus 1 SE.
Reported data must be examined carefully to
determine whether the SD or the SE is shown.
Either is acceptable in theory, because an SD can
be converted to an SE and vice versa if the sample
size is known.
However, many journals have a policy stating
whether the SD or SE must be reported. The
sample size should also be shown.

Confidence Intervals
Whereas the SD shows the variability of individual
observations, the SE shows the variability of
means.
Whereas the mean plus or minus 1.96 SD estimates
the range in which 95% of individual observations
would be expected to fall, the mean plus or minus
1.96 SE estimates the range in which 95% of the
means of repeated samples of the same size
would be expected to fall.

Moreover, the mean plus or minus 1.96 SE gives
the 95% confidence interval, which is the range of
values in which the investigator can be 95%
confident that the true mean of the underlying
population falls.
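A minimal sketch of the calculation, assuming the mean and its SE have already been computed; the values 120.0 and 2.5 are hypothetical, and `confidence_interval_95` is an illustrative name. The multiplier 1.96 is the one used in the text for 95% coverage.

```python
def confidence_interval_95(mean, se):
    """Mean plus or minus 1.96 SE: the 95% confidence interval for the true mean."""
    margin = 1.96 * se
    return mean - margin, mean + margin

low, high = confidence_interval_95(120.0, 2.5)  # hypothetical mean and SE
print(f"95% CI: {low:.1f} to {high:.1f}")
```

Passing an SD instead of an SE would give the range for 95% of individual observations, which is the distinction drawn above.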

Tests Of Statistical Significance

The science of biostatistics has given us a large
number of tests that can be applied to public
health data. An understanding of the tests will
guide an individual toward the efficient collection
of data that will meet the assumptions of the
statistical procedures particularly well.

The tests allow investigators to compare two
parameters, such as means or proportions, and to
determine whether the difference between them is
statistically significant.

The various t-tests (the one-tailed Student’s t-test,
the two-tailed Student’s t-test, and the paired
t-test) compare differences between means, while
z-tests compare differences between proportions.
All of these tests make comparisons possible by
calculating the appropriate form of a ratio, which
is called a critical ratio because it permits the
investigator to make a decision.

This is done by comparing the ratio obtained from
whatever test is performed (e.g., a t-test) with the
values in the appropriate statistical table (e.g., a
table of t values) for the observed number of
degrees of freedom.
Before individual tests are discussed in detail, the
concepts of critical ratios and degrees of freedom
are defined.

Critical Ratios
Critical ratios are a class of tests of statistical
significance that depend on dividing some
parameter (such as a difference between means)
by the standard error (SE) of that parameter.

The general formula for critical ratios is as
follows:

$$\text{Critical ratio} = \frac{\text{Parameter}}{\text{SE of that parameter}}$$

When applied to Student’s t-test, the formula
becomes:

$$\text{Critical ratio} = t = \frac{\text{Difference between two means}}{\text{SE of the difference between two means}}$$

When applied to a z-test, the formula becomes:

$$\text{Critical ratio} = z = \frac{\text{Difference between two proportions}}{\text{SE of the difference between two proportions}}$$

The value of the critical ratio (e.g., t or z) is then
looked up in the appropriate table (of t or z) to
determine the corresponding value of p.
For any critical ratio, the larger the ratio, the more
likely that the difference between means or
proportions is due to more than just random
variation (i.e., the more likely it is that the
difference can be considered statistically
significant and, hence, real).
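Where a printed table is not at hand, the lookup of p can be approximated in code. This is a minimal sketch, assuming the standard normal distribution is an adequate reference: that holds for z, and only approximately for t when samples are large. It uses Python's `statistics.NormalDist` (Python 3.8 or later); the function name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(critical_ratio):
    """Two-sided p value for a critical ratio, using the standard normal
    distribution (a large-sample approximation when applied to t)."""
    return 2 * (1 - NormalDist().cdf(abs(critical_ratio)))

for ratio in (1.0, 1.96, 2.58):
    print(ratio, round(two_sided_p(ratio), 3))
```

As the text says, the larger the ratio, the smaller the resulting p value.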

Unless the total sample size is small (say, under
30), the finding of a critical ratio of greater than
about 2 usually indicates that the difference is real
and enables the investigator to reject the null
hypothesis.
The statistical tables adjust the critical ratios for
the sample size by means of the degrees of
freedom.

Degrees of Freedom
The term “degrees of freedom” refers to the
number of observations that are free to vary.

The Idea Behind The Degrees Of Freedom
One degree of freedom is lost every time a mean is
calculated.
Why should this be?

Before putting on a pair of gloves, a person has the
freedom to decide whether to begin with the left or
the right glove. However, once the person puts on the
first glove, he or she loses the freedom to decide
which glove to put on last.
If centipedes put on shoes, they would have a
choice to make for the first 99 shoes but not for
the 100th shoe. Right at the end, the freedom to
choose (vary) is restricted.

In statistics, if there are two observed values, only
one estimate of the variation between them is
possible.
Something has to serve as the basis against which
other observations are compared.
The mean is the most “solid” estimate of the
expected value of a variable, so it is assumed to be
“fixed”.

This implies that the numerator of the mean (the
sum of individual observations, or $\sum x_i$),
which is based on N observations, is also fixed.
Once N – 1 observations (each of which was,
presumably, free to vary) have been added up, the
last observation is not free to vary, because the
values of the N observations must add up to
$\sum x_i$.

For this reason, 1 degree of freedom is lost each
time a mean is calculated. The proper average of a
sum of squares when calculated from an observed
sample, therefore, is the sum of squares divided by
the degrees of freedom (N - 1).
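The argument can be checked with a few numbers. This is a minimal sketch with hypothetical values; `last_observation` and `sample_variance` are illustrative names. Once the mean of four observations is fixed at 10, choosing three of them determines the fourth, and the average sum of squares is then taken over the N - 1 free observations.

```python
def last_observation(free_values, mean, n):
    """Once the mean of n observations is fixed, the last value is determined."""
    required_total = mean * n  # the numerator of the mean (sum of x_i) is fixed
    return required_total - sum(free_values)

def sample_variance(values):
    """Average sum of squares, dividing by the degrees of freedom (N - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

# Hypothetical: 4 observations with mean 10; three are chosen freely.
x4 = last_observation([8, 12, 9], mean=10, n=4)
print(x4)  # 11
print(sample_variance([8, 12, 9, x4]))
```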

Hence, for simplicity, the degrees of freedom for
any test are considered to be the total sample size
minus 1 degree of freedom for each mean that is
calculated. In Student’s t-test, 2 degrees of
freedom are lost because two means are calculated
(one mean for each group whose means are to be
compared).

The general formula for degrees of freedom for the
Student’s two-group t-test is $N_1 + N_2 - 2$,
where $N_1$ is the sample size in the first group and
$N_2$ is the sample size in the second group.

Use of t-tests
In medical research, t-tests are among the three or
four most commonly used statistical tests
(Emerson and Colditz 1983)6.
The purpose of the t-test is to compare the means of a
continuous variable in two research samples in
order to determine whether or not the difference
between the two observed means exceeds the
difference that would be expected by chance from
random sampling.

Sample Populations and Sizes
If two different samples come from two different
groups (e.g., a group of men and a group of
women), Student’s t-test is used.
If the two samples come from the same group
(e.g., pretreatment and post-treatment values for
the same study subjects), the paired t-test is used.

Both types of t-tests depend on certain
assumptions, including the assumption that the
data in the continuous variable are normally
distributed (i.e., have a bell-shaped distribution).
Very seldom, however, will observed data be
perfectly normally distributed. Does this invalidate
the t-test? Fortunately, it does not. A convenient
theorem, the central limit theorem, rescues the
t-test (and much of statistics as well).

The central limit theorem can be derived
theoretically or observed by experimentation.
According to the theorem, for reasonably large
samples (say, 30 or more observations in each
sample), the distribution of the means of many
samples is normal (gaussian), even though the data
in individual samples may have skewness,
kurtosis, or unevenness.
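The theorem can be observed by simulation. This is a minimal sketch, assuming hypothetical exponentially distributed data (strongly right-skewed) and samples of 30 observations, in line with the rule of thumb above; `skewness` is an illustrative helper. The raw observations are markedly skewed, while the means of 1,000 such samples have a skewness much closer to 0, the value for a normal distribution.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed deviation divided by the SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Hypothetical strongly skewed data: exponential with mean 1.
raw = [rng.expovariate(1.0) for _ in range(3000)]
# Means of 1,000 samples of 30 observations each from the same distribution.
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(30)]) for _ in range(1000)]
print(f"skewness of raw data: {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```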

Because the critical theoretical requirement for the
t-test is that the sample means be normally
distributed, a t-test may be computed on almost
any set of continuous data, if the observations can
be considered a random sample and the sample
size is reasonably large.

The t-distribution
The t distribution was described by William Gosset,
who used the pseudonym “Student” when he
wrote the description.

The t distribution looks similar to the normal
distribution, except that its tails are somewhat
wider and its peak is slightly lower, depending
on the sample size.
The t distribution is necessary because when
sample sizes are small, the observed estimates of
the mean and variance are subject to considerable
error.

The larger the sample size is, the smaller the errors
are, and the more the t distribution looks like the
normal distribution. In the case of an infinite
sample size, the two distributions are identical.
For practical purposes, when the combined sample
size of the two groups being compared is larger
than 120, the difference between the normal
distribution and the t distribution is negligible.

Student’s t test
There are two types of Student’s t test:
the one-tailed and
the two-tailed type.
The calculations are the same, but the
interpretation of the resulting t differs somewhat.
The common features will be discussed before the
differences are outlined.

Calculation of the value of t.
In both types of Student’s t test, t is calculated by
taking the observed difference between the means
of the two groups (the numerator) and dividing
this difference by the standard error of the
difference between the means of the two groups
(the denominator).

Before t can be calculated, then, the standard error
of the difference between the means (SED) must
be determined.
The basic formula for this is the square root of the
sum of the respective population variances, each
divided by its own sample size.
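A minimal sketch of this basic formula, assuming the two variances have already been estimated (in practice the sample variances stand in for the population variances); the function name `sed` and the numbers are hypothetical.

```python
import math

def sed(var1, n1, var2, n2):
    """Standard error of the difference between two means: the square root of
    the sum of each group's variance divided by its own sample size."""
    return math.sqrt(var1 / n1 + var2 / n2)

print(sed(2.5, 5, 2.5, 5))  # 1.0
```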

When Student’s t-test is used to test the null hypothesis
in research involving an experimental group and a control
group, it usually takes the general form of the following
equation:

$$t = \frac{\bar{x}_E - \bar{x}_C - 0}{\sqrt{s_p^2\left[\left(\frac{1}{N_E}\right) + \left(\frac{1}{N_C}\right)\right]}}, \qquad df = N_E + N_C - 2$$

The 0 in the numerator of the equation for t was
added for correctness, because the t-test
determines if the difference between the means is
significantly different from 0.
However, because the 0 does not affect the
calculations in any way, it is usually omitted from
t-test formulas.

The same formula, recast to apply to any two
independent samples (e.g., samples of men and
women), is as follows:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s_p^2\left[\left(\frac{1}{N_1}\right) + \left(\frac{1}{N_2}\right)\right]}}, \qquad df = N_1 + N_2 - 2$$

in which $\bar{x}_1$ is the mean of the first sample,
$\bar{x}_2$ is the mean of the second sample, $s_p^2$ is
the pooled estimate of the variance, $N_1$ is the size
of the first sample, $N_2$ is the size of the second
sample, and df is the degrees of freedom.
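The pooled-variance formula can be sketched in Python. This is a minimal illustration under stated assumptions: two independent samples held in plain lists, hypothetical data, and the pooled variance computed as the combined sums of squares divided by N1 + N2 - 2 (the usual definition, which the text does not spell out). The function name `students_t` is illustrative; finding p still requires a t table for the returned df.

```python
import math
import statistics

def students_t(sample1, sample2):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Sums of squares recovered from the (N - 1) sample variances.
    ss1 = statistics.variance(sample1) * (n1 - 1)
    ss2 = statistics.variance(sample2) * (n2 - 1)
    pooled_var = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2) - 0) / se_diff
    return t, df

t, df = students_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])  # hypothetical data
print(t, df)
```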

The 0 in the numerator indicates that the null
hypothesis states that the difference between the
means will not be significantly different from 0.
The df is needed to enable the investigator to refer
to the correct line in the table of the values of t and
their relationship to p.

The t test is designed to help investigators
distinguish “explained variation” from
“unexplained variation” (random error, or
chance).
These concepts are like “signal” and “background
noise” in radio broadcast engineering. Listeners
who are searching for a particular station on their
radio dial will find background noise on almost
every radio frequency.

When they reach the station that they want to hear,
they may not notice the background noise, since
the signal is so much stronger than this noise.
In medical studies, the particular factor that is
being investigated is similar to the radio signal,
and random error is similar to background noise.

Statistical analysis helps distinguish one from the
other by comparing their strengths.
If the variation caused by the factor of interest is
considerably larger than the variation caused by
random factors (i.e., if in the t-test the ratio is
approximately 1.96 or more), the effect of the factor of
interest becomes detectable above the statistical
“noise” of random factors.

Interpretation of the results
If the value of t is large, the p value will be small,
because it is unlikely that a large t ratio will be
obtained by chance alone. If the p value is 0.05 or
less, it is customary to assume that there is a real
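difference. Conceptually, the p value is the
probability of obtaining a t ratio at least this large
by chance alone if the null hypothesis of no
difference between the means were true. It
therefore indicates the risk of a false-positive error
if that null hypothesis is rejected and the
alternative hypothesis of a true difference is
accepted.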

• One-Tailed and Two-Tailed t-Tests
• These tests are sometimes called one-sided and
two-sided tests.

• In the two-tailed test, alpha is equally divided at
the ends of the two tails of the distribution. The
two-tailed test is generally recommended, because
differences in either direction are usually
important to document.

For example, it is obviously important to know if a
new treatment is significantly better than a
standard or placebo treatment, but it is also
important to know if a new treatment is
significantly worse and should therefore be
avoided.
In this situation, the two-tailed test provides an
accepted criterion for when a difference shows the
new treatment to be either better or worse.

Sometimes, however, only a one-tailed test is
needed.
Suppose, for example, that a new therapy is known
to cost much more than the currently used therapy.
Obviously, it would not be used if it were worse
than the current therapy, but it would also not be
used if it were merely as good as the current
therapy.

Under these circumstances, some investigators
consider it acceptable to use a one-tailed test.
When this occurs, the 5% rejection region for the
null hypothesis is all put on one tail of the
distribution, instead of being evenly divided
between the extremes of the two tails.

In the one-tailed test, the null hypothesis
nonrejection region extends only to 1.645 standard
errors above the “no difference” point of 0.
In the two-tailed test, it extends to 1.96 standard
errors above and below the “no difference” point.
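The two cut-off points can be recovered from the standard normal distribution. This is a minimal sketch, assuming alpha = 0.05 and using Python's `statistics.NormalDist.inv_cdf` (Python 3.8 or later): the one-tailed test puts all 5% in one tail, the two-tailed test puts 2.5% in each.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution (mean 0, SD 1)
one_tailed = z.inv_cdf(1 - alpha)      # all 5% of alpha in one tail
two_tailed = z.inv_cdf(1 - alpha / 2)  # 2.5% of alpha in each tail
print(f"one-tailed: {one_tailed:.3f}, two-tailed: {two_tailed:.3f}")
```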

This makes the one-tailed test more powerful, that is,
more able to detect a significant difference, if it is
in the expected direction. Many investigators
dislike one-tailed tests, because they believe that if
an intervention is significantly worse than the
standard therapy, that should be documented
scientifically. Most reviewers and editors require
that the use of a one-tailed significance test be
justified.

Paired t-test
In many medical studies, individuals are followed
over time to see if there is a change in the value of
some continuous variable. Typically, this occurs in
a “before and after” experiment, such as one testing
to see if there was a drop in average blood pressure
following treatment or to see if there was a drop in
weight following the use of a special diet. In this
type of comparison, an individual patient serves as
his or her own control.

The appropriate statistical test for this kind of data
is the paired t-test. The paired t-test is more powerful
than Student’s t-test because it considers the
variation from only one group of people, whereas
Student’s t-test considers variation from two
groups.
Any variation that is detected in the paired t-test is
attributable to the intervention or to changes over
time in the same person.

Calculation of the value of t
To calculate a paired t-test, a new variable is
created. This variable, called d, is the difference
between the values before and after the
intervention for each individual studied.

The paired t-test is a test of the null hypothesis
that, on the average, the difference is equal to 0,
which is what would be expected if there were no
change over time.
Using the symbol d to indicate the mean
observed difference between the before and after
values, the formula for the paired t-test is as
follows:

tpaired
= tp
= d – 0
Standard error of d
= d – 0
sd
2
N

df = N – 1
But in the paired t-test, because only one mean is
calculated (d) , only one degree of freedom is
lost; therefore, the formula for the degrees of
freedom is N – 1.
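A minimal sketch of the paired calculation, assuming hypothetical systolic blood pressures for five subjects before and after treatment; `paired_t` is an illustrative name. Each difference d is taken as before minus after, so a positive mean difference means a drop.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t: mean of the differences d divided by the SE of that mean."""
    d = [b - a for b, a in zip(before, after)]  # one difference per subject
    n = len(d)
    d_bar = statistics.mean(d)
    se_d_bar = math.sqrt(statistics.variance(d) / n)  # variance() uses N - 1
    return (d_bar - 0) / se_d_bar, n - 1

# Hypothetical blood pressures (mm Hg) before and after treatment.
t, df = paired_t([140, 150, 160, 155, 145], [135, 148, 150, 150, 142])
print(round(t, 2), df)
```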

Interpretation of the results
If the value of t is large, the p value will be small,
because it is unlikely that a large t ratio will be
obtained by chance alone. If the p value is 0.05 or
less, it is customary to assume that there is a real
difference (i.e., that the null hypothesis of no
difference can be rejected).

Use of z-tests
In contrast to t-tests, which compare differences
between means, z-tests compare differences
between proportions.
In medicine, examples of proportions that are
frequently studied are sensitivity, specificity,
positive predictive value, risks, percentages of
people with a given symptom, percentages of
people who are ill, and percentages of ill people
who survive their illness.

Frequently, the goal of research is to see if the
proportion of patients surviving in a treated group
differs from that in an untreated group. This can
be evaluated using a z-test for proportions.

Calculation of the value of z
As discussed earlier, z is calculated by taking the
observed difference between the two proportions
(the numerator) and dividing it by the standard
error of the difference between the two
proportions (the denominator).

For purposes of illustration, assume that research is
being conducted to see if the proportion of patients
surviving in a treated group is greater than that in
an untreated group.
For each group, if p is the proportion of successes
(survivals), then 1 – p is the proportion of failures
(nonsurvivals).

If N represents the size of the group on which the
proportion is based, the parameters of the
proportion could be calculated as follows:

$$\text{Variance (proportion)} = \frac{p(1 - p)}{N}$$

$$\text{Standard error (proportion)} = SE_p = \sqrt{\frac{p(1 - p)}{N}}$$

$$\text{95\% confidence interval} = \text{95\% CI} = p \pm 1.96\, SE_p$$

If there is a 0.60 (60%) survival rate following a given
treatment, the calculations of $SE_p$ and the 95% CI of the
proportion, based on a sample of 100 study subjects, would
be as follows:

$$SE_p = \sqrt{\frac{(0.6)(0.4)}{100}} = \sqrt{\frac{0.24}{100}} \approx \frac{0.49}{10} = 0.049$$

$$\text{95\% CI} = 0.6 \pm (1.96)(0.049) \approx 0.6 \pm 0.096$$

$$= \text{between } 0.6 - 0.096 \text{ and } 0.6 + 0.096 = 0.504 \text{ to } 0.696$$
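The worked example can be reproduced in a few lines. This is a minimal sketch using the text's own values (p = 0.60, N = 100); `proportion_ci_95` is an illustrative name. The unrounded SE is about 0.0490, so the interval agrees with the hand calculation to three decimal places.

```python
import math

def proportion_ci_95(p, n):
    """SE of a proportion, sqrt(p(1 - p)/N), and the 95% CI p +/- 1.96 SE."""
    se = math.sqrt(p * (1 - p) / n)
    return se, (p - 1.96 * se, p + 1.96 * se)

se, (low, high) = proportion_ci_95(0.60, 100)
print(f"SE = {se:.3f}; 95% CI = {low:.3f} to {high:.3f}")
```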

Now that there is a way to obtain the standard error
of a proportion, the standard error of the
difference between proportions also can be
obtained, and the equation for the z-test can be
expressed as follows:

$$z = \frac{p_1 - p_2 - 0}{\sqrt{\bar{p}(1 - \bar{p})\left[\left(\frac{1}{N_1}\right) + \left(\frac{1}{N_2}\right)\right]}}$$

in which $p_1$ is the proportion of the first sample, $p_2$
is the proportion of the second sample, $N_1$ is the
size of the first sample, $N_2$ is the size of the second
sample, and $\bar{p}$ is the mean proportion of
successes in all observations combined. The 0 in
the numerator indicates that the null hypothesis
states that the difference between the proportions
will not be significantly different from 0.
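A minimal sketch of the z-test for two proportions, assuming hypothetical survival counts of 60 of 100 treated and 45 of 100 untreated patients; `z_two_proportions` is an illustrative name. The mean proportion combines the successes of both groups, as defined above.

```python
import math

def z_two_proportions(successes1, n1, successes2, n2):
    """z for the difference between two proportions, using the mean
    proportion of successes in both groups combined."""
    p1, p2 = successes1 / n1, successes2 / n2
    p_bar = (successes1 + successes2) / (n1 + n2)
    se_diff = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2 - 0) / se_diff

# Hypothetical: 60/100 survive with treatment, 45/100 without.
z = z_two_proportions(60, 100, 45, 100)
print(round(z, 2))
```

A z above 1.96 would correspond to a two-sided p value below 0.05.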

Interpretation of results
Note that the above formula for z is similar to the
formula for t in the Student’s t-test, as described
earlier. However, because the variance and the
standard error of the proportion are based on a
theoretical distribution (the normal approximation
to the binomial distribution), the z
distribution is used instead of the t distribution in
determining whether the difference is statistically
significant. When the z ratio is large (as when the t
ratio is large), the difference is more likely to be
real.

The computations for the z-test appear different
from the computations for the chi-square test, but
when the same data are set up as a 2 × 2 table,
the two tests are mathematically equivalent: the
square of z equals the chi-square value calculated
without a continuity correction. Most people find it
easier to do a chi-square test than to do a z-test for
proportions.
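The equivalence can be checked numerically. This is a minimal sketch, assuming a hypothetical 2 × 2 table (treated: 60 survived, 40 died; untreated: 45 survived, 55 died) and the Pearson chi-square without continuity correction; both function names are illustrative, and the z function repeats the formula given in the text.

```python
import math

def z_two_proportions(s1, n1, s2, n2):
    """Two-proportion z using the mean proportion of successes."""
    p_bar = (s1 + s2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (s1 / n1 - s2 / n2) / se

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

z = z_two_proportions(60, 100, 45, 100)
chi2 = chi_square_2x2(60, 40, 45, 55)
print(round(z ** 2, 4), round(chi2, 4))
```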

Choosing An Appropriate
Statistical Test

A variety of statistical tests can be used to analyze
the relationship between two or more variables.
Bivariate analysis is the analysis of the
relationship between one independent (possibly
causal) variable and one dependent (outcome)
variable. Multivariable analysis, by contrast, is
the analysis of the relationship of more than one
independent variable to a single dependent
variable.

Statistical tests should be chosen only after the
types of clinical data to be analyzed and the basic
research design have been established. In general,
the analytic approach should begin with a study of
the individual variables, including their
distributions and outliers, and with a search for
errors. Then bivariate analysis can be done to test
hypotheses and probe for relationships. Only after
these procedures have been done carefully should
multivariable analysis be attempted.

Among the factors involved in choosing an
appropriate statistical test are the goals and
research design of the study and the type of data
being gathered.
In some studies the investigators are interested in
descriptive information, such as the sensitivity or
specificity of a laboratory assay, in which case
there may be no reason to perform a test of
statistical significance.

In other studies, the investigators are interested in
determining whether the difference between two
means is real, in which case testing for statistical
significance is appropriate.

The types of variables and the research designs set
the limits to statistical analysis and determine
which tests are appropriate. An investigator’s
knowledge of the types of variables (continuous
data, ordinal data, dichotomous data and nominal
data) and appropriate statistical tests is analogous
to a painter’s knowledge of the types of media
(oils, tempera, water colors, and so forth) and the
appropriate brushes and techniques to be used.

If the research design involves before and after
comparisons in the same study subjects or
involves comparisons of matched pairs of study
subjects, a paired test of statistical significance
(such as the paired t-test, the Wilcoxon matched-pairs
signed-ranks test, or the McNemar chi-square test)
would be appropriate. Moreover, if the
sampling procedure in a study is not random,
statistical tests that assume random sampling, such
as most of the parametric tests, may not be valid.

Making inferences from continuous
(parametric) data

If the study involves two continuous variables, the
following questions may be answered:
(1) Is there a real relationship between the
variables or not?
(2) If there is a real relationship, is it a positive or
negative linear relationship (a straight-line
relationship), or is it more complex?
(3) If there is a real relationship, how strong is it?
(4) How likely is the relationship to be
generalizable?

The best way to answer these questions is first to
plot the continuous data on a joint distribution
graph and then to perform correlation analysis and
simple linear regression analysis.

The Joint Distribution Graph
Taking the example of a sample of elderly
xerostomia patients, does the number of root
caries increase with increasing amounts of sugar
in the diet (number of servings per day)? In this
instance, data are recorded on a single group of
subjects, and each subject contributes a pair of
measures (number of servings per day of sugar
and number of root caries). Commonly, any pair
of variables entered into a correlation analysis is
given the names x and y.

These data can be plotted on a joint distribution
graph, as shown in fig. The data do not form a
perfectly straight line, but they do appear to lie
along a straight line, going from the lower left to
the upper right on the graph, and all of these
observations but one are fairly close to the line.

As indicated in fig, the correlation between two
variables, labeled x and y, can range from
nonexistent to strong. If the value of y increases as
x increases, the correlation is positive; if y
decreases as x increases, the correlation is
negative.

It appears from the graph that the correlation
between amounts of sugar and dental caries is
strong and positive.
[Figure: joint distribution graph, y against x]

Therefore, based on fig, the answer to the first
question above is that there is a real relationship
between amount of sugar and dental caries. The
graph, however, does not reveal the probability
that such a relationship could have occurred by
chance. The answer to the second question is that
the relationship is positive and is linear. The graph
does not provide quantitative information about
how strong the association is (although it looks
strong to the eye).

To answer these questions more precisely, it is
necessary to use the techniques of correlation and
simple linear regression. Neither the graph nor
these techniques, however, can answer the
question of how generalizable the findings are.
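A minimal sketch of both techniques, assuming hypothetical data for six patients (these are not the values in the figure); `pearson_r_and_regression` is an illustrative name. It returns Pearson's correlation coefficient r, which measures the strength of a linear relationship, and the least-squares line y = a + bx. As noted above, neither number says how generalizable the finding is.

```python
import math

def pearson_r_and_regression(x, y):
    """Pearson correlation r and the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx    # slope: change in y per unit change in x
    a = my - b * mx  # intercept: the line passes through (mean x, mean y)
    return r, b, a

# Hypothetical data: sugar servings per day (x) and number of root caries (y).
sugar = [1, 2, 3, 4, 5, 6]
caries = [1, 2, 2, 4, 5, 5]
r, slope, intercept = pearson_r_and_regression(sugar, caries)
print(f"r = {r:.2f}; caries = {intercept:.2f} + {slope:.2f} x sugar")
```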
