Statistical analysis

STATISTICAL ANALYSIS
Princy Francis M
1st Yr MSc (N)
JMCON

DEFINITION
• Statistical analysis is the organisation and analysis of
quantitative or qualitative data using statistical procedures,
including both descriptive and inferential statistics.
• It’s the science of collecting, exploring and presenting
large amounts of data to discover underlying patterns and
trends.

DEFINITION
• Statistics is a branch of science that deals with the
collection, organisation, analysis of data and drawing of
inferences from the samples to the whole population.
• A sample is a small portion of the population that truly represents the population with respect to the characteristic under study.

PURPOSES
• Summarize
• Explore the meaning of deviations in data
• Compare or contrast descriptively
• Test the proposed relationships in a theoretical model
• Infer that the findings from the sample are indicative of the entire population
• Examine causality.
• Predict or infer from the sample to a theoretical model.

ELEMENTS OF STATISTICAL ANALYSIS
• Understand the complex relationships among the correlates of the disease under study.
• The analysis should start with simple comparisons of proportions and means.
• Interpretation of results should be guided by clinical and biological considerations.

STATISTICAL MEASURES
• Mean
• Mode
• Median
• Interquartile range
• Standard deviation

MEAN
• The mean is the sum of all observations divided by the number of observations.

Example
• Mean of 10, 20, 30, 40 = (10 + 20 + 30 + 40) / 4 = 100 / 4 = 25
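A minimal Python sketch of the calculation above, using only the standard library; the function name `arithmetic_mean` is illustrative, not taken from the source.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

print(arithmetic_mean([10, 20, 30, 40]))  # 25.0
```

The standard library's `statistics.mean` gives the same value.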

MEDIAN
• When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median.
• For raw data with n observations, if n is odd, the median is the value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item.
• If n is even, the median is the mean of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ items.
Example: for the data 10, 20, 30 (n = 3), the median is the $\left(\frac{3+1}{2}\right)^{\text{th}}$ = 2nd item, i.e. 20.
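A short Python sketch of the positional rule above (the name `median_by_position` is hypothetical). It sorts first, then converts the 1-based item positions used in the text into 0-based list indices.

```python
def median_by_position(values):
    """Median by the ((n+1)/2)th-item rule for odd n, and the mean of the
    (n/2)th and (n/2 + 1)th items for even n (positions are 1-based)."""
    x = sorted(values)                 # arrange in ascending order first
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]     # 1-based position -> 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2

print(median_by_position([10, 20, 30]))      # 20
print(median_by_position([10, 20, 30, 40]))  # 25.0
```

The standard library's `statistics.median` follows the same odd/even rule.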

MODE
• The mode is the value in a series that occurs more frequently than any other.
• For grouped data,
$$M_0 = L_0 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$
where $L_0$ is the lower limit of the modal class, $c$ is the class interval, $\Delta_1$ is the difference between the modal class frequency and the frequency of the preceding class, and $\Delta_2$ is the difference between the modal class frequency and the frequency of the following class.
Example: the mode of 80, 90, 86, 80, 72, 80, 96 is 80 (it occurs three times).
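A Python sketch of both cases above. `raw_mode` counts values with `collections.Counter`; `grouped_mode` applies the formula directly. Both names are hypothetical, and the grouped-data numbers in the usage line are invented for illustration because the text gives no grouped example.

```python
from collections import Counter

def raw_mode(values):
    """Most frequent value in ungrouped data (the first one seen, if tied)."""
    return Counter(values).most_common(1)[0][0]

def grouped_mode(lower_limit, f_modal, f_before, f_after, width):
    """M0 = L0 + d1 / (d1 + d2) * c for grouped data."""
    d1 = f_modal - f_before   # modal frequency minus preceding class frequency
    d2 = f_modal - f_after    # modal frequency minus following class frequency
    return lower_limit + d1 / (d1 + d2) * width

print(raw_mode([80, 90, 86, 80, 72, 80, 96]))  # 80
# Hypothetical grouped data: modal class 20-30 with frequency 12,
# preceding class frequency 8, following class frequency 6.
print(grouped_mode(20, 12, 8, 6, 10))  # 24.0
```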

INTERQUARTILE RANGE
• The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
• IQR = Q3 − Q1.

Example
Interquartile range of the data 30, 20, 40, 60, 50 (arranged in ascending order: 20, 30, 40, 50, 60; n = 5)
• $Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ item = 1.5th item = 20 + 0.5 × (30 − 20) = 25
• $Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ item = 4.5th item = 50 + 0.5 × (60 − 50) = 55
• IQR = Q3 − Q1 = 55 − 25 = 30
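A Python sketch of the (n + 1) positional rule with linear interpolation used in the example. `quartile_by_position` is a hypothetical name. As a cross-check it calls `statistics.quantiles`, whose default "exclusive" method uses the same (n + 1) convention. Other software may use different quartile conventions and give slightly different values.

```python
from statistics import quantiles

def quartile_by_position(values, k):
    """k-th quartile (k = 1 or 3) at 1-based position k(n+1)/4,
    interpolating linearly between the neighbouring items."""
    x = sorted(values)
    pos = k * (len(x) + 1) / 4   # e.g. 1.5 = halfway between 1st and 2nd item
    lower = int(pos)             # whole part of the position
    frac = pos - lower           # fractional part used for interpolation
    if lower < 1:
        return x[0]
    if lower >= len(x):
        return x[-1]
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

data = [30, 20, 40, 60, 50]
q1, q3 = quartile_by_position(data, 1), quartile_by_position(data, 3)
print(q1, q3, q3 - q1)        # 25.0 55.0 30.0
print(quantiles(data, n=4))   # [25.0, 40.0, 55.0]
```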

STANDARD DEVIATION
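• The standard deviation is the most useful and most widely used measure of dispersion.
• The standard deviation is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean.
• The standard deviation is denoted by ‘σ’.

EXAMPLE
• Standard deviation of the data 10, 20, 30, 40, 50, where n = 5 and x̄ = 30
• Sum of squared deviations = (−20)² + (−10)² + 0² + 10² + 20² = 1000
• Dividing by n − 1 (the sample standard deviation): s = √(1000/4) = √250 = 15.811
• Dividing by n, as in the definition above: σ = √(1000/5) = √200 = 14.142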
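A Python sketch that shows both divisors side by side. `standard_deviation` is a hypothetical helper. The standard library's `statistics.stdev` (divisor n − 1) and `statistics.pstdev` (divisor n) are used as a cross-check.

```python
import math
from statistics import pstdev, stdev

def standard_deviation(values, sample=True):
    """Square root of the mean squared deviation from the mean.
    sample=True divides by n - 1; sample=False divides by n."""
    n = len(values)
    m = sum(values) / n
    ss = sum((v - m) ** 2 for v in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))

data = [10, 20, 30, 40, 50]
print(round(standard_deviation(data), 3))                # 15.811
print(round(standard_deviation(data, sample=False), 3))  # 14.142
print(round(stdev(data), 3), round(pstdev(data), 3))     # 15.811 14.142
```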

[Figure: standard normal distribution curve showing the mean, median, interquartile range and standard deviation]

TYPES
• PARAMETRIC STATISTICAL ANALYSIS
• NONPARAMETRIC STATISTICAL ANALYSIS

PARAMETRIC STATISTICAL ANALYSIS
• The most commonly used type of statistical analysis.
• It is called parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population.
• Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.

ASSUMPTIONS
• The assumption of normality, which specifies that the means of the sample groups are normally distributed.
• The assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal.
• The data can be treated as random samples.

NONPARAMETRIC STATISTICAL ANALYSIS
• Nonparametric statistical analysis is also known as distribution-free techniques.
• It can be used in studies that do not meet the first two assumptions (normality and equal variance).
• Most nonparametric techniques are not as powerful as their parametric counterparts.

• If the distribution of the sample is skewed towards one side, or the distribution is unknown because the sample size is small, nonparametric statistical techniques are used.
• Nonparametric tests are used to analyse ordinal and categorical data.

EXPLORATORY DATA ANALYSIS AND CONFIRMATORY DATA ANALYSIS
• The distinction was drawn by John Tukey.
• Exploratory data analysis is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structure or models.
• Confirmatory data analysis involves traditional inferential statistics, which you can use to make an inference about a population or a process based on evidence from the study sample.

STATISTICAL ANALYSIS DECISION MAKING
• Two-group comparison
  - Means, parametric: independent two-sample t test
  - Means, nonparametric: Mann-Whitney U test
  - Percentages: chi-square test
• One-group comparison
  - Single mean: one-sample t test
  - Mean difference (paired data), parametric: paired t test
  - Mean difference (paired data), nonparametric: Wilcoxon signed rank test
• Comparison of more than two groups
  - Means, parametric: ANOVA
  - Means, nonparametric: Kruskal-Wallis test
  - Percentages: chi-square test
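The chart can be read as a small lookup. The sketch below encodes it as a hypothetical Python helper; `choose_test` and its parameters are illustrative names, not from any library. It only mirrors the chart and does not check assumptions such as normality.

```python
def choose_test(groups, measure, parametric=True, paired=False):
    """Hypothetical helper mirroring the decision chart.
    groups: number of groups compared (1, 2 or 3+);
    measure: "mean" or "percentage"."""
    if measure == "percentage":
        if groups >= 2:
            return "Chi-square test"
        raise ValueError("the chart lists no percentage test for one group")
    if groups == 1:
        if not paired:
            return "One-sample t test"
        return "Paired t test" if parametric else "Wilcoxon signed rank test"
    if groups == 2:
        return "Independent two-sample t test" if parametric else "Mann-Whitney U test"
    return "ANOVA" if parametric else "Kruskal-Wallis test"

print(choose_test(3, "mean", parametric=False))  # Kruskal-Wallis test
```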

PARAMETRIC STATISTICAL ANALYSIS
• Student's t-test
• Z test
• Analysis of variance (ANOVA)

Student's t-test
• Developed by W. S. Gosset, who published under the pen name "Student".
• Student's t-test is used to test the null hypothesis that there is no difference between the means of two groups.
• One-sample t-test
• Independent two-sample t-test (the unpaired t-test)
• Paired t-test

One-sample t-test
• To test whether a sample mean (as an estimate of a population mean) differs significantly from a given population mean.
• The mean of one sample is compared with the population mean:
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
where $\bar{x}$ = sample mean, $\mu$ = population mean, $S$ = sample standard deviation and $n$ = sample size (degrees of freedom = n − 1).

Example
A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether there is any difference between the means.
• H0: There is no significant difference between the sample mean and the population mean
• H1: There is a significant difference between the sample mean and the population mean
x̄ = 40, μ = 44, n = 20 and S = 6

• |t|calculated = |40 − 44| / (6/√20) = 4 / 1.342 = 2.981
• t table value (df = 19, 5% level, two-tailed) = 2.093
• |t|calculated > t table value; reject H0.
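A minimal Python sketch of the one-sample t statistic (standard library only; `one_sample_t` is a hypothetical name). The critical value 2.093 is taken from the example rather than computed, because the standard library has no t distribution.

```python
import math

def one_sample_t(sample_mean, pop_mean, sd, n):
    """t = (x_bar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))

t = one_sample_t(40, 44, 6, 20)
t_table = 2.093  # two-tailed 5% critical value for df = 19, from the example
print(round(t, 3))                                       # -2.981
print("Reject H0" if abs(t) > t_table else "Accept H0")  # Reject H0
```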

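Independent Two Sample T Test (the unpaired t-test)
• To test whether the population means estimated by two independent samples differ significantly.
• Typical setting: two different groups that are comparable (similar means) at the start of a study, whose means are compared at the end.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
where $\bar{x}_1 - \bar{x}_2$ is the difference between the means of the two groups, $S_1$ and $S_2$ are the standard deviations, $n_1$ and $n_2$ are the sample sizes, and the degrees of freedom are $n_1 + n_2 - 2$.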

Example
Hb levels of 5 males are 10, 11, 12.5, 10.5 and 12, and of 5 females are 10, 17.5, 14.2, 15 and 14.1. Test whether there is any significant difference between the Hb values.
• H0: There is no significant difference between the Hb levels
• H1: There is a significant difference between the Hb levels

• x̄1 = 11.2, x̄2 = 14.16, S1² = 4.3/4 = 1.075, S2² = 29.172/4 = 7.293
• Pooled variance = (4.3 + 29.172)/8 = 4.184; t = (11.2 − 14.16) / √(4.184 × (1/5 + 1/5)) = −2.96 / 1.294 = −2.288
• |t|calculated = 2.288, t table (df = 8, 5% level, two-tailed) = 2.306
• |t|calculated < t table value; accept H0 (the difference is not statistically significant).
X1 | X2 | X1 - x̄1 | X2 - x̄2 | (X1 - x̄1)² | (X2 - x̄2)²
10 | 10 | -1.2 | -4.16 | 1.44 | 17.306
11 | 17.5 | -0.2 | 3.34 | 0.04 | 11.156
12.5 | 14.2 | 1.3 | 0.04 | 1.69 | 0.0016
10.5 | 15 | -0.7 | 0.84 | 0.49 | 0.706
12 | 14.1 | 0.8 | -0.06 | 0.64 | 0.0036
Σ = 56 | Σ = 70.8 | | | Σ = 4.3 | Σ = 29.172
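A Python sketch of the pooled-variance t statistic applied to the Hb data (standard library only; `pooled_t` is a hypothetical name). `statistics.variance` divides by n − 1, which matches S1² and S2² in the example. The critical value 2.306 is the two-tailed 5% value for df = 8.

```python
import math
from statistics import mean, variance

def pooled_t(x1, x2):
    """Independent two-sample t with pooled variance (df = n1 + n2 - 2)."""
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]
t, df = pooled_t(male, female)
print(round(t, 3), df)                                # -2.288 8
print("Reject H0" if abs(t) > 2.306 else "Accept H0")  # Accept H0
```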

The paired t-test
• To test whether the population means estimated by two dependent samples differ significantly.
• A usual setting for the paired t-test is when measurements are made on the same subjects before and after a treatment.
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
where $\bar{d}$ is the mean difference, $S_d$ is the standard deviation of the differences and $n$ is the number of pairs (degrees of freedom = n − 1).

Example
Systolic BP of 5 patients before and after a drug therapy is
Before 160, 150, 170, 130, 140
After 140, 110, 120, 140, 130
Test whether there is any significant difference between the BP levels.
• H0: There is no significant difference between BP levels before and after the drug
• H1: There is a significant difference between BP levels before and after the drug

• d̄ = Σd / n = 110 / 5 = 22, Sd = √(2280 / 4) = 23.875
• tcalculated = 22 / (23.875 / √5) = 2.060, t table (df = 4, 5% level, two-tailed) = 2.776; tcalculated < t table value; accept H0.
Before | After | d | d - d̄ | (d - d̄)²
160 | 140 | 20 | -2 | 4
150 | 110 | 40 | 18 | 324
170 | 120 | 50 | 28 | 784
130 | 140 | -10 | -32 | 1024
140 | 130 | 10 | -12 | 144
 | | Σd = 110 | | Σ = 2280
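A Python sketch of the paired t-test on the BP data (standard library only; `paired_t` is a hypothetical name). Differences are taken as before − after, as in the table. `statistics.stdev` divides by n − 1, matching Sd. The comparison value 2.776 is the two-tailed 5% critical value for df = 4.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t = d_bar / (Sd / sqrt(n)) on the differences d = before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

before = [160, 150, 170, 130, 140]
after = [140, 110, 120, 140, 130]
t, df = paired_t(before, after)
print(round(t, 3), df)                                 # 2.06 4
print("Reject H0" if abs(t) > 2.776 else "Accept H0")  # Accept H0
```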

Z test
Generally, z-tests are used with large samples (n > 30), whereas t-tests are most helpful with small samples (n < 30). Both methods assume a normal distribution of the data, but z-tests are most appropriate when the population standard deviation is known.
z = (x̄ − μ) / (σ / √n)
where x̄ is the sample mean, μ the population mean, σ the known population standard deviation and n the sample size.
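A Python sketch of the one-sample z test. `statistics.NormalDist` (Python 3.8+) supplies the standard normal CDF for a two-sided p-value. `one_sample_z` is a hypothetical name, and the input numbers are invented for illustration; they are not from the source.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)); sigma is the known population SD."""
    return (sample_mean - pop_mean) / (sigma / math.sqrt(n))

# Hypothetical numbers: x_bar = 52, mu = 50, sigma = 8, n = 64
z = one_sample_z(52, 50, 8, 64)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, round(p_two_sided, 4))  # 2.0 0.0455
```

With |z| > 1.96, H0 would be rejected at the 5% level.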

ANALYSIS OF VARIANCE (ANOVA)
• Developed by R. A. Fisher.
• Student's t-test cannot be used to compare three or more groups.
• The purpose of ANOVA is to test whether there is any significant difference between the means of two or more groups.
• Analysis of variance is a systematic algebraic procedure that decomposes the overall variation in the responses observed in an experiment into its component sources of variation.
• Two variances: (a) between-group variability and (b) within-group variability, that is, the variation existing between the samples and the variation existing within each sample.
• The within-group variability (error variance) is the variation that cannot be accounted for by the study design.
• The between-group variability (effect variance) is the result of the treatment.

• A simplified formula for the F statistic is
$$F = \frac{MST}{MSE}$$
where MST is the mean square between the groups and MSE is the mean square within the groups.
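A Python sketch of the one-way F statistic, computing MST and MSE from their sums of squares (standard library only; `one_way_anova_f` is a hypothetical name). The three small groups are invented for illustration. The resulting F would then be compared with an F table at (k − 1, n − k) degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = MST / MSE for a one-way ANOVA; returns F and its degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    # between-group SS: group size times squared distance of group mean from grand mean
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # within-group SS: spread of each observation around its own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst = ssb / (k - 1)
    mse = ssw / (n - k)
    return mst / mse, (k - 1, n - k)

# Hypothetical data for illustration only
f, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, dfs)  # 27.0 (2, 6)
```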

NONPARAMETRIC STATISTICAL ANALYSIS
• CHI-SQUARE TEST
• WILCOXON SIGNED RANK TEST
• MANN-WHITNEY U TEST
• KRUSKAL-WALLIS TEST

CHI-SQUARE TEST
• Used to analyse categorical data.
• The chi-square test is widely used in statistical decision making.
• The test was first used by Karl Pearson in 1900.
• The chi-square test compares frequencies and tests whether the observed data differ significantly from the expected data.

CHI-SQUARE TEST
It is calculated as the sum, over all cells, of the squared difference between the observed (O) and expected (E) frequencies (the deviation, d) divided by the expected frequency:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

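Example
• Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles.
Groups | Attacked | Not attacked | Total
Vaccinated | 10 | 90 | 100
Not vaccinated | 26 | 74 | 100
Total | 36 | 164 | 200

• H0: There is no significant association between vaccination and attack of measles
• H1: There is a significant association between vaccination and attack of measles

• Expected frequency of each cell: E = row total × column total / grand total, e.g. 100 × 36 / 200 = 18 and 100 × 164 / 200 = 82
• Chi-square table value (df = 1, 5% level) = 3.841, chi-square calculated value = 8.672
• χ² calculated > χ² table value; reject H0.
Cell | Oi | Ei | Oi - Ei | (Oi - Ei)² | (Oi - Ei)²/Ei
Vaccinated, attacked | 10 | 18 | -8 | 64 | 3.556
Vaccinated, not attacked | 90 | 82 | 8 | 64 | 0.780
Not vaccinated, attacked | 26 | 18 | 8 | 64 | 3.556
Not vaccinated, not attacked | 74 | 82 | -8 | 64 | 0.780
Σ = 8.672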
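A Python sketch of Pearson's chi-square for a contingency table given as a list of rows (standard library only; `chi_square` is a hypothetical name). Expected counts come from the row and column totals, as in the example. No continuity (Yates) correction is applied, which matches the hand calculation.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for an r x c table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square([[10, 90], [26, 74]])
print(round(chi2, 3), df)                             # 8.672 1
print("Reject H0" if chi2 > 3.841 else "Accept H0")   # Reject H0
```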

THE WILCOXON SIGNED RANK TEST
• The Wilcoxon signed rank test ranks the absolute differences between paired observations, attaches the sign of each difference to its rank and compares the sum of the positive ranks with the sum of the negative ranks.
• It is used for testing whether the differences observed in the values of a quantitative variable between two correlated samples (before-and-after design) are statistically significant.
• This test corresponds to the paired t test.

Method
• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference between the two values in each pair.
• Rank the differences from smallest to largest without considering their sign; tied absolute differences receive the average of their ranks.
• After ranking, attach the sign of each difference to its rank.
• T+ is the sum of the ranks with a positive sign and T− is the sum of the ranks with a negative sign. The test statistic Wstat (T) is the smaller of T+ and T−.
• Find the critical value Wcritical from the Wilcoxon signed rank table.
• If Wstat ≤ Wcritical, reject H0.

EXAMPLE
• IQ values of 8 malnourished children aged 4 years, before and after a nutritious diet given for 3 months, are given below
Before: 40 60 55 65 43 70 80 60
After: 50 80 50 70 40 60 90 85

• H0: There is no difference in the paired values
• H1: There is a difference in the paired values
Difference (Before - After): -10 -20 5 -5 3 10 -10 -15
Absolute difference: 10 20 5 5 3 10 10 15
Rank: 5 8 2.5 2.5 1 5 5 7

• T+ = 2.5 + 1 + 5 = 8.5 (positive differences 5, 3 and 10), T− = 5 + 8 + 2.5 + 5 + 7 = 27.5, so T = 8.5
• Wstat = 8.5, Wcritical (n = 8, 5% level, two-tailed) = 3
• Wstat > Wcritical; accept H0.

• Assuming a normal distribution for the differences, the test statistic is
$$Z = \frac{|T - m| - 0.5}{SD}$$
where T is the smaller of T+ and T−, $m = \frac{n(n+1)}{4}$ is the mean sum of ranks and
$$SD = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
• If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected at the 5% level of significance.
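A Python sketch of the signed rank procedure above, including the normal approximation (standard library only). `average_ranks` and `wilcoxon_signed_rank` are hypothetical helper names. Dropping zero differences is a common convention that the text does not state, so treat it as an assumption. With only 8 pairs, the Z value is shown for illustration; the table-based decision is the one to rely on.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of positions i+1 .. j+1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return T+, T-, T = min(T+, T-) and the normal-approximation Z."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # zero differences dropped (assumption)
    ranks = average_ranks([abs(d) for d in diffs])            # rank ignoring sign
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(t_plus, t_minus)
    n = len(diffs)
    m = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, t_minus, t, (abs(t - m) - 0.5) / sd

before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]
t_plus, t_minus, w, z = wilcoxon_signed_rank(before, after)
print(t_plus, t_minus, w, round(z, 3))  # 8.5 27.5 8.5 1.26
```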

MANN-WHITNEY U TEST
• Tests whether two independent samples come from the same population with respect to a quantitative variable.
• Also known as Wilcoxon's rank sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is the nonparametric alternative to the t test for two independent samples.

METHOD
H0: The average values in the two groups are the same
H1: The average values in the two groups are different
• Let n1 be the sample size of the first group and n2 the sample size of the second group. Rank all the values in the two groups taken together; tied values are given the same (average) rank.
• Take the rank sum of each group and calculate U for each group:
$$U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2+1)}{2}$$
where $R_1$ and $R_2$ are the rank sums of the two groups.
• The smaller of U1 and U2 is taken as Ustat, and Ucritical is obtained from the Mann-Whitney U test table.
• If Ustat ≤ Ucritical, reject H0.

Example
Treatment A | Treatment B
3 | 9
4 | 7
2 | 5
6 | 10
2 | 6
5 | 8

• H0: The average values in the 2 treatments are the same
• H1: The average values in the 2 treatments are different
U = Rank sum - n(n + 1)/2, with n = 6 in each group, so n(n + 1)/2 = 21
Order: 1 2 3 4 5 6 7 8 9 10 11 12
Values: 2 2 3 4 5 5 6 6 7 8 9 10
Rank: 1.5 1.5 3 4 5.5 5.5 7.5 7.5 9 10 11 12

• Rank sums: RA = 1.5 + 1.5 + 3 + 4 + 5.5 + 7.5 = 23, RB = 5.5 + 7.5 + 9 + 10 + 11 + 12 = 55
• UA = 23 - 21 = 2, UB = 55 - 21 = 34, so Ustat = 2 (the smaller value)
• Ucritical (n1 = n2 = 6, 5% level, two-tailed) = 5
• Ustat < Ucritical; reject H0.

• Assuming that the ranks are randomly distributed in the two groups, the test statistic is
$$Z = \frac{|m - T| - 0.5}{SD}$$
where T is the smaller of T1 and T2, T1 is the sum of the ranks of the smaller group, and
$$T_2 = \frac{(n_1+n_2)(n_1+n_2+1)}{2} - T_1, \qquad m = \frac{n_1(n_1+n_2+1)}{2}, \qquad SD = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}$$
• If Z < 1.96, H0 is accepted
• If Z > 1.96, H0 is rejected at the 5% level of significance
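A Python sketch of the U statistics and of the normal approximation exactly as written above (standard library only). `average_ranks`, `mann_whitney_u` and `mann_whitney_z` are hypothetical helper names. The critical value still has to come from a Mann-Whitney table.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Rank sums and U for each group, ranking both groups taken together."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    return r1, r2, r1 - n1 * (n1 + 1) / 2, r2 - n2 * (n2 + 1) / 2

def mann_whitney_z(a, b):
    """Normal approximation following the text's formula."""
    if len(a) > len(b):
        a, b = b, a                     # T1 is the rank sum of the smaller group
    n1, n2 = len(a), len(b)
    total = n1 + n2
    t1 = sum(average_ranks(list(a) + list(b))[:n1])
    t2 = total * (total + 1) / 2 - t1
    t = min(t1, t2)
    m = n1 * (total + 1) / 2
    sd = math.sqrt(n1 * n2 * (total + 1) / 12)
    return (abs(m - t) - 0.5) / sd

treatment_a = [3, 4, 2, 6, 2, 5]
treatment_b = [9, 7, 5, 10, 6, 8]
r1, r2, u1, u2 = mann_whitney_u(treatment_a, treatment_b)
print(r1, r2, u1, u2, min(u1, u2))                         # 23.0 55.0 2.0 34.0 2.0
print(round(mann_whitney_z(treatment_a, treatment_b), 3))  # 2.482
```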

KRUSKAL-WALLIS TEST
• The Kruskal-Wallis test is a nonparametric analysis of variance based on ranks.
• It is used to compare several independent samples.
• It tests whether several independent samples of a quantitative variable come from the same population.
• It corresponds to one-way analysis of variance among parametric methods.

• It analyses whether there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order and the rank sums are calculated, followed by calculation of the test statistic
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
where n is the total of the sample sizes in all the groups, $n_i$ is the size of the ith group, $R_i$ is the sum of the ranks in the ith group and k is the number of groups.

Method
H0: The average values in the different groups are the same
H1: The average values in the different groups are different
• Rank all the values, taking all the groups together.
• Use the chi-square table (degrees of freedom = k − 1) to get the table value at the 5% level of significance.
• If Hstat > table value, reject H0.

Example
Sample 1 | Sample 2 | Sample 3
8 | 10 | 13
10 | 9 | 8
9 | 13 | 9
12 | 14 | 13
11 | 9 | 17
13 | 16 | 15

• H0: The average values in the three groups are the same
• H1: The average values in the three groups are different
Order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Values: 8 8 9 9 9 9 10 10 11 12 13 13 13 13 14 15 16 17
Tied rank: 1.5 1.5 4.5 4.5 4.5 4.5 7.5 7.5 9 10 12.5 12.5 12.5 12.5 15 16 17 18
(The four 9s occupy positions 3 to 6, so each receives (3 + 4 + 5 + 6)/4 = 4.5.)

$$H = \frac{12}{18 \times 19}\left(\frac{45^2}{6} + \frac{61^2}{6} + \frac{65^2}{6}\right) - 3 \times 19 = 58.310 - 57 = 1.310$$
• Hcalculated = 1.310, χ² table value (df = 2, 5% level) = 5.99
• Hstat < χ² table value; accept H0.
Sample 1 | Rank 1 | Sample 2 | Rank 2 | Sample 3 | Rank 3
8 | 1.5 | 10 | 7.5 | 13 | 12.5
10 | 7.5 | 9 | 4.5 | 8 | 1.5
9 | 4.5 | 13 | 12.5 | 9 | 4.5
12 | 10 | 14 | 15 | 13 | 12.5
11 | 9 | 9 | 4.5 | 17 | 18
13 | 12.5 | 16 | 17 | 15 | 16
 | Σ = 45 | | Σ = 61 | | Σ = 65
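A Python sketch of the H statistic (standard library only; `average_ranks` and `kruskal_wallis_h` are hypothetical names). It applies the formula above without the tie correction that some texts add. Running it on the example data gives rank sums of 45, 61 and 65 (the four 9s share rank 4.5) and H ≈ 1.31, which is compared with the chi-square value 5.99 at df = 2.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Rank sums per group and H = 12/(n(n+1)) * sum(Ri^2/ni) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)          # rank all groups taken together
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return rank_sums, h

sample1 = [8, 10, 9, 12, 11, 13]
sample2 = [10, 9, 13, 14, 9, 16]
sample3 = [13, 8, 9, 13, 17, 15]
rank_sums, h = kruskal_wallis_h(sample1, sample2, sample3)
print(rank_sums, round(h, 3))                    # [45.0, 61.0, 65.0] 1.31
print("Reject H0" if h > 5.99 else "Accept H0")  # Accept H0
```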

WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS FOR OUTCOMES BY USING STATISTICAL ANALYSIS
• Testing the null hypothesis
• Determining the probability of Type I and Type II errors
• Calculating and reporting tests of effect size
• Ensuring that the data meet the fundamental assumptions of the statistical test

TESTING NULL HYPOTHESIS
• When attempting to determine whether an outcome is related to a cause, it is necessary to know whether the outcomes or results could have occurred by chance alone.
• This cannot be done with certainty, but researchers can determine the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
• Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).
• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.

DETERMINING THE PROBABILITY OF TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated.
• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.
• The relationship between Type I and Type II error is inverse: for a fixed sample size, as the risk of one is reduced, the risk of the other increases.
• Both types of error should be minimised.
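A small Python illustration of the trade-off above, for a one-sided z test (standard library only). `type_two_error` is a hypothetical name, and the standardised true effect `shift = 2.5` is an invented input, not from the source. Lowering α pushes the critical value further out, which raises β, the chance of missing a real effect.

```python
from statistics import NormalDist

def type_two_error(alpha, shift):
    """Beta for a one-sided z test, where shift = (mu1 - mu0) / (sigma / sqrt(n))
    is the true standardised effect (a hypothetical input)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cut-off that fixes the Type I error rate
    return NormalDist().cdf(z_crit - shift)   # chance of not crossing it when H1 is true

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_two_error(alpha, shift=2.5), 3))
```

Increasing the sample size (and therefore `shift`) is the usual way to reduce both errors at once.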

CALCULATING AND REPORTING TESTS OF EFFECT SIZE
• Effect size refers to how much impact the intervention or variable is expected to have on the outcome.
• Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is less likely to be called into question.
• On the other hand, when effect sizes are very small, the potential for effects from extraneous variables increases.
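The text does not name a specific effect-size measure. Cohen's d (mean difference divided by the pooled standard deviation) is used below as one common choice, which is an assumption and not the source's method. `cohens_d` is a hypothetical name. Applied to the Hb data from the independent t-test example, it gives d ≈ −1.45, which is large by Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large).

```python
import math
from statistics import mean, variance

def cohens_d(x1, x2):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_sd = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / pooled_sd

male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]
print(round(cohens_d(male, female), 2))  # -1.45
```

Reporting d alongside the test result shows how large the difference is, not only whether it is significant.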

ENSURING DATA MEET THE FUNDAMENTAL ASSUMPTIONS OF THE STATISTICAL TEST
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures used to conduct the analysis and the match between the data and the procedure.
• If an assumption is violated, the result can be an inaccurate estimate of the real relationship.
• Inaccurate conclusions lead to errors, which in turn affect the validity of a study.

RESOURCES FOR STATISTICAL ANALYSIS PROGRAM
• Packaged computer programs can perform the data analysis and provide the results on a computer printout.
• Examples include SPSS, SAS and Biomedical Data Processing (BMDP).
• If the analyses selected are inappropriate for the data, the computer program is often unable to detect the error and proceeds to perform the analysis anyway.

STATISTICAL ANALYSIS SYSTEM (SAS)
• Comprehensive software originally developed at North Carolina State University.
• The software is divided into many modules, and its licensing is flexible, based on the functions needed.
• It contains a very large variety of statistical methods and is the software of choice for many major businesses, particularly in the pharmaceutical industry.
• SAS has also developed PC SAS, which runs on personal computers and has a user-friendly Windows interface.

PITFALLS OF STATISTICAL ANALYSIS
• Statistics can be used, intentionally or unintentionally, to reach faulty
conclusions. Misleading information is unfortunately the norm in
advertising. The drug companies, for example, are well known to
indulge in misleading information.
• Data dredging (searching the data for any significant result without a prior hypothesis)
• Survey questions (wording that biases the responses)
It is therefore important to understand not just the numbers but the meaning behind them. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
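A short Python illustration of why data dredging misleads. It assumes independent tests, each run at α = 0.05 with every null hypothesis true (a simplifying assumption). Under those conditions, the chance of at least one "significant" result grows quickly with the number of tests. `familywise_error` is a hypothetical name.

```python
def familywise_error(alpha, tests):
    """P(at least one false positive) across independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** tests

for k in (1, 5, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With 20 such tests, the chance of at least one spurious finding is about 64%.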

APPLICATION OF STATISTICAL ANALYSIS
IN NURSING FIELD
• To analyze a trend in the vital statistics of a particular patient.
• Research in nursing processes and procedures
• A statistical analysis of patient outcomes
• Trends in nursing

JOURNAL ABSTRACT
Use of Statistical Analysis in The New England Journal of Medicine
• A sorting of the statistical methods used by authors of the 760 research
and review articles in Volumes 298 to 301 of The New England Journal of
Medicine indicates that a reader who is conversant with descriptive
statistics (percentages, means, and standard deviations) has statistical
access to 58 per cent of the articles. Understanding t-tests increases this
access to 67 per cent.
• The addition of contingency tables gives statistical access to 73 per cent of
the articles.
• Familiarity with each additional statistical method gradually increases the
percentage of accessible articles.
• Original Articles use statistical techniques more extensively than other
articles in the Journal.

Statistical analysis and design in marketing
journal articles
• The use of statistical analysis in 922 articles from the 1980 through 1985
issues of the Journal of The Academy of Marketing Science (JAMS), the
Journal of Marketing (JM), the Journal of Marketing Research (JMR), and
the Journal of Consumer Research (JCR) was analyzed.
• A reader with no statistical background can understand 31, 56, 9, and 21
percent of the articles respectively in these four journals.
• Knowledge of regression and analysis of variance is important in
comprehending many of the articles.
• 38 percent of the JAMS articles and 25, 57 and 56 percent, respectively, of
the other three journals make use of these statistical techniques.

ASSIGNMENT
• The mean and standard deviation of weight (kg) of 100 school-going children (A) and 100 children not going to school (B), all 5 years of age, in slum areas are given below. Which test is used to find the statistical significance?
Group | Sample size | Mean (kg) | SD (kg)
A | 100 | 17.4 | 3
B | 100 | 13.2 | 2.5

