3. DEFINITION
• Statistical analysis is the organisation and analysis of
quantitative or qualitative data using statistical procedures,
including both descriptive and inferential statistics.
• It’s the science of collecting, exploring and presenting
large amounts of data to discover underlying patterns and
trends.
4. DEFINITION
• Statistics is a branch of science that deals with the collection, organisation and analysis of data, and with drawing inferences from samples to the whole population.
• A sample is a small portion of the population that truly represents the population with respect to the characteristic under study.
5. PURPOSES
• Summarize
• Explore the meaning of deviations in data
• Compare or contrast descriptively
• Test the proposed relationships in a theoretical model
• Infer that the findings from the sample are indicative of the entire population
• Examine causality
• Predict, or infer from the sample to a theoretical model
6. ELEMENTS OF STATISTICAL ANALYSIS
• Understand the complex relationships among the correlates of the disease under study.
• The analysis should start with simple comparisons of proportions and means.
• Interpretation of results should be guided by clinical and biological considerations.
10. MEDIAN
• When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median.
• For raw data with n observations, if n is odd, the $\left(\frac{n+1}{2}\right)$th item is the median.
• If n is even, the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items.
Example: the median of the data 10, 20, 30 is 20.
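A minimal Python sketch of the odd/even median rule, assuming the data fit in a list; the function name `median` is illustrative. Positions in the rule are 1-based, so the code subtracts 1 when indexing.

```python
def median(values):
    """Median of raw data using the (n+1)/2 rule (odd n) or the mean of the
    (n/2)th and (n/2 + 1)th items (even n); positions are 1-based."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]            # ((n+1)/2)th item
    return (xs[n // 2 - 1] + xs[n // 2]) / 2    # mean of (n/2)th and (n/2 + 1)th items


print(median([10, 20, 30]))      # 20
print(median([10, 20, 30, 40]))  # 25.0
```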
11. MODE
• The mode is the value in a series that occurs more frequently than any other.
• For grouped data,
$M_0 = L_0 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$
where $L_0$ is the lower limit of the modal class,
$c$ is the class interval,
$\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class, and
$\Delta_2$ is the difference between the frequency of the modal class and that of the following class.
Example: the mode of the data 80, 90, 86, 80, 72, 80, 96 is 80 (it occurs three times).
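A short Python sketch of both mode calculations. The raw-data mode uses `collections.Counter`; the grouped-data mode applies the $L_0 + \frac{\Delta_1}{\Delta_1+\Delta_2} \times c$ formula, assuming equal-width classes. The class limits and frequencies in the usage line are hypothetical, chosen only to exercise the formula.

```python
from collections import Counter


def raw_mode(values):
    """Most frequent value in ungrouped data (first one encountered wins a tie)."""
    return Counter(values).most_common(1)[0][0]


def grouped_mode(lower_limits, freqs, c):
    """M0 = L0 + d1 / (d1 + d2) * c for equal-width classes of width c."""
    k = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    f_prev = freqs[k - 1] if k > 0 else 0                 # preceding class (0 if none)
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0    # following class (0 if none)
    d1 = freqs[k] - f_prev
    d2 = freqs[k] - f_next
    if d1 + d2 == 0:
        raise ValueError("formula undefined: no single peak")
    return lower_limits[k] + d1 / (d1 + d2) * c


print(raw_mode([80, 90, 86, 80, 72, 80, 96]))  # 80
# Hypothetical classes 0-10, 10-20, ..., 40-50 with frequencies 5, 8, 12, 7, 3
print(round(grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 7, 3], 10), 2))  # 24.44
```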
12. INTERQUARTILE RANGE
• The interquartile range (IQR) is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
• IQR = Q3 − Q1.
13. Example
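Interquartile range of the data 30, 20, 40, 60, 50
Arranged in ascending order: 20, 30, 40, 50, 60 (n = 5)
• $Q_1 = \left(\frac{n+1}{4}\right)$th item = 1.5th item = 20 + 0.5 × (30 - 20) = 25
• $Q_3 = 3\left(\frac{n+1}{4}\right)$th item = 4.5th item = 50 + 0.5 × (60 - 50) = 55
• IQR = Q3 - Q1 = 55 - 25 = 30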
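A Python sketch of the same (n+1)/4 and 3(n+1)/4 positional rule, interpolating linearly between neighbouring items. The helper names `value_at` and `iqr` are illustrative. As a cross-check, Python's `statistics.quantiles` with its default `'exclusive'` method uses the same (n+1)p positions.

```python
import statistics


def value_at(sorted_xs, pos):
    """Value at a 1-based, possibly fractional position, e.g. the 1.5th item
    is x1 + 0.5 * (x2 - x1). Positions outside the data are clamped."""
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return sorted_xs[0]
    if lower >= len(sorted_xs):
        return sorted_xs[-1]
    a, b = sorted_xs[lower - 1], sorted_xs[lower]
    return a + frac * (b - a)


def iqr(values):
    xs = sorted(values)
    n = len(xs)
    q1 = value_at(xs, (n + 1) / 4)
    q3 = value_at(xs, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


print(iqr([30, 20, 40, 60, 50]))                        # (25.0, 55.0, 30.0)
print(statistics.quantiles([30, 20, 40, 60, 50], n=4))  # [25.0, 40.0, 55.0]
```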
14. STANDARD DEVIATION
• The standard deviation is the most useful and most widely used measure of dispersion.
• It is defined as the positive square root of the arithmetic mean of the squared deviations of the observations from their arithmetic mean:
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
• The standard deviation is denoted by σ.
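The definition above divides by n. The t-test formulas later in this section use a sample standard deviation S, which by the usual convention divides by n - 1; that reading is an assumption here, since the slides do not state it. The sketch shows both; the data list is hypothetical.

```python
import math
import statistics


def sd_population(xs):
    """Positive square root of the mean squared deviation from the mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def sd_sample(xs):
    """Sample standard deviation S (divides by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
print(sd_population(data))        # 2.0
print(sd_sample(data))            # same value as statistics.stdev(data)
```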
19. PARAMETRIC STATISTICAL ANALYSIS
• It is the most commonly used type of statistical analysis.
• It is called parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population.
• Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.
20. ASSUMPTIONS
• The assumption of normality, which specifies that the means of the sample groups are normally distributed.
• The assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal.
• The data can be treated as random samples.
21. NONPARAMETRIC STATISTICAL ANALYSIS
• Nonparametric statistical analysis is also known as distribution-free techniques.
• It can be used in studies that do not meet the first two assumptions (normality and equal variance).
• Most nonparametric techniques are not as powerful as their parametric counterparts.
22. • If the distribution of the sample is skewed, or is unknown because of a small sample size, nonparametric statistical techniques are used.
• Nonparametric tests are used to analyse ordinal and categorical data.
23. EXPLORATORY DATA ANALYSIS AND
CONFIRMATORY DATA ANALYSIS
• The distinction between the two was popularised by John Tukey.
• Exploratory data analysis is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structure or models.
• Confirmatory data analysis involves traditional inferential statistics, which are used to make an inference about a population or a process based on evidence from the study sample.
24. STATISTICAL ANALYSIS DECISION MAKING
Comparison | Measure | Approach | Test
Two groups | Mean | Parametric | Independent two-sample t test
Two groups | Mean | Nonparametric | Mann-Whitney U test
Two groups | Percentage | - | Chi-square test
One group | Single mean | Parametric | One-sample t test
One group | Mean difference | Parametric | Paired t test
One group | Mean difference | Nonparametric | Wilcoxon signed-rank test
More than two groups | Mean | Parametric | ANOVA
More than two groups | Mean | Nonparametric | Kruskal-Wallis test
More than two groups | Percentage | - | Chi-square test
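The decision table above can be encoded as a simple lookup, which makes the branching explicit. This is a hypothetical helper: the dictionary keys and the `choose_test` name are this sketch's own labels, not a library API.

```python
# Hypothetical encoding of the decision table: (comparison, measure, approach) -> test.
DECISION_TABLE = {
    ("two groups", "mean", "parametric"): "Independent two-sample t test",
    ("two groups", "mean", "nonparametric"): "Mann-Whitney U test",
    ("two groups", "percentage", None): "Chi-square test",
    ("one group", "single mean", "parametric"): "One-sample t test",
    ("one group", "mean difference", "parametric"): "Paired t test",
    ("one group", "mean difference", "nonparametric"): "Wilcoxon signed-rank test",
    ("more than two groups", "mean", "parametric"): "ANOVA",
    ("more than two groups", "mean", "nonparametric"): "Kruskal-Wallis test",
    ("more than two groups", "percentage", None): "Chi-square test",
}


def choose_test(comparison, measure, approach=None):
    """Look up the test for a design; percentages need no parametric/nonparametric choice."""
    if measure == "percentage":
        approach = None
    try:
        return DECISION_TABLE[(comparison, measure, approach)]
    except KeyError:
        raise ValueError(f"no entry for {comparison!r}, {measure!r}, {approach!r}") from None


print(choose_test("two groups", "mean", "nonparametric"))  # Mann-Whitney U test
print(choose_test("more than two groups", "percentage"))    # Chi-square test
```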
26. Student's t-test
• Developed by W. S. Gosset, who published under the pen name 'Student'.
• Student's t-test is used to test the null hypothesis that there is no difference between two means (a sample mean and a population mean, or the means of two groups).
• One-sample t-test
• Independent two-sample t-test (the unpaired t-test)
• Paired t-test
27. One-sample t-test
• To test whether a sample mean (as an estimate of a population mean) differs significantly from a given population mean.
• The mean of one sample is compared with the population mean:
$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$
where $\bar{x}$ = sample mean, μ = population mean, S = standard deviation and n = sample size, with n - 1 degrees of freedom.
28. Example
A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether the sample mean differs significantly from it.
• H0: There is no significant difference between the sample mean and the population mean
• H1: There is a significant difference between the sample mean and the population mean
$\bar{x}$ = 40, μ = 44, n = 20 and S = 6
29. • $|t_{calculated}| = \frac{|40 - 44|}{6/\sqrt{20}} = 2.981$
• t table value (df = 19, 5% level, two-tailed) = 2.093
• $|t_{calculated}|$ > t table value; reject H0.
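A minimal Python check of the one-sample example, working from the summary statistics given. The function name is illustrative, and the table value 2.093 is hard-coded from the slide rather than computed (the standard library has no t distribution).

```python
import math


def one_sample_t(mean, mu, s, n):
    """t = (x_bar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    return (mean - mu) / (s / math.sqrt(n)), n - 1


t, df = one_sample_t(mean=40, mu=44, s=6, n=20)
T_TABLE = 2.093  # two-tailed 5% value for df = 19, taken from the slide

print(round(t, 3), df)                                          # -2.981 19
print("Reject H0" if abs(t) > T_TABLE else "Do not reject H0")  # Reject H0
```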
30. Independent Two Sample T Test (the
unpaired t-test)
• To test whether the population means estimated by two independent samples differ significantly.
• A typical setting: two different groups with the same mean at the start of a study are compared on their means at the end.
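31. $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $\bar{x}_1 - \bar{x}_2$ is the difference between the means of the two groups, $S_1$ and $S_2$ are their standard deviations, and $n_1$ and $n_2$ are their sample sizes (df = $n_1 + n_2 - 2$).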
32. Example
The Hb levels of 5 males are 10, 11, 12.5, 10.5 and 12, and of 5 females are 10, 17.5, 14.2, 15 and 14.1. Test whether there is any significant difference between the Hb values.
• H0: There is no significant difference between the Hb levels
• H1: There is a significant difference between the Hb levels
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
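A Python sketch applying the pooled-variance formula to the Hb data, assuming $S_1$ and $S_2$ are sample standard deviations (n - 1 divisor, which is what `statistics.variance` returns). The comparison value 2.306 is the two-tailed 5% t table value for 8 degrees of freedom; under these assumptions |t| falls just below it.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1 divisor)


def pooled_t(x1, x2):
    """Independent two-sample t with pooled variance; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2


male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]
t, df = pooled_t(male, female)
T_TABLE = 2.306  # two-tailed 5% value for df = 8

print(round(t, 3), df)                                          # -2.288 8
print("Reject H0" if abs(t) > T_TABLE else "Do not reject H0")  # Do not reject H0
```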
34. The paired t-test
• To test whether the population means estimated by two dependent samples differ significantly.
• A usual setting for the paired t-test is when measurements are made on the same subjects before and after a treatment.
$t = \frac{\bar{d}}{S_d/\sqrt{n}}$
where $\bar{d}$ is the mean difference, $S_d$ is the standard deviation of the differences and n is the number of pairs (df = n - 1).
35. Example
Systolic BP of 5 patients before and after a drug therapy:
Before: 160, 150, 170, 130, 140
After: 140, 110, 120, 140, 130
Test whether there is any significant difference between the BP levels.
• H0: There is no significant difference between BP levels before and after the drug
• H1: There is a significant difference between BP levels before and after the drug
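A Python sketch of the paired t-test on the BP data, taking differences as before minus after and assuming $S_d$ is the sample standard deviation of the differences. The value 2.776 is the two-tailed 5% t table value for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t = d_bar / (S_d / sqrt(n)) on differences before - after; returns (t, df)."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


before = [160, 150, 170, 130, 140]
after = [140, 110, 120, 140, 130]
t, df = paired_t(before, after)
T_TABLE = 2.776  # two-tailed 5% value for df = 4

print(round(t, 3), df)                                          # 2.06 4
print("Reject H0" if abs(t) > T_TABLE else "Do not reject H0")  # Do not reject H0
```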
37. Z test
Generally, z-tests are used with large samples (n > 30), whereas t-tests are most helpful with smaller samples (n < 30). Both methods assume normally distributed data, but the z-test is most useful when the population standard deviation σ is known.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
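A Python sketch of the one-sample z statistic with a two-tailed p-value from the standard normal distribution (`statistics.NormalDist`, Python 3.8+). The input values are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(mean, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)) and its two-tailed p-value."""
    z = (mean - mu) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = one_sample_z(mean=52, mu=50, sigma=8, n=64)  # hypothetical values
print(z, round(p, 4))  # 2.0 0.0455
```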
38. ANALYSIS OF VARIANCE (ANOVA)
• Developed by R. A. Fisher.
• Student's t-test cannot be used to compare three or more groups.
• The purpose of ANOVA is to test whether there is any significant difference between the means of two or more groups.
• Analysis of variance is a systematic algebraic procedure for decomposing the overall variation in the responses observed in an experiment into its component sources of variation.
• Two variances are compared: (a) between-group variability, i.e. variation between the samples, and (b) within-group variability, i.e. variation within each sample.
• The within-group variability (error variance) is the variation that cannot be accounted for by the study design.
• The between-group variability (effect variance) is the result of the treatment.
39. • A simplified formula for the F statistic is
$F = \frac{MST}{MSE}$
where MST is the mean square between the groups and MSE is the mean square within the groups.
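A Python sketch of one-way ANOVA computing F = MST / MSE, with MST = SS_between / (k - 1) and MSE = SS_within / (n - k) (the usual degrees of freedom, stated here as an assumption since the slide gives only the ratio). The three groups are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return F = MST / MSE and its degrees of freedom (k - 1, n - k)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst = ss_between / (k - 1)   # mean square between groups
    mse = ss_within / (n - k)    # mean square within groups (error)
    return mst / mse, (k - 1, n - k)


f, df = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])  # hypothetical data
print(f, df)  # 12.0 (2, 6)
```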
41. CHI-SQUARE TEST
• A test used to analyse categorical data.
• The chi-square test is widely used in statistical decision making.
• The test was first used by Karl Pearson in 1900.
• The chi-square test compares frequencies and tests whether the observed data differ significantly from the expected data.
42. CHI-SQUARE TEST
It is calculated as the sum of the squared differences between the observed (O) and expected (E) frequencies (the deviation, d = O - E), each divided by the expected frequency:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
43. Example
• Attack rates of measles among vaccinated and unvaccinated groups are given in the following table. Test the association between vaccination and attack of measles.
Groups | Attacked | Not attacked | Total
Vaccinated | 10 | 90 | 100
Not vaccinated | 26 | 74 | 100
Total | 36 | 164 | 200
44. • H0: There is no significant association between vaccination and attack
of measles
• H1: There is significant association between vaccination and attack of
measles
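45. • Expected frequency E = (row total × column total) / grand total, e.g. 100 × 36 / 200 = 18.
Oi | Ei | Oi - Ei | (Oi - Ei)² | (Oi - Ei)²/Ei
10 | 18 | -8 | 64 | 3.556
90 | 82 | 8 | 64 | 0.780
26 | 18 | 8 | 64 | 3.556
74 | 82 | -8 | 64 | 0.780
Σ = 8.672
• Chi-square table value (df = 1, 5% level) = 3.841; chi-square calculated value = 8.672
• $\chi^2_{calculated} > \chi^2_{table}$; reject H0.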
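A Python sketch that reproduces the measles calculation for any r × c table: expected counts come from row total × column total / grand total, and df = (r - 1)(c - 1). No continuity correction is applied, matching the worked table.

```python
def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts; returns (chi2, df)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, df


# Rows: vaccinated, not vaccinated; columns: attacked, not attacked
chi2, df = chi_square([[10, 90], [26, 74]])
print(round(chi2, 3), df)  # 8.672 1
```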
46. THE WILCOXON'S SIGNED RANK TEST
• Wilcoxon's signed-rank test ranks the absolute differences between paired observations, attaches the sign of each difference to its rank, and compares the sums of the positive and negative ranks.
• It tests whether the differences observed in the values of a quantitative variable between two correlated samples (before-and-after design) are statistically significant.
• This test is the nonparametric counterpart of the paired t test.
47. Method
• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference for each pair of values.
• Rank the differences from smallest to largest, ignoring their signs; tied values receive the mean of their ranks.
• After ranking, attach the sign of each difference to its rank.
• Compute T+ (sum of the ranks with positive sign) and T- (sum of the ranks with negative sign). The test statistic Wstat (also written T) is the smaller of T+ and T-.
• Find the critical value Wcritical from Wilcoxon's signed-rank table.
• If Wstat ≤ Wcritical, reject H0.
48. EXAMPLE
• IQ values of 8 malnourished 4-year-old children before and after 3 months of a nutritious diet are given below.
Before 40 60 55 65 43 70 80 60
After 50 80 50 70 40 60 90 85
49. • H0: There is no difference in the paired values
• H1: There is a difference in the paired values
Before | 40 | 60 | 55 | 65 | 43 | 70 | 80 | 60
After | 50 | 80 | 50 | 70 | 40 | 60 | 90 | 85
Difference (Before - After) | -10 | -20 | 5 | -5 | 3 | 10 | -10 | -15
Absolute difference | 10 | 20 | 5 | 5 | 3 | 10 | 10 | 15
Rank | 5 | 8 | 2.5 | 2.5 | 1 | 5 | 5 | 7
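A Python sketch of the signed-rank steps applied to the IQ data: rank absolute differences with tied values sharing their mean rank, reattach signs, and sum. Dropping zero differences is a common convention assumed here; the slides do not mention it (the IQ data have none). The helper names are illustrative.

```python
def average_ranks(values):
    """Rank from smallest (rank 1); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (T+, T-, T) with T the smaller of T+ and T-."""
    diffs = [b - a for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]  # assumption: zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)


before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]
print(wilcoxon_signed_rank(before, after))  # (8.5, 27.5, 8.5)
```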
51. • Assuming a normal distribution for the differences, the test statistic is
$Z = \frac{|T - m| - 0.5}{SD}$
where T = smaller of T+ and T-, m = mean sum of ranks = $\frac{n(n+1)}{4}$ and $SD = \sqrt{\frac{n(n+1)(2n+1)}{24}}$.
• If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected at the 5% level of significance.
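A Python sketch of the normal approximation. For the IQ example, T+ = 2.5 + 1 + 5 = 8.5 and T- = 27.5, so T = 8.5 with n = 8. With so few pairs the exact table is usually preferred; the numbers here only illustrate the formula.

```python
import math


def wilcoxon_z(t, n):
    """Z = (|T - m| - 0.5) / SD with m = n(n+1)/4 and SD = sqrt(n(n+1)(2n+1)/24)."""
    m = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (abs(t - m) - 0.5) / sd


z = wilcoxon_z(t=8.5, n=8)
print(round(z, 3))  # 1.26
print("Reject H0" if z > 1.96 else "Accept H0")  # Accept H0
```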
52. MANN-WHITNEY U TEST
• For testing whether two independent samples of a quantitative variable come from the same population.
• It is equivalent to Wilcoxon's rank-sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is the nonparametric alternative to the t test for two independent samples.
53. METHOD
H0: The average values in the two groups are the same
H1: The average values in the two groups are different
• Let n1 and n2 be the sample sizes of the two groups. Rank all the values of the two groups taken together; tied values are given the same (average) rank.
• Take the rank sum R of each group and calculate U for that group as
$U = R - \frac{n(n+1)}{2}$
where n is the sample size of that group.
• Both U1 and U2 are calculated and the smaller value is taken as Ustat. Ucritical is obtained from the Mann-Whitney U test table.
• If Ustat ≤ Ucritical, reject H0.
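A Python sketch of the U computation: rank both groups together (ties averaged), take each group's rank sum, subtract n(n+1)/2, and keep the smaller U. The two groups in the usage line are hypothetical; the helper names are illustrative.

```python
def average_ranks(values):
    """Rank from smallest (rank 1); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions
        i = j + 1
    return ranks


def mann_whitney_u(x1, x2):
    """Return (U1, U2, Ustat) with Ustat the smaller of U1 and U2."""
    n1, n2 = len(x1), len(x2)
    ranks = average_ranks(list(x1) + list(x2))  # rank both groups together
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2, min(u1, u2)


print(mann_whitney_u([12, 15, 11, 18], [20, 22, 16, 25, 19]))  # (1.0, 19.0, 1.0)
```

A useful check: U1 + U2 always equals n1 × n2 (here 4 × 5 = 20).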
55. • H0: The average values in the 2 treatments are the same
• H1: The average values in the 2 treatments are different
$U = R - \frac{n(n+1)}{2}$
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Value | 2 | 2 | 3 | 4 | 5 | 5 | 6 | 6 | 7 | 8 | 9 | 10
Rank (ties averaged) | 1.5 | 1.5 | 3 | 4 | 5.5 | 5.5 | 7.5 | 7.5 | 9 | 10 | 11 | 12
57. • Assuming that the ranks are randomly distributed between the two groups, the test statistic is
$Z = \frac{|m - T| - 0.5}{SD}$
where T = smaller of $T_1$ and $T_2$, $T_1$ = sum of the ranks of the smaller group (size $n_1$), $T_2 = n_1(n_1 + n_2 + 1) - T_1$,
m = mean sum of ranks = $\frac{n_1(n_1 + n_2 + 1)}{2}$ and
$SD = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
• If Z < 1.96, H0 is accepted
• If Z > 1.96, H0 is rejected at the 5% level of significance
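A Python sketch of the normal approximation for the Mann-Whitney test. It compares $T_1$, the rank sum of the smaller group of size $n_1$, with its own expected value m. The inputs (T1 = 11, n1 = 4, n2 = 5) are hypothetical.

```python
import math


def mann_whitney_z(t1, n1, n2):
    """Z = (|m - T1| - 0.5) / SD, with T1 the rank sum of the smaller group (size n1),
    m = n1(n1 + n2 + 1)/2 and SD = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)."""
    m = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (abs(m - t1) - 0.5) / sd


z = mann_whitney_z(t1=11, n1=4, n2=5)  # hypothetical rank sum and group sizes
print(round(z, 3))  # 2.082
print("Reject H0" if z > 1.96 else "Accept H0")  # Reject H0
```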
58. KRUSKAL-WALLIS TEST
• The Kruskal-Wallis test is a nonparametric analysis of variance based on ranks.
• It is used to compare several independent samples.
• It tests whether several independent samples of a quantitative variable come from the same population.
• It corresponds to one-way analysis of variance among the parametric methods.
59. • It analyses whether there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order, the rank sums are calculated, and the test statistic is computed as
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$
where n is the total of the sample sizes in all the groups, $n_i$ is the size of the ith group, $R_i$ is the sum of the ranks in the ith group and k is the number of groups.
60. Method
H0: The average values in the different groups are the same
H1: The average values in the different groups are different
• Rank all the values, taking all the groups together, and compute H.
• Use the chi-square table (with k - 1 degrees of freedom) to get the table value at the 5% level of significance.
• If Hstat > Htable value, reject H0.
62. • H0: The average values in the three groups are the same
• H1: The average values in the three groups are different
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
Value | 8 | 8 | 9 | 9 | 9 | 9 | 10 | 10 | 11 | 12 | 13 | 13 | 13 | 13 | 14 | 15 | 16 | 17
Tied rank | 1.5 | 1.5 | 4.5 | 4.5 | 4.5 | 4.5 | 7.5 | 7.5 | 9 | 10 | 12.5 | 12.5 | 12.5 | 12.5 | 15 | 16 | 17 | 18
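A Python sketch of the H statistic: pool and rank all values (ties averaged), sum ranks per group, and apply the formula with k - 1 degrees of freedom. It omits the tie correction that some textbooks add. The three groups in the usage line are hypothetical; for df = 2 the 5% chi-square table value is 5.991.

```python
def average_ranks(values):
    """Rank from smallest (rank 1); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), no tie correction; returns (H, df)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1), len(groups) - 1


h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(round(h, 3), df)  # 7.2 2
```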
64. WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS
FOR OUTCOMES BY USING STATISTICAL ANALYSIS
• Testing null hypothesis
• Determining the probability of type I and type II error
• Calculating and reporting tests of effect size
• Ensuring data meet the fundamental assumptions of the statistical
test
65. TESTING NULL HYPOTHESIS
• When attempting to determine whether an outcome is related to a cause, it is necessary to know whether the outcomes or results could have occurred by chance alone.
• This cannot be done with certainty, but researchers can determine the probability of obtaining results at least as extreme as those observed if chance alone (the null hypothesis) were operating.
• Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).
• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.
66. DETERMINING THE PROBABILITY OF
TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated.
• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.
• The relationship between Type I and Type II error is inverse: for a given sample size, as the risk of one is reduced, the risk of the other increases.
• Both types of error should be minimised.
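The inverse relationship between the two error types can be seen numerically. This sketch assumes a one-sided one-sample z test with known σ; the effect size, σ and n are hypothetical. For fixed n, lowering α (Type I risk) raises β (Type II risk).

```python
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """beta for a one-sided z test of H0: mu = mu0 when the true mean is mu0 + effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection cut-off under H0
    shift = effect / (sigma / n ** 0.5)        # true mean in standard-error units
    return NormalDist().cdf(z_crit - shift)    # probability of failing to reject H0


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=5, sigma=10, n=25)  # hypothetical design
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}")
```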
67. CALCULATING AND REPORTING TESTS
OF EFFECT SIZE
• Effect size refers to how much impact the intervention
or variable is expected to have on the outcome.
• Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is less likely to be called into question.
• On the other hand, when effect sizes are very small,
then the potential for effects from extraneous
68. ENSURING DATA MEET THE FUNDAMENTAL
ASSUMPTIONS OF THE STATISTICAL TEST
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures used to conduct the analysis, and the match between the data and the procedure.
• If an assumption is violated, the result can be an inaccurate estimate of the real relationship.
• Inaccurate conclusions lead to errors, which in turn affect the validity of a study.
69. RESOURCES FOR STATISTICAL
ANALYSIS PROGRAM
• Packaged computer programs can perform the data analysis and provide the results on a computer printout.
• Examples: SPSS, SAS and Biomedical Data Processing (BMDP).
• If the analyses selected are inappropriate for the data, the program is often unable to detect the error and proceeds with the analysis anyway.
70. STATISTICAL ANALYSIS SYSTEM (SAS)
Comprehensive software developed at North Carolina State University.
The software is divided into many modules, and its licensing is flexible, based on the functions needed.
The system contains a very large variety of statistical methods and is widely used by major businesses, notably in the pharmaceutical industry.
SAS has also developed PC SAS, a version for personal computers with a user-friendly Windows interface.
71. PITFALLS OF STATISTICAL ANALYSIS
• Statistics can be used, intentionally or unintentionally, to reach faulty
conclusions. Misleading information is unfortunately the norm in
advertising. The drug companies, for example, are well known to
indulge in misleading information.
• Data dredging
• Survey questions
It is therefore important to understand not just the numbers but the meaning behind them. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
72. APPLICATION OF STATISTICAL ANALYSIS
IN NURSING FIELD
• To analyze a trend in the vital statistics of a particular patient.
• Research in nursing processes and procedures
• A statistical analysis of patient outcomes
• Trends in nursing
73. JOURNAL ABSTRACT
Use of Statistical Analysis in The New England Journal of Medicine
• A sorting of the statistical methods used by authors of the 760 research
and review articles in Volumes 298 to 301 of The New England Journal of
Medicine indicates that a reader who is conversant with descriptive
statistics (percentages, means, and standard deviations) has statistical
access to 58 per cent of the articles. Understanding t-tests increases this
access to 67 per cent.
• The addition of contingency tables gives statistical access to 73 per cent of
the articles.
• Familiarity with each additional statistical method gradually increases the
percentage of accessible articles.
• Original Articles use statistical techniques more extensively than other
articles in the Journal.
74. Statistical analysis and design in marketing
journal articles
• The use of statistical analysis in 922 articles from the 1980 through 1985
issues of the Journal of The Academy of Marketing Science (JAMS), the
Journal of Marketing (JM), the Journal of Marketing Research (JMR), and
the Journal of Consumer Research (JCR) was analyzed.
• A reader with no statistical background can understand 31, 56, 9, and 21
percent of the articles respectively in these four journals.
• Knowledge of regression and analysis of variance is important in
comprehending many of the articles.
• 38 percent of the JAMS articles and 25, 57 and 56 percent, respectively, of
the other three journals make use of these statistical techniques.
75. ASSIGNMENT
• The mean and standard deviation of the weight (kg) of 100 school-going children (A) and 100 children not going to school (B), all 5 years of age and living in slum areas, are given below. Which test is used to find the statistical significance?
Population | Sample size | Mean (kg) | SD (kg)
A | 100 | 17.4 | 3
B | 100 | 13.2 | 2.5