The presentation discusses the concept of the test of significance, with worked examples of the t-test, the proportion test and the chi-square test.
Test of significance (t-test, proportion test, chi-square test)
1. Concept of Test of Significance
(t-test, proportion test and Chi-square test)
By
Dr. Ramnath Takiar
Ex-Director Grade Scientist,
National Cancer Registry Programme
(Indian Council of Medical Research)
October 23, 2014
2. Concept of Test of significance:
Tests of statistical significance are now applied almost
routinely by research scientists. Good medical journals
refuse to accept papers for publication if the authors
have not used significance testing in
evaluating their results.
It is not adequate to undertake significance tests
mechanically; scientists and experimental workers must fully
understand the basic concepts underlying a significance
test, the assumptions involved and the limitations, in order
to interpret the results properly.
3. Generally, the existence of a statistically significant
difference is regarded as proof of the existence of an
important difference between two sample results.
Similarly, non-significant differences are regarded as
proof of no difference between two sample results.
To properly appreciate the role of significance testing, it
is important to understand first the concept of sampling
fluctuation.
4. Concept of Sampling fluctuation:
Mean = 10.4
Lower limit = Mean - 1.96 SE; Upper limit = Mean + 1.96 SE
Sample   Value 1   Value 2   Value 3   Mean    SE     Lower limit   Upper limit
1        10        13        9         10.7    1.20   8.31          13.02
2        10        13        12        11.7    0.88   9.94          13.40
3        10        13        8         10.3    1.45   7.49          13.18
4        10        9         12        10.3    0.88   8.60          12.06
5        10        9         8         9.0     0.58   7.87          10.13
6        10        12        8         10.0    1.15   7.74          12.26
7        13        9         12        11.3    1.20   8.98          13.69
8        13        9         8         10.0    1.53   7.01          12.99
9        13        12        8         11.0    1.53   8.01          13.99
10       9         12        8         9.7     1.20   7.31          12.02
Please note that for the confidence limits I have used Normal values, although with so few observations per sample (3), t-values
should strictly be used. Normal values were used for ease of understanding.
Mean (means) = 10.4; SE = SD(means) = 0.798
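The sampling-fluctuation table can be reproduced with a short sketch. It assumes (this is inferred from the table, not stated on the slide) that the ten samples are all possible samples of size 3 drawn from the five values 10, 13, 9, 12 and 8, whose mean is 10.4. The names used are illustrative only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Assumed population: the five distinct values that appear in the table.
population = [10, 13, 9, 12, 8]

rows = []
for sample in combinations(population, 3):
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))   # SE = SD / sqrt(n), SD on n-1
    rows.append((sample, m, se, m - 1.96 * se, m + 1.96 * se))

for sample, m, se, lower, upper in rows:
    print(sample, f"{m:5.1f} {se:5.2f} {lower:6.2f} {upper:6.2f}")

sample_means = [row[1] for row in rows]
grand_mean = mean(sample_means)      # mean of the ten sample means
sd_of_means = stdev(sample_means)    # spread of the sample means
print(f"Mean(means) = {grand_mean:.1f}; SD(means) = {sd_of_means:.3f}")
```

Each sample gives a different mean and a different interval even though all samples come from the same population; this variation is the sampling fluctuation.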
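5. t-test
To test that the Population Mean is M:
t = (Observed Mean - Population Mean) / Standard error,
which follows the t distribution with (n-1) degrees of freedom (df).
For three of the samples above (n = 3, df = 2, Population Mean = 10.4):
Sample 1: t = (10.7 - 10.4)/1.20 = 0.3/1.20 = 0.25
Sample 2: t = (11.7 - 10.4)/0.88 = 1.3/0.88 = 1.48
Sample 5: t = (9.0 - 10.4)/0.58 = -1.4/0.58 = -2.41
In each case |t| is less than the tabulated value (t = 4.30 at 2 d.f.) => P > 0.05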
6. Null Hypothesis: It is a definite statement about the
population parameter. Such a hypothesis, being one of no
difference, is called the null hypothesis and is usually denoted by
Ho. According to Prof. R.A. Fisher, the null hypothesis is the
hypothesis which is tested for its possible rejection under
the assumption that it is true.
Alternative Hypothesis: Any hypothesis which is
complementary to the null hypothesis is called an alternative
hypothesis, usually denoted by H1.
Type I Error: Reject Ho when it is true.
Type II Error: Accept Ho when it is not true.
P(Reject Ho when it is true) = P(Reject Ho | Ho) = α
P(Accept Ho when it is not true) = P(Accept Ho | H1) = β
Conventionally, the following values are accepted:
α = 0.05; power (1 - β) = 0.90, i.e. β = 0.10
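The meaning of the Type I error rate can be illustrated by simulation. The sketch below is only an illustration under assumed settings: samples of 10 are drawn from a Normal population whose mean really is 15 (so Ho is true), the SD of 4 and the seed are arbitrary choices, and 2.262 is the standard two-tailed 5% point of t with 9 df.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)            # fixed seed so the run is repeatable
TRUE_MEAN = 15.0          # Ho is true by construction
N = 10                    # observations per sample
T_CRITICAL = 2.262        # two-tailed 5% point of t with 9 df
TRIALS = 2000

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, 4.0) for _ in range(N)]
    t = (mean(sample) - TRUE_MEAN) / (stdev(sample) / sqrt(N))
    if abs(t) > T_CRITICAL:
        rejections += 1   # Ho rejected although it is true: Type I error

type_i_rate = rejections / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The proportion of wrongly rejected samples should come out close to 0.05, which is the level of significance chosen in advance.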
7. Examples of t-test:
1. For a random sample of 10 persons fed on diet A, the
increases in weight (kg) over a certain period were:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9 (mean = 12.0; SE = 1.155)
Test whether the gain in weight, on an average, is 15.0
kg.
Ho: Sample mean is not different from population mean
(15.0), i.e. mean gain in weight = 15.0 kg.
H1: Sample mean is different from population
mean (≠ 15.0), i.e. mean gain in weight ≠ 15.0 kg.
t = (Sample mean - Population mean)/SE of mean
= (12.0 - 15.0)/1.155
= -2.60 (df = 9)
|t| exceeds the tabulated value (2.26 at 9 df) => P < 0.05
Hence, Ho is rejected => the mean gain in weight differs from 15.0 kg.
8. 2. The manufacturer of a certain make of electric bulb
claims that his bulbs have a mean life of 25 months. A
random sample of 6 such bulbs gave the following
values: 24, 26, 30, 20, 20, 18 (mean = 23.0; SE = 1.844)
Can you regard the producer's claim as valid?
Ho: Sample mean is not different from population mean
(25.0), i.e. mean life of bulb = 25.0 months.
H1: Sample mean is different from population
mean (≠ 25.0), i.e. mean life of bulb ≠ 25.0 months.
t = (Sample mean - Population mean)/SE of mean
= (23.0 - 25.0)/1.844
= -1.085 (df = 5)
P = 0.33, i.e. P > 0.05
Hence, Ho is accepted => the data are consistent with a mean
bulb life of 25.0 months.
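The single-mean t-test can be sketched in a few lines using only the Python standard library. The function name `one_sample_t` is illustrative, and the tabulated value 2.571 (two-tailed 5% point of t with 5 df) is taken from a standard t-table rather than from the slides.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, hypothesised_mean):
    """Return (t, df, SE) for testing a sample mean against a fixed value."""
    n = len(values)
    se = stdev(values) / sqrt(n)              # SE of mean = SD / sqrt(n)
    t = (mean(values) - hypothesised_mean) / se
    return t, n - 1, se

bulb_life = [24, 26, 30, 20, 20, 18]          # months, from the bulb example
t, df, se = one_sample_t(bulb_life, 25.0)

T_TABLE_5_DF = 2.571                          # two-tailed 5% value, 5 df
print(f"SE = {se:.3f}, t = {t:.3f}, df = {df}")
print("Reject Ho" if abs(t) > T_TABLE_5_DF else "Do not reject Ho")
```

The comparison uses |t| because the alternative hypothesis is two-sided.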
9. t-test - Two means:
t = (mean1 - mean2) / [S * (1/n1 + 1/n2)^0.5]
where S is the pooled SD of Sample 1 and Sample 2 and is
given by the following formula:
S = {[(n1-1)*(SD1)^2 + (n2-1)*(SD2)^2] / (n1+n2-2)}^0.5
In the above formula, the degrees of freedom are
df = (n1+n2-2)
10. Examples of t-test - Two means:
3. Below are given the gains in weight (kg) of preschool
children fed for a certain period of time on two diets, A and B.
Diet A: 2.5, 3.2, 3.0, 3.4, 2.4, 1.4, 3.2, 2.4, 3.0, 2.5
Diet B: 4.4, 3.4, 7.2, 1.0, 4.7, 3.1, 4.0, 3.2, 3.5, 1.8, 2.1, 2.9
Test whether the two diets differ significantly as regards their effect on
increase in weight.
Diet A: Mean = 2.70; SD = 0.59; N1 = 10
Diet B: Mean = 3.44; SD = 1.59; N2 = 12
Pooled SD^2 = {(N1-1)*SDA^2 + (N2-1)*SDB^2}/(N1+N2-2)
= [(10-1)*0.59^2 + (12-1)*1.59^2]/(10+12-2)
= [3.12 + 27.87]/20 = 1.55
(1/N1) = 1/10 = 0.1; (1/N2) = 1/12 = 0.083
11. t = (MEAN1-MEAN2)/{S^2 * [(1/N1)+(1/N2)]}^0.5
= (3.44 - 2.70)/[1.55*(0.1+0.083)]^0.5 = 0.74/(0.284)^0.5
= 0.74/0.533 = 1.39; df = 20
P = 0.18 => P > 0.05; hence Ho is accepted.
The weight gains of preschool children on Diet A and Diet B
do not differ significantly.
------------------------------------------------------------------------------------
4. A reading test is given to two different sections of the same
class. The results of the test are
Section A: Mean = 75; SD = 8; N1 = 12
Section B: Mean = 65; SD = 10; N2 = 15
Is the difference between the means of the two sections
significant?
Mean1-Mean2 = 10; S^2 = 84.16; (1/N1+1/N2) = (0.083+0.067)
t = 2.81; df = 25; P = 0.009 => The difference between the
mean scores of the two sections is significant.
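The two-mean calculations can be checked with a minimal sketch of the pooled t-test. It works from summary statistics (mean, SD, n), so it covers both the diet example, where the summaries are computed from the raw data, and the reading-test example, where only summaries are given. The function name `pooled_t` is illustrative, and equal population variances are assumed, as in the formula for the pooled SD.

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

diet_a = [2.5, 3.2, 3.0, 3.4, 2.4, 1.4, 3.2, 2.4, 3.0, 2.5]
diet_b = [4.4, 3.4, 7.2, 1.0, 4.7, 3.1, 4.0, 3.2, 3.5, 1.8, 2.1, 2.9]

t_diet, df_diet = pooled_t(mean(diet_a), stdev(diet_a), len(diet_a),
                           mean(diet_b), stdev(diet_b), len(diet_b))
t_read, df_read = pooled_t(75, 8, 12, 65, 10, 15)   # Sections A and B

print(f"Diets:    |t| = {abs(t_diet):.2f}, df = {df_diet}")
print(f"Sections: |t| = {abs(t_read):.2f}, df = {df_read}")
```

The sign of t depends only on which sample is entered first, so the absolute value is compared with the tabulated value for a two-sided test.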
12. Some more examples of t test applications:
5. Measurements of the fat content of two kinds of Ice cream,
Brand A and Brand B yielded the following sample data:
Brand A : 13.5, 14.0, 13.6, 12.9, 13.0
Brand B : 12.9, 12.5, 11.5, 10.0, 10.0
Test whether the fat contents of ice cream of both the
brands are comparable.
6. Two independent groups of 10 children were tested to find
how many digits they could repeat from memory after
hearing them. The results are as follows:
Group A : 8 6 5 7 6 8 7 4 5 6
Group B: 10 6 7 8 6 9 7 6 7 7
Is the difference between the mean scores of the two groups
significant?
13. Paired t-test: Examples
7. Eleven school boys were given a test in Statistics. They were
given a month’s tuition and a second test was held at the
end of it. Do the marks give evidence that the students have
gained from the extra coaching?
Marks in I test : 23 20 19 21 18 20 18 17 23 16 19
Marks in II test: 24 19 22 18 20 22 20 20 23 20 18
8. A drug was administered to 10 patients and the increments
in their blood pressure were recorded to be
6 3 -2 4 -3 4 6 0 3 2
Is it reasonable to believe that the drug has no effect on
change of blood pressure?
t = mean(d) / SE(d) with (n-1) df,
where d is the difference between the paired observations,
mean(d) is the mean of these differences and SE(d) is its standard error.
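The paired t statistic can be computed directly from the differences. The sketch below assumes that SE(d) is taken as SD(d)/√n, with SD(d) calculated on n-1, and uses all eleven pairs of marks listed in Example 7; the function name `paired_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(differences):
    """Return (t, df) for testing that the mean difference is zero."""
    n = len(differences)
    se = stdev(differences) / sqrt(n)     # SE(d) = SD(d) / sqrt(n)
    return mean(differences) / se, n - 1

test_1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test_2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18]
marks_diff = [after - before for before, after in zip(test_1, test_2)]

bp_change = [6, 3, -2, 4, -3, 4, 6, 0, 3, 2]   # already differences

t_marks, df_marks = paired_t(marks_diff)
t_bp, df_bp = paired_t(bp_change)
print(f"Marks: t = {t_marks:.2f}, df = {df_marks}")
print(f"BP:    t = {t_bp:.2f}, df = {df_bp}")
```

Each t value is then compared with the tabulated t for its own degrees of freedom.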
14.
          STATISTICS SCORE
Sl. No.   Test I   Test II   Diff.   Change in BP
1         23       24        1       6
2         20       19        -1      3
3         19       22        3       -2
4         21       18        -3      4
5         18       20        2       -3
6         20       22        2       4
7         18       20        2       6
8         17       20        3       0
9         23       23        0       3
10        16       20        4       2
11        19       18        -1      -
N                            11      10
SUM                          12      23
MEAN                         1.09    2.3
SD                           2.12    3.09
SE (= SD/√N)                 0.64    0.98
15.
Variable                 Statistics score   Change in BP
MEAN                     1.09               2.3
SE                       0.639              0.978
t                        1.71               2.35
df                       10                 9
Tabulated value (5%)     2.23               2.26
P value                  > 0.05             < 0.05
16. Conclusions:
1. Ho is accepted =>
The mean marks obtained after the tuition are not
significantly different from those obtained before the tuition.
The data give no evidence that the extra coaching
improved the students' test scores.
2. Ho is rejected =>
The mean increment in blood pressure is significantly
different from zero.
The drug appears to have an effect on the change of blood pressure.
17. It should be noted that the t-test which we have discussed is
often used
1. To test whether the sample mean is significantly different
from a hypothetical mean or not?
2. To test whether two sample means are comparable or
different?
3. When the number of observations for selected sample(s)
is(are) small say below 30.
Assumptions used in application of t-test:
1. The population from which the sample(s) is (are) drawn,
follows normal distribution.
2. In case of comparison between two sample means, it is
assumed that both the samples are drawn from normal
populations and that their variances are comparable.
18. It should be noted that when n is large (above 30) we use the
Normal test instead of the t-test. The Normal test can be used
for testing a single mean and for comparing two sample
means.
The formulae remain the same as those of the t-test, with the
difference that the Normal table is used instead of the t-table to get
the tabulated value.
Critical values of Z statistic
                 Level of significance
Critical value   1%      5%      10%
Two tailed       2.58    1.96    1.645
One tailed       2.33    1.645   1.28
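The critical values of Z can be recovered from the standard Normal distribution instead of being read from a table. A small sketch using `statistics.NormalDist` from the Python standard library follows; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical value of Z for a given level of significance."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.01, 0.05, 0.10):
    print(f"{alpha:.0%}: two tailed {z_critical(alpha):.3f}, "
          f"one tailed {z_critical(alpha, two_tailed=False):.3f}")
```

In a two-tailed test the level of significance is split equally between the two tails, which is why the two-tailed 10% value equals the one-tailed 5% value.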
19. Test for Proportion of Successes:
Instead of dealing with the number of successes, very often we
may be interested in the proportion of successes obtained in an
experiment, that is, the number of successes divided by
the total number of trials made. Let
p = sample proportion of successes; q = (1-p);
P = population proportion (probability of success in each trial); Q = (1-P);
n = size of sample.
Then SE(p) = √(PQ/n), estimated from the sample as √(pq/n), and
Z = (p - P)/√(PQ/n)
≈ (p - P)/√(pq/n)
= (sample p - population P)/SE(p)
20. Example 9: In a simple random sample of 600 men taken from
a big city, 400 are found to be smokers. Test the hypothesis
that 60% of the men are smokers.
Ho: 60% of the men are smokers (given P = 0.6).
H1: The percentage of smokers is not equal to 60% (P ≠ 0.6).
Proportion of smokers = p = 400/600 = 0.666
SE(p) = √(pq/n) = (0.666*0.334/600)^0.5 = (0.000371)^0.5
= 0.0193
Z = (p-P)/SE(p) = (0.666-0.6)/0.0193 = 0.066/0.0193 = 3.42
The difference between the observed and expected proportions of
smokers is more than 1.96 SE (5% level of significance).
Hence the hypothesis is rejected and we conclude that the
proportion of smokers in the city differs from 60%; the observed
proportion is higher.
21. Example 10: 500 subjects were surveyed for their dental
hygiene and 30 of them were found to have dental
problems. Test the hypothesis that the proportion with dental
problems in the population is not different from 5%.
Ho: The proportion with dental problems in the population is 5% (given P = 0.05).
H1: The proportion with dental problems in the population is different from 5%
(P ≠ 0.05).
p = 30/500 = 0.06; SE(p) = √(pq/n)
SE(p) = (0.06*0.94/500)^0.5 = 0.0106
Z = (p - P)/√(pq/n) = (0.06-0.05)/0.0106
= 0.01/0.0106 = 0.94
Since the calculated Z is less than 1.96, we accept Ho.
=> The proportion with dental problems in the population is not significantly different from 5%.
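The proportion test of Examples 9 and 10 can be sketched as below. As in the worked examples, the standard error is computed from the sample proportion; the function name `proportion_z` is illustrative, and the two-sided P value is obtained from the standard Normal distribution. Because the code keeps p unrounded (400/600 = 0.6667 rather than 0.666), Z may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, hypothesised_p):
    """Z test for one proportion; returns (z, two-sided P value)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)        # SE from the sample proportion
    z = (p - hypothesised_p) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_smokers, p_smokers = proportion_z(400, 600, 0.60)   # Example 9
z_dental, p_dental = proportion_z(30, 500, 0.05)      # Example 10

print(f"Smokers: Z = {z_smokers:.2f}, P = {p_smokers:.4f}")
print(f"Dental:  Z = {z_dental:.2f}, P = {p_dental:.4f}")
```

Comparing |Z| with 1.96 and comparing the two-sided P value with 0.05 lead to the same decision.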
22. Test for difference between Proportions:
If two samples are drawn from different populations, we may
be interested in finding out whether the difference
between the proportions of successes is significant or not. In
such a case we take the hypothesis that the difference between the
proportion of successes in one sample (p1) and in the other sample
(p2) is due to fluctuations of random sampling.
Z = (p1-p2)/√(pq(1/n1+1/n2)) where
p1 = proportion in sample 1
p2 = proportion in sample 2
n1 = sample size of sample 1
n2 = sample size of sample 2
p = (n1*p1+n2*p2)/(n1+n2) is the pooled proportion and q = (1-p)
23. 11. In a random sample of 100 men taken from village A, 60
are found to be consuming alcohol. In another sample of 200
men taken from village B, 100 are found to be consuming
alcohol. Do the two villages differ significantly in respect of
alcohol consumption?
Given p1 = 60/100 = 0.6; n1 = 100
p2 = 100/200 = 0.5; n2 = 200
Ho: p1 = p2; H1: p1 ≠ p2
p = (n1*p1+n2*p2)/(n1+n2) = (100*0.6+200*0.5)/(100+200)
= (60+100)/300 = 160/300 = 0.53
Z = (p1-p2)/√(pq(1/n1+1/n2))
= (0.6-0.5)/√(0.53*0.47*(1/100+1/200))
= 0.1/√(0.249*(0.01+0.005)) = 0.1/√0.003737
= 0.1/0.0611 = 1.64
Since the calculated Z is less than 1.96, we accept Ho.
=> The percentages of alcohol consumers in the two villages
are comparable.
24. 12. In a large city A, 25% of a random sample of 900 school
boys had defective eyesight. In another large city B, 20% of a
random sample of 1600 had the same defect. Is the
difference between the two proportions significant?
Given p1 = 0.25; n1 = 900
p2 = 0.20; n2 = 1600
Ho: p1 = p2; H1: p1 ≠ p2
p = (n1*p1+n2*p2)/(n1+n2) = (900*0.25+1600*0.2)/(900+1600)
= (225+320)/2500 = 545/2500 = 0.218
Z = (p1-p2)/√(pq(1/n1+1/n2))
= (0.25-0.2)/√(0.22*0.78*(1/900+1/1600))
= 0.05/√(0.172*(0.00111+0.000625)) = 0.05/√0.000298
= 0.05/0.01726 = 2.90
Since the calculated Z is more than 1.96, we reject Ho.
=> The percentage of boys with defective eyesight is different between
the cities. City A has the higher percentage.
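The test for the difference between two proportions can be sketched as follows. The function name `two_proportion_z` is illustrative; the counts for the two cities (225 of 900 and 320 of 1600) are the ones implied by the stated percentages. The pooled proportion is kept unrounded, so Z may differ slightly in the second decimal place from a hand calculation with rounded values.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)        # same as (n1*p1 + n2*p2)/(n1+n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z_villages = two_proportion_z(60, 100, 100, 200)    # alcohol consumption
z_cities = two_proportion_z(225, 900, 320, 1600)    # defective eyesight

for label, z in (("Villages", z_villages), ("Cities", z_cities)):
    decision = "reject Ho" if abs(z) > 1.96 else "accept Ho"
    print(f"{label}: Z = {z:.2f} -> {decision}")
```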
25. Chi-square test:
Very often in the field of research, we come across
qualitative types of data like presence or absence of a
symptom, classification of a pregnancy as ‘high risk’ or
‘low risk’ , the degree of severity of a disease ( mild,
moderate, severe). When we are interested in tabulating
such type of data for more than one group and want
meaningful comparisons then a method which is useful in
such situations is Chi-square test.
The Chi-square test is designed to examine whether a series
of observed numbers in various categories of the data are
consistent with the numbers expected in those categories
on some specific hypothesis.
26. In practice, there will be some differences between the
observed (O) and expected (E) numbers in each category,
and our aim is to derive a single quantity to conclude
whether the variation seen is genuine or due to sampling.
The Chi-square is defined as Σ (O-E)^2/E with (n-1) df, where n is
the number of categories.
13. In a hospital, 480 female and 520 male babies were born
in a week. Do these figures confirm the hypothesis that
males and females are born in equal number?
Ho: The male and female babies are born in equal
proportions
H1: The male and female babies are not born in equal
proportions.
Under Ho, expected males = expected females = (480+520)/2 = 1000/2 = 500
27. X^2 = Chi-square = Σ (O-E)^2/E with (n-1) df
= (480-500)^2/500 + (520-500)^2/500
= (-20)^2/500 + (20)^2/500
= 400/500 + 400/500 = 0.8 + 0.8 = 1.6; df = 2-1 = 1
Tabulated values of Chi-square (5% level) by degrees of freedom
Degrees of freedom   1      2      3      4      5
Value                3.84   5.99   7.82   9.49   11.07
Since 1.6 is less than 3.84, we accept Ho.
=> The data are consistent with male and female babies being born in equal proportions.
28. 14. A pharmaceutical company claimed that a new
product introduced by them can cure 80% of the
patients with a particular disease in seven days. In an
experiment conducted to test this claim, it was
observed that among 80 patients with the disease,
only 56 (70%) were cured within the stipulated time.
Can we conclude that the company’s claim is
exaggerated?
            Cured   Not cured   Total
Observed    56      24          80
Expected    64      16          80
X^2 = (56-64)^2/64 + (24-16)^2/16 = 64/64 + 64/16 = 5; df = 1
Since the calculated Chi-square is more than 3.84, we
conclude that the claim is exaggerated.
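The Chi-square calculations for the babies and cure-rate examples follow one pattern and can be sketched as a small function. The name `chi_square` is illustrative, and the dictionary holds the tabulated 5% values quoted in the table of tabulated Chi-square values.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tabulated 5% values of Chi-square by degrees of freedom.
CHI2_TABLE = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}

babies = chi_square([480, 520], [500, 500])   # females, males
cure = chi_square([56, 24], [64, 16])         # cured, not cured

for label, value in (("Babies", babies), ("Cure rate", cure)):
    decision = "reject Ho" if value > CHI2_TABLE[1] else "accept Ho"
    print(f"{label}: Chi-square = {value:.2f}, df = 1 -> {decision}")
```

Both examples have two categories, hence 1 degree of freedom.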
29. 15. Consider a controlled clinical trial in which 90 of 100
patients who received Treatment A were cured, compared
with 105 of 150 who received Treatment B. Test the
hypothesis that Treatment A is more effective than
Treatment B.
Response to treatment
              Cured       Not cured   Total (Ri)
TREATMENT A   90 (a11)    10 (a12)    100 (R1)
TREATMENT B   105 (a21)   45 (a22)    150 (R2)
Total (Cj)    195 (C1)    55 (C2)     250 (T)
Ho = Both the treatments are equally effective
H1 = Treatment A is more effective than B
e11 = R1*C1/T = 195*100/250 = 78
e12 = R1*C2/T = 55*100/250 = 22
e21 = R2*C1/T = 195*150/250 = 117
e22 = R2*C2/T = 55*150/250 = 33
30. Having obtained the expected values, the Chi-square can be
calculated as follows:
X^2 = (90-78)^2/78 + (10-22)^2/22 + (105-117)^2/117 + (45-33)^2/33
= 1.85 + 6.55 + 1.23 + 4.36 = 13.99; df = 1
In general, the formula for calculation of df is
df = (m-1)*(n-1) where m = no. of rows; n = no. of columns
Since the calculated chi-square is more than the tabulated value
(3.84), we decide to reject Ho.
=> The cure rate is different between Treatment A and
Treatment B.
=> Treatment A is better in respect of cure rate.
31. Chi-square test for association:
16. In a college, 1072 students were classified according to
their intelligence and economic conditions. Test whether
there is any association between intelligence and economic
conditions.
Classification of students according to their economic condition and intelligence
Economic     Intelligence
condition    Excellent   Good   Mediocre   Dull   Total
Good         48          199    181        82     510
Not good     81          185    190        106    562
Total        129         384    371        188    1072
Ho: There is no association between economic condition and
intelligence.
H1: There is an association between economic condition and intelligence.
32. Calculations of Expected values and Chi-square
e11 = R1*C1/T = 129*510/1072 = 61.4;   o11 = 48
e12 = R1*C2/T = 384*510/1072 = 182.7;  o12 = 199
e13 = R1*C3/T = 371*510/1072 = 176.5;  o13 = 181
e14 = R1*C4/T = 188*510/1072 = 89.4;   o14 = 82
e21 = R2*C1/T = 129*562/1072 = 67.6;   o21 = 81
e22 = R2*C2/T = 384*562/1072 = 201.3;  o22 = 185
e23 = R2*C3/T = 371*562/1072 = 194.5;  o23 = 190
e24 = R2*C4/T = 188*562/1072 = 98.6;   o24 = 106
Using the formula X^2 = Chi-square = Σ (O-E)^2/E,
we get X^2 = 9.735; df = (2-1)*(4-1) = 1*3 = 3 (P = 0.0209)
Since the calculated chi-square is greater than the tabulated value (7.82), Ho is
rejected => there is an association between intelligence and economic condition.
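The expected values and the Chi-square for a table with any number of rows and columns can be obtained with one short function. This is a sketch only; the name `contingency_chi_square` is illustrative. It is applied here to the intelligence by economic condition table and, as a check, to the 2 x 2 treatment table.

```python
def contingency_chi_square(table):
    """Return (chi-square, df, expected values) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected value of each cell = row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

students = [[48, 199, 181, 82],     # economic condition: good
            [81, 185, 190, 106]]    # economic condition: not good
trial = [[90, 10],                  # Treatment A: cured, not cured
         [105, 45]]                 # Treatment B: cured, not cured

chi2_students, df_students, _ = contingency_chi_square(students)
chi2_trial, df_trial, expected_trial = contingency_chi_square(trial)
print(f"Students: Chi-square = {chi2_students:.3f}, df = {df_students}")
print(f"Trial:    Chi-square = {chi2_trial:.2f}, df = {df_trial}")
```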
33. The following data are for a sample of 300 car owners who
were classified with respect to age and the number of
accidents they had during the past two years. Test whether
there is any relationship between these two variables.
Age group   Number of accidents
            0      1-2    3 or more   Total
<21         8      23     14          45
22-26       21     42     12          75
>= 27       71     90     19          180
Total       100    155    45          300
34. Conditions for the Validity of Chi-square test:
The sample observations should be independent.
The constraints on cell frequencies, if any, should be linear,
e.g., Σ Oi = Σ Ei.
The total frequency should be reasonably large, say
greater than 50.
No theoretical cell frequency should be less than 5.
If any theoretical cell frequency is less than 5, then for the
application of the Chi-square test it is pooled with the preceding or
succeeding frequency so that the pooled frequency is more than
5, and the d.f. are adjusted for those lost in pooling.
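Two of these conditions, the size of the total frequency and the smallest theoretical (expected) cell frequency, can be checked before the test is applied. The sketch below is illustrative only: the function name and the returned keys are assumptions, and the thresholds 50 and 5 are the ones stated above. It is applied to the age group by accidents table of 300 car owners.

```python
def check_chi_square_conditions(table, min_expected=5, min_total=50):
    """Check total frequency and smallest expected cell frequency."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    smallest = min(r * c / total for r in row_totals for c in col_totals)
    return {
        "total": total,
        "smallest_expected": smallest,
        "total_ok": total > min_total,
        "cells_ok": smallest >= min_expected,
    }

car_owners = [[8, 23, 14],    # age < 21: 0, 1-2, 3 or more accidents
              [21, 42, 12],   # age 22-26
              [71, 90, 19]]   # age >= 27

result = check_chi_square_conditions(car_owners)
print(result)
```

If `cells_ok` were false, neighbouring categories would be pooled as described above before the Chi-square is calculated.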
35. The Chi-square test should be applied only to frequencies
and not to percentages or proportions.
The Chi-square test depends only on the set of observed
and expected frequencies and on the degrees of freedom.
It does not make any assumptions regarding the
population from which the observations are drawn.
Hence it is termed a non-parametric test.