1. Quantitative Research Methods
Lecture 10
Nonparametric Statistics
1. Wilcoxon Rank Sum Test
2. Sign Test
3. Kruskal-Wallis Test
4. Friedman test
5. Spearman rank correlation
2. Statistical analyses
• Group differences (nominal variable) on one interval
variable:
▫ T-tests (2 groups)
▫ ANOVA (3 or more groups)
- One factor: one-way ANOVA
- Two factors: two-way (two-factor) ANOVA
• The relationship between two nominal variables:
▫ Chi-square test
• The relationship between two interval variables:
▫ Correlation, simple linear regression
• The relationship between multiple interval variables and one
interval variable:
▫ Multiple regression
• The relationship between multiple interval variables and one
nominal variable (yes/no):
▫ Logistic regression
3. 19.3
Nonparametric Tests
So far, all the statistical techniques we have covered deal
with nominal or interval data.
Ordinal data are the result of a rating system such as
Excellent, Good, Fair, and Poor.
We can record the responses using any numbering system
as long as the order is maintained. For example,
Excellent = 4
Good = 3
Fair = 2
Poor = 1
5. 19.5
Ordinal Data…
The difference between interval and ordinal data is that with
interval data the differences between values are meaningful and
consistent. With ordinal data the differences between values have
no meaning. For example, what is the difference between Excellent
and Good?
Is it
4-3 = 1 ?
or
85-40 = 45 ?
The answer is neither. All we can say about the difference
between Excellent and Good is that Excellent is ranked higher.
We cannot interpret the magnitude of the difference.
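Because only the order carries information, any order-preserving recoding of the responses should produce the same ranks, and ranks are what the tests in this lecture work with. The following is a minimal Python sketch of that idea; the helper `average_ranks`, the second coding, and the five sample responses are illustrative assumptions, not part of the lecture data.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        ties = ordered.count(v)           # how many observations share this value
        rank_of[v] = first + (ties - 1) / 2
    return [rank_of[v] for v in values]


# Two codings that keep the same order (the second one is arbitrary).
coding_a = {"Excellent": 4, "Good": 3, "Fair": 2, "Poor": 1}
coding_b = {"Excellent": 85, "Good": 40, "Fair": 20, "Poor": 5}

# Hypothetical responses, for illustration only.
responses = ["Excellent", "Good", "Good", "Poor", "Fair"]

ranks_a = average_ranks([coding_a[r] for r in responses])
ranks_b = average_ranks([coding_b[r] for r in responses])
print(ranks_a)
print(ranks_b)
```

Both codings give the same ranks, so a rank-based test reaches the same conclusion whichever numbering system is used.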
6. 19.6
Nonparametric Statistics
When the data are ordinal, the mean is not an
appropriate measure of central location.
Instead, we will test characteristics of populations
without referring to specific parameters, hence the
term nonparametric.
Although nonparametric methods are designed to test
ordinal data, they have another area of application.
The statistical tests described before require that the
populations be normally distributed.
7. 19.7
Nonparametric Statistics
If the data are extremely nonnormal, the t-tests are
invalid.
Nonparametric techniques can be used instead.
For this reason, nonparametric procedures are
often (perhaps more accurately) called
distribution-free statistics.
8. 19.8
Nonparametric Statistics
In such circumstances we will treat the interval
data as if they were ordinal.
For this reason, even when the data are interval
and the mean is the appropriate measure of
location, we will choose instead to test
population locations.
9. Parametric and Nonparametric
Parametric
Group differences (nominal variable) on one interval variable:
  T-tests (2 groups)
    Independent T-test
    Paired T-test
  ANOVA (3 or more groups)
    One factor: one-way ANOVA
    Two factor / two-way (blocks) ANOVA
The relationship between two nominal variables:
  Chi-square test
The relationship between two interval variables:
  Correlation (Pearson), simple linear regression
The relationship between multiple interval variables and one interval variable:
  Multiple regression
The relationship between multiple interval variables and one nominal variable (yes/no):
  Logistic regression
Nonparametric
Group differences on an ordinal or nonnormal interval variable:
  2 groups: Wilcoxon Rank Sum Test (independent samples)
  Matched pairs:
    Sign Test (ordinal)
    Wilcoxon Signed Rank Sum Test (interval)
  3 or more groups:
    Kruskal-Wallis Test
    Friedman Test (blocks)
The relationship between two ordinal or nonnormal variables:
  Spearman rank correlation
10. 19.10
Example 19.2
A pharmaceutical company is planning to introduce a new
painkiller. In a preliminary experiment to determine its
effectiveness, 30 people were randomly selected, of whom 15
were given the new painkiller and 15 were given aspirin. All 30
were told to use the drug when headaches or other minor pains
occurred and to indicate which of the following statements most
accurately represented the effectiveness of the drug they took.
5 = The drug was extremely effective.
4 = The drug was quite effective.
3 = The drug was somewhat effective.
2 = The drug was slightly effective.
1 = The drug was not at all effective.
11. 19.11
Example 19.2
The responses are listed here (and stored in Xm19-02)
using the codes. Can we conclude at the 5% significance
level that the new painkiller is perceived to be more
effective?
New painkiller: 3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4
Aspirin: 4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5
13. 19.13
Example 19.2
The problem objective is to compare two populations.
The data are ordinal and the samples are independent.
The appropriate technique is the Wilcoxon rank sum
test.
If the new painkiller is more effective, we'd likely see the
location of its ratings "to the right of" the location of the
aspirin ratings, hence:
H1: The location of population 1 (new painkiller) is to the right
of the location of population 2 (aspirin), and so:
H0: The two population locations are the same.
14. 19.14
Example 19.2
The rank sums (the ranking itself is not shown here) are
T1 = 276.5 for the new painkiller and T2 = 188.5 for
aspirin.
Set T = T1 = 276.5, and begin calculating…
15. 19.15
Example 19.2
The p-value of the test is:
p-value = P(Z > 1.83) = .5 - .4664 = .0336
(or z = 1.83 > zα = z.05 = 1.645), hence:
"There is sufficient evidence to infer that
the new painkiller is perceived to be more
effective than aspirin."
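The figures in Example 19.2 can be reproduced from the listed responses. The sketch below assumes the usual large-sample normal approximation for the rank sum, with E(T) = n1(n1 + n2 + 1)/2 and σT = √(n1·n2·(n1 + n2 + 1)/12) and no correction for ties; the helper name `average_ranks` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        ties = ordered.count(v)           # how many observations share this value
        rank_of[v] = first + (ties - 1) / 2
    return [rank_of[v] for v in values]


# Responses from Example 19.2.
new_painkiller = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]

n1, n2 = len(new_painkiller), len(aspirin)
ranks = average_ranks(new_painkiller + aspirin)   # rank the pooled sample
t1 = sum(ranks[:n1])                              # rank sum, new painkiller
t2 = sum(ranks[n1:])                              # rank sum, aspirin

# Normal approximation (assumed here, without a tie correction).
expected_t = n1 * (n1 + n2 + 1) / 2
sigma_t = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (t1 - expected_t) / sigma_t
p_value = 1 - NormalDist().cdf(z)                 # right-tail test

print(t1, t2, round(z, 2), round(p_value, 3))
```

The rank sums match the slide (276.5 and 188.5) and z rounds to 1.83. The p-value computed from the unrounded z is about .034, slightly different from the table-based .0336.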
19. 19.19
Tests for Matched Pairs Experiments
We will now look at two nonparametric techniques (Sign
Test and Wilcoxon Signed Rank Sum Test) that test
hypotheses in problems with the following characteristics:
- We want to compare two populations,
- The data are either ordinal or interval (nonnormal),
- and the samples are matched pairs.
As before, we'll compute matched pair differences and work
from there…
20. 19.20
Example 19.4
In an experiment to determine which of two cars is perceived to
have the more comfortable ride, 25 people rode (separately) in
the back seat of an expensive European model and also in the
back seat of a North American midsize car. Each of the 25 people
was asked to rate the ride on the following 5-point scale.
1 = Ride is very uncomfortable.
2 = Ride is quite uncomfortable.
3 = Ride is neither uncomfortable nor comfortable.
4 = Ride is quite comfortable.
5 = Ride is very comfortable.
The results are stored in Xm19-04. Do these data allow us to
conclude at the 5% significance level that the European car is
perceived to be more comfortable than the North American car?
22. 19.22
Example 19.4
The problem objective is to compare two populations. The
data are ordinal and the experimental design is matched
pairs. Thus the correct technique is the sign test. Because we
want to test whether there is enough evidence to infer that
the European car is perceived to have a more comfortable ride
than the North American car, the hypotheses are
H0: The two population locations are the same.
H1: The location of population 1 (European car rating) is
to the right of the location of population 2 (North American
car rating).
26. 19.26
Example 19.4
There is enough evidence to infer that the European car
is perceived to have a more comfortable ride than the North
American car; the alternative hypothesis is supported.
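The Xm19-04 ratings are not reproduced in these notes, so the sketch below uses made-up ratings purely to illustrate the mechanics of a right-tail sign test. It assumes the usual normal approximation to the binomial, z = (x − 0.5n)/(0.5√n), where x is the number of positive differences and n the number of nonzero differences; the function name `sign_test_right_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_right_tail(first, second):
    """Sign test of H1: population 1 is located to the right of population 2."""
    diffs = [a - b for a, b in zip(first, second)]
    nonzero = [d for d in diffs if d != 0]      # zero differences are discarded
    n = len(nonzero)
    x = sum(1 for d in nonzero if d > 0)        # number of positive differences
    # Under H0, x is binomial(n, 0.5); normal approximation assumed for n > 10.
    z = (x - 0.5 * n) / (0.5 * sqrt(n))
    p_value = 1 - NormalDist().cdf(z)
    return x, n, z, p_value


# Hypothetical ratings from 14 riders (NOT the Xm19-04 data).
european = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3, 4, 5, 3, 4]
north_american = [3, 4, 2, 2, 4, 3, 3, 4, 2, 3, 2, 3, 4, 2]

x, n, z, p_value = sign_test_right_tail(european, north_american)
print(x, n, round(z, 2), round(p_value, 3))
```

With these invented ratings, 10 of the 12 nonzero differences are positive, so H0 would be rejected at the 5% level. The sign test uses only the direction of each difference, never its size, which is why it suits ordinal data.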
27. 19.27
Checking the Required Conditions
The sign test requires:
✓ The populations are similar in shape and spread
(compare the two histograms).
✓ The sample size exceeds 10 (n = 23).
Sign test required conditions:
1. Problem objective: compare two populations that are
identical in shape and spread.
2. Ordinal data.
3. Matched pairs.
[Figure: histograms of North American Car Rating and European Car Rating; x-axis: rating (1 to 5), y-axis: frequency]
28. 19.28
Example 19.5
Traffic congestion on roads and highways costs industry
billions of dollars annually as workers struggle to get to and
from work.
Several suggestions have been made about how to improve
this situation, one of which is called flextime, which
involves allowing workers to determine their own schedules
(provided they work a full shift).
Such workers will likely choose an arrival and departure
time to avoid rush-hour traffic.
29. 19.29
Example 19.5
In a preliminary experiment designed to investigate such a
program the general manager of a large company wanted to
compare the times it took workers to travel from their homes to
work at 8:00 A.M. with travel time under the flextime program.
A random sample of 32 workers was selected. The employees
recorded the time (in minutes) it took to arrive at work at 8:00
A.M. on Wednesday of one week.
The following week, the same employees arrived at work at times
of their own choosing.
The travel time on Wednesday of that week was recorded.
30. 19.30
Example 19.5
These results are listed in Xm19-05. Can we
conclude at the 5% significance level that travel
times under the flextime program are different
from travel times to arrive at work at 8:00 A.M.?
32. 19.32
Wilcoxon Signed Rank Sum Test
We'll use the Wilcoxon Signed Rank Sum Test when
we want to compare two populations of interval (but
not normally distributed) data in a matched pairs
experiment.
1. Compute paired differences, discard zeros.
2. Rank absolute values of differences smallest (1)
to largest (n), averaging ranks of tied observations.
3. Sum the ranks of positive differences (T+) and of
negative differences (T−).
4. Use T = T+ as our test statistic…
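The four steps above can be sketched in Python as follows. The travel times are invented for illustration (they are not the Xm19-05 data), and the function name is hypothetical. The z statistic assumes the usual large-sample approximation E(T) = n(n + 1)/4 and σT = √(n(n + 1)(2n + 1)/24); with only a handful of pairs, as here, it is shown for the mechanics only.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_sums(first, second):
    """Return (T+, T-, n) for matched pairs, following the four steps."""
    # Step 1: paired differences, discarding zeros.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank absolute differences, averaging the ranks of ties.
    ordered = sorted(abs(d) for d in diffs)

    def rank(value):
        first_pos = ordered.index(value) + 1
        ties = ordered.count(value)
        return first_pos + (ties - 1) / 2

    # Step 3: sum the ranks of positive and of negative differences.
    t_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical travel times in minutes (NOT the Xm19-05 data).
arrive_8am = [30, 42, 25, 51, 38, 47, 33, 40]
flextime = [27, 38, 26, 49, 37, 49, 28, 40]

t_plus, t_minus, n = signed_rank_sums(arrive_8am, flextime)

# Step 4: use T = T+ as the test statistic (normal approximation assumed).
expected_t = n * (n + 1) / 4
sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_plus - expected_t) / sigma_t
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(t_plus, t_minus, n, round(z, 2), round(p_two_tail, 3))
```

A quick check on the ranking: T+ and T− must add up to n(n + 1)/2, here 23 + 5 = 28.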
33. 19.33
Example 19.5
The appropriate technique is the Wilcoxon signed rank
sum test. Because we want to know whether the
population locations differ we have
H0: The two population locations are the same.
H1: The two population locations are different
This is a two-tail test.
34. 19.34
Example 19.5
[Worksheet: the original data sorted ascending by |difference|, showing the ranks of the positive differences, the ranks of the negative differences, and the rank sums]
38. 19.38
Kruskal-Wallis Test
• Checking the Required Conditions of ANOVA
The F-test of the analysis of variance requires that the
random variable be normally distributed with equal
variances.
• If the data are not normally distributed we can
replace the one-way analysis of variance with its
nonparametric counterpart, which is the
Kruskal-Wallis test.
39. 19.39
Kruskal-Wallis Test
The Kruskal-Wallis test is applied to problems
where we want to compare two or more
populations of ordinal or nonnormal interval data
from independent samples.
Our hypotheses will be:
H0: The locations of all k populations are the same.
H1: At least two population locations differ.
40. 19.40
Test Statistic
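In order to calculate the Kruskal-Wallis test statistic, we
need to:
1. Rank all the observations from smallest (1) to largest (n),
and average the ranks in the case of ties.
2. Calculate the rank sums for each sample: T1, T2, …, Tk
3. Lastly, calculate the test statistic (denoted H):
\[ H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j} \right] - 3(n+1) \]
(where n_j is the size of sample j and n = n_1 + n_2 + … + n_k)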
41. 19.41
Sampling Distribution of the Test Statistic:
For sample sizes greater than or equal to 5, the test statistic
H is approximately chi-squared distributed with k − 1
degrees of freedom.
Our rejection region is:
\[ H > \chi^2_{\alpha,\,k-1} \]
And our p-value is:
\[ p\text{-value} = P(\chi^2 > H) \]
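As a rough illustration of the Kruskal-Wallis calculation, the sketch below ranks the pooled observations, forms the rank sums, and computes H using the standard formula H = [12/(n(n+1)) ΣTj²/nj] − 3(n+1). The three samples are invented, and the helper names are assumptions. With k = 3 the statistic has 2 degrees of freedom, for which the chi-squared upper tail has the simple closed form exp(−H/2); that shortcut does not hold for other degrees of freedom.

```python
from math import exp


def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        ties = ordered.count(v)           # how many observations share this value
        rank_of[v] = first + (ties - 1) / 2
    return [rank_of[v] for v in values]


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of independent samples."""
    pooled = [v for sample in samples for v in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)          # ranks over the pooled observations
    total, start = 0.0, 0
    for sample in samples:
        t_j = sum(ranks[start:start + len(sample)])   # rank sum of this sample
        total += t_j ** 2 / len(sample)
        start += len(sample)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical ordinal ratings for three independent groups.
samples = [
    [1, 2, 2, 1, 3],
    [3, 3, 4, 2, 4],
    [5, 4, 5, 5, 3],
]

h = kruskal_wallis_h(samples)
p_value = exp(-h / 2)      # valid only because df = k - 1 = 2
critical_value = 5.991     # chi-squared critical value, alpha = .05, df = 2

print(round(h, 3), round(p_value, 4), h > critical_value)
```

For these made-up samples H is about 8.835, which exceeds 5.991, so H0 would be rejected at the 5% level.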
42. 19.42
Example GSS2008
Do Democrats, Independents, and Republicans differ in
how often they read newspapers?
PARTYID3: 1. Democrats, 2. Independents, 3. Republicans
NEWS: Do you read newspapers…
1 = Every day,
2 = Few times per week,
3 = Once per week,
4 = Less than once per week,
5 = Never.
43. 19.43
Example GSS2008
The problem objective is to compare three populations of
ordinal data (the newspaper-reading frequency of the three
party groups), and the samples are independent. These factors
are sufficient to determine the use of the Kruskal-Wallis test.
The null and alternative hypotheses are
H0: The locations of all three populations are the same.
H1: At least two population locations differ.
49. 19.49
Friedman Test
The Friedman Test is a technique used to
compare two or more populations of ordinal or
nonnormal interval data that are generated from a
randomized block experiment.
The hypotheses are the same as in the Kruskal-
Wallis test.
H0: The locations of all k populations are the same.
H1: At least two population locations differ.
50. 19.50
Friedman Test - Test Statistic
Since this is a blocked experiment, we first rank each
observation within each of the b blocks from smallest to
largest (i.e. from 1 to k), averaging any ties. We then
compute the rank sums: T1, T2, …, Tk. Then we calculate our
test statistic:
\[ F_r = \left[ \frac{12}{b\,k(k+1)} \sum_{j=1}^{k} T_j^2 \right] - 3b(k+1) \]
This test statistic is approximately chi-squared distributed
with k − 1 degrees of freedom (provided either k or b ≥ 5). Our
rejection region and p-value are:
\[ F_r > \chi^2_{\alpha,\,k-1}, \qquad p\text{-value} = P(\chi^2 > F_r) \]
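The within-block ranking is the step that distinguishes the Friedman test from Kruskal-Wallis, so a small sketch may help. It uses the standard statistic Fr = [12/(b·k(k+1)) ΣTj²] − 3b(k+1). The data (5 blocks, 3 treatments) are invented and the function names are assumptions. As with any k = 3 case, the 2 degrees of freedom allow the closed-form tail probability exp(−Fr/2).

```python
from math import exp


def average_ranks(values):
    """Rank from smallest (1) to largest (k); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        ties = ordered.count(v)           # how many observations share this value
        rank_of[v] = first + (ties - 1) / 2
    return [rank_of[v] for v in values]


def friedman_statistic(blocks):
    """Return (F_r, rank_sums); each row of `blocks` holds k treatment scores."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        # Rank inside the block only, never across blocks.
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    f_r = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)
    return f_r, rank_sums


# Hypothetical ratings: 5 blocks (rows) by 3 treatments (columns).
blocks = [
    [1, 2, 3],
    [2, 2, 4],
    [1, 3, 5],
    [2, 3, 3],
    [1, 4, 5],
]

f_r, rank_sums = friedman_statistic(blocks)
p_value = exp(-f_r / 2)    # valid only because df = k - 1 = 2

print(rank_sums, round(f_r, 3), round(p_value, 4))
```

The rank sums must total b·k(k+1)/2 (here 30), which is a convenient check before computing Fr.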
51. 19.51
Example 19.6
The personnel manager of a national accounting firm has been
receiving complaints from senior managers about the quality of recent
hirings. All new accountants are hired through a process whereby four
managers interview the candidate and rate her or him on several
dimensions, including academic credentials, previous work experience,
and personal suitability. Each manager then summarizes the results
and produces an evaluation of the candidate. There are five
possibilities:
1 The candidate is in the top 5% of applicants.
2 The candidate is in the top 10% of applicants, but not in the top 5%.
3 The candidate is in the top 25% of applicants, but not in the top 10%.
4 The candidate is in the top 50% of applicants, but not in the top 25%.
5 The candidate is in the bottom 50% of applicants.
52. 19.52
Example 19.6
The evaluations are then combined in making the final
decision. The personnel manager believes that the
quality problem is caused by the evaluation system.
However, she needs to know whether there is general
agreement or disagreement between the interviewing
managers in their evaluations. To test for differences
between the managers, she takes a random sample of
the evaluations of eight applicants. The results are
shown below and stored in Xm19-06. What
conclusions can the personnel manager draw from
these data? Employ a 5% significance level.
56. 19.56
SPSS Output
The value of our Friedman test statistic is 12.864 and the
p-value is 0.005. Thus, there is sufficient evidence to reject
H0 in favor of H1.
It appears that the managers'
evaluations of applicants
do indeed differ.
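The reported p-value can be checked without SPSS. With k = 4 managers the statistic has k − 1 = 3 degrees of freedom, and for exactly 3 degrees of freedom the chi-squared upper-tail probability has a closed form in terms of the complementary error function. The function name below is an assumption, and the formula applies only to df = 3.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail_df3(x):
    """P(chi-squared with 3 degrees of freedom > x); closed form for df = 3 only."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


friedman_statistic = 12.864                 # value reported in the SPSS output
p_value = chi2_upper_tail_df3(friedman_statistic)

# Sanity check against the familiar 5% critical value for df = 3.
tail_at_critical = chi2_upper_tail_df3(7.815)

print(round(p_value, 3), round(tail_at_critical, 3))
```

The result agrees with the SPSS p-value of 0.005 to three decimals.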
57. 19.57
Spearman Rank Correlation Coefficient
Previously we looked at the t-test of the coefficient
of correlation (ρ). In many situations, one or both
variables may be ordinal; or if both variables are
interval, the normality requirement may not be
satisfied.
In such cases, we measure and test to determine
whether a relationship exists by employing a
nonparametric technique, the Spearman
rank correlation coefficient.
58. 19.58
Spearman Rank Correlation Coefficient
We are interested in whether a relationship exists between the
two variables, hence the hypotheses to be tested are:
H0: ρs = 0 (no linear pattern, hence no correlation)
H1: ρs ≠ 0 (correlation; we can also do one-tail tests)
Since ρs is a population parameter, our sample statistic is rs,
and is calculated as:
\[ r_s = \frac{s_{ab}}{s_a s_b} \]
(where a and b are the ranks of x and y respectively, s_ab is
their sample covariance, and s_a and s_b are their sample
standard deviations)
[ρs is referred to as the Spearman correlation coefficient]
59. 19.59
Spearman Rank Correlation Coefficient
The statistic rs is approximately normally
distributed with
- a mean of zero, and
- a standard deviation of \( 1/\sqrt{n-1} \)
Hence our standardized test statistic is:
\[ z = r_s \sqrt{n-1} \]
60. 19.60
Example 19.7
The production manager of a firm wants to examine the
relationship between aptitude test scores given prior to hiring of
production-line workers and performance ratings received by the
employees 3 months after starting work. The results of the study
would allow the firm to decide how much weight to give to these
aptitude tests relative to other work-history information
obtained, including references. The aptitude test results range
from 0 to 100. The performance ratings are as follows:
1 = Employee has performed well below average.
2 = Employee has performed somewhat below average.
3 = Employee has performed at the average level.
4 = Employee has performed somewhat above average.
5 = Employee has performed well above average.
61. 19.61
Example 19.7
A random sample of 20 production workers yielded the results
listed here. Can the firm's manager infer at the 5% significance
level that aptitude test scores are correlated with performance
rating?
Employee Aptitude Test Score Performance Rating
1 59 3
2 47 2
3 58 4
4 66 3
5 77 2
. . . .
.
Xm19-07
62. 19.62
Example 19.7
The problem is we're trying to correlate interval &
ordinal data. We'll treat the aptitude scores as
ordinal, and apply the Spearman rank correlation
coefficient…
IDENTIFY
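To make the calculation concrete, the sketch below ranks both variables and computes rs as the Pearson correlation of the ranks, followed by z = rs·√(n − 1). It uses only the five rows listed on the slide for Example 19.7, not the full sample of 20 in Xm19-07, so its output does not reproduce the SPSS result, and with n = 5 the normal approximation is shown for the mechanics only. The function names are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # 1-based position of the first occurrence
        ties = ordered.count(v)           # how many observations share this value
        rank_of[v] = first + (ties - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    a, b = average_ranks(x), average_ranks(y)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # The (n - 1) divisors of the covariance and variances cancel in the ratio.
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / sqrt(s_aa * s_bb)


# First five rows shown on the slide (the full Xm19-07 sample has 20 workers).
aptitude = [59, 47, 58, 66, 77]
rating = [3, 2, 4, 3, 2]

r_s = spearman_rs(aptitude, rating)
z = r_s * sqrt(len(aptitude) - 1)

print(round(r_s, 4), round(z, 4))
```

Treating the aptitude scores as ordinal means only their order enters the calculation, so a score of 77 contributes the same rank whether the next-highest score is 66 or 76.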
65. 19.65
SPSS output Example 19.7
INTERPRET
There is not enough evidence to believe that the aptitude test scores
and performance rating are related. This conclusion suggests that the
aptitude test should be improved to better measure the knowledge
and skill required by a production-line worker. If this proves
impossible, the aptitude test should be discarded.
67. Why do we learn?
• Post course survey
• https://dba902.wordpress.com/2018/11/14/post-class/
68. Week 5 assignment
• Reading: Chapters 18-19
• Assignment:
▫ P680 16.139
▫ P708 17.10
▫ P723 17.57
▫ P761 18.49
• Data sets are available on Blackboard. Due on
Blackboard November 20th.