2. Content
1. Pearson’s product moment correlation
2. Spearman rank-order correlation (Rho)
3. Phi coefficient
4. Point biserial correlation
3. Types of Correlation Coefficients
Which formula should I use?
Correlation coefficient      Types of scales
Pearson’s product moment     Both scales interval
Spearman rank-order          Both scales ordinal
Phi                          Both scales nominal
Point biserial               One interval, one nominal
4. 1. Pearson’s product moment correlation
Pearson's correlation coefficient, when applied to a population, is
commonly represented by the Greek letter ρ (rho) and may be
referred to as the population correlation coefficient or
the population Pearson correlation coefficient.
When applied to a sample, it is denoted r.
The formula for r is:
$r = \frac{\mathrm{Cov}(X, Y)}{S(x)\,S(y)}$
Cov(X, Y): the covariance of X and Y
S(x), S(y): the standard deviations of X and Y
5. • The Mean is the average of the numbers.
• The Standard Deviation is the square root of the Variance.
E.g. The following data relate the number of hours spent studying
to the number of correct answers.
6. • The Mean is the average of the numbers.
$\text{Mean} = \frac{0+1+2+3+5+5+6}{7} = \frac{22}{7} \approx 3.142$
• Now we calculate each score's difference from the Mean.
+ The Mean is 3.142.
+ The differences are: -3.142, -2.142, -1.142, -0.142, 1.858, 1.858, 2.858.
7. • The Variance is:
$\sigma^2 = \frac{(-3.142)^2 + (-2.142)^2 + (-1.142)^2 + (-0.142)^2 + 1.858^2 + 1.858^2 + 2.858^2}{7} = \frac{30.857}{7} \approx 4.408$
• And the Standard Deviation is the square root of the Variance.
$\sigma = \sqrt{4.408} \approx 2.100 \approx 2$ (to the nearest score)
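The three steps above (mean, differences, variance) can be checked with a short Python sketch. It assumes the population form of the variance (dividing by 7, as on the slide); the function names are illustrative only.

```python
import math

scores = [0, 1, 2, 3, 5, 5, 6]  # the seven scores used in the example


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def population_variance(values):
    """Average squared difference from the mean (divides by N, not N - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def population_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(values))


m = mean(scores)
var = population_variance(scores)
sd = population_sd(scores)
print(f"mean = {m:.3f}, variance = {var:.3f}, SD = {sd:.3f}")
```

Because the code keeps the mean at full precision instead of rounding it to three decimals first, the last digit of the variance may differ slightly from a hand calculation.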
8. • If working with raw data, the Pearson product moment
correlation formula is as follows:
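A widely used raw-score form is $r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$. The sketch below assumes this is the form intended here. The data are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values from the slide's table.

```python
import math


def pearson_r_raw(xs, ys):
    """Pearson r computed from raw scores via the sums of X, Y, X^2, Y^2 and XY."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


hours = [0, 1, 2, 3, 5, 5, 6]    # hypothetical X values (hours studying)
correct = [1, 2, 2, 4, 5, 6, 7]  # hypothetical Y values (correct answers)
r = pearson_r_raw(hours, correct)
print(round(r, 3))  # 211 / 216, about 0.977
```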
11. Conclusion: There is a strong, positive correlation between X and
Y: as X increases, Y tends to increase.
Exercise
Find Pearson's coefficient of correlation between the price of
studying facilities and demand from the following data. Then state
your conclusion about their relationship.
12. 2. Spearman rank-order correlation (Rho)
- A measure of the strength and direction of the association
between two ranked variables, i.e. variables measured on an ordinal scale.
- Denoted by the symbol r_s (or the Greek letter ρ, pronounced rho).
−1 ≤ 𝜌 ≤ 1
13. Assumptions
- The two variables are ordinal, interval or ratio.
- There is a monotonic relationship between the two variables.
14. 2. Spearman rank-order correlation (Rho)
English (mark)   Math (mark)
56               66
75               70
45               40
71               60
62               65
64               56
58               59
80               77
76               67
61               63
- Ranking data
• The score with the highest value is labeled "1", the next highest "2",
and so on down to the lowest score, which receives the last rank.
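As a rough illustration, the ranking rule can be applied to the ten pairs of marks above in Python. The coefficient is then computed with the usual no-ties formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, which does not appear in this section and is assumed here; the function names are illustrative.

```python
english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]


def rank_descending(values):
    """Rank 1 goes to the highest value. Assumes there are no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho_no_ties(xs, ys):
    """Spearman coefficient from rank differences d (valid only without ties)."""
    rx, ry = rank_descending(xs), rank_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


english_ranks = rank_descending(english)
maths_ranks = rank_descending(maths)
rho = spearman_rho_no_ties(english, maths)
print(english_ranks)  # [9, 3, 10, 4, 6, 5, 8, 1, 2, 7]
print(maths_ranks)    # [4, 2, 10, 7, 5, 9, 8, 1, 3, 6]
print(round(rho, 2))  # 0.67
```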
16. 2. Spearman rank-order correlation (Rho)
English (mark)   Math (mark)
56               66
75               70
45               40
71               60
61               65
64               56
58               59
80               77
76               67
61               63
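- Ranking data
• The score with the highest value is labeled "1", the next highest "2",
and so on down to the lowest score.
• When two or more values in the data are identical, each receives
the average of the ranks they would otherwise occupy. Here the two
English marks of 61 would occupy ranks 6 and 7, so each is ranked 6.5.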
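The averaging rule for tied values can be sketched as follows, using the English marks from this slide (where 61 occurs twice). The function name is illustrative.

```python
def average_ranks_descending(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first rank position occupied by v
        count = ordered.count(v)               # how many values are tied at v
        ranks.append(first + (count - 1) / 2)  # mean of first .. first + count - 1
    return ranks


english = [56, 75, 45, 71, 61, 64, 58, 80, 76, 61]
english_ranks = average_ranks_descending(english)
print(english_ranks)  # [9.0, 3.0, 10.0, 4.0, 6.5, 5.0, 8.0, 1.0, 2.0, 6.5]
```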
23. 3. Phi coefficient
A. Definition
- The Phi (ϕ) statistic is used when both variables are nominal
and dichotomous.
- The obtained value of Phi indicates the strength and direction of
the relationship between the two variables.
24. 3. Phi coefficient
B. Formula
                VARIABLE Y
VARIABLE X      A       B       A+B
                C       D       C+D
                A+C     B+D
Formula:
$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$
25. 3. Phi coefficient
C. Example
E.g. A class of 50 Ss is asked whether they like using the language
lab. The answer is either yes or no. The Ss are from either Japan or
Iran.
The observed values:
        Japan   Iran    Total
Yes     24      8       32
No      6       12      18
Total   30      20      50
Then:
$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} = \frac{(24)(12) - (8)(6)}{\sqrt{(32)(18)(30)(20)}} = \frac{240}{\sqrt{345600}} = \frac{240}{587.88} \approx 0.41$
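The language-lab example can be reproduced with a few lines of Python. This sketch puts AD - BC in the numerator, which gives the positive 0.41 of the worked example; the function name and cell labels follow the A, B, C, D layout of the formula slide.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Yes/No answers (rows) by Japan/Iran (columns)
phi = phi_coefficient(24, 8, 6, 12)
print(round(phi, 2))  # 0.41
```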
26. 3. Phi coefficient
D. Steps
D.1. Using the suggested interpretations of measures
of association
1. State the null hypothesis.
2. Determine the Phi coefficient.
3. Use the suggested table to state the conclusion.
27. 3. Phi coefficient
Suggested Interpretations of Measures of Association
Values            Appropriate phrases
+.70 or higher    Very strong positive relationship.
+.50 to +.69      Substantial positive relationship.
+.30 to +.49      Moderate positive relationship.
+.10 to +.29      Low positive relationship.
+.01 to +.09      Negligible positive relationship.
0.00              No relationship.
-.01 to -.09      Negligible negative relationship.
-.10 to -.29      Low negative relationship.
-.30 to -.49      Moderate negative relationship.
-.50 to -.69      Substantial negative relationship.
-.70 or lower     Very strong negative relationship.
Source: Adapted from James A. Davis, Elementary Survey Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1971, 49.
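The table can be turned into a small lookup, sketched below. The function name is illustrative, and one assumption is made: a value that falls between two printed bands (for example .095) is assigned to the lower band.

```python
def describe_association(value):
    """Map a coefficient to the phrase in the suggested-interpretations table."""
    magnitude = abs(value)
    if magnitude == 0:
        return "No relationship."
    if magnitude >= 0.70:
        strength = "Very strong"
    elif magnitude >= 0.50:
        strength = "Substantial"
    elif magnitude >= 0.30:
        strength = "Moderate"
    elif magnitude >= 0.10:
        strength = "Low"
    else:
        strength = "Negligible"  # assumption: covers everything below .10
    direction = "positive" if value > 0 else "negative"
    return f"{strength} {direction} relationship."


print(describe_association(0.41))  # Moderate positive relationship.
```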
28. 3. Phi coefficient
D.2. Transform the Phi coefficient into Chi-square
1. State the null hypothesis.
2. Choose the alpha level and determine the critical Chi-square value
(df = 1 for a 2 × 2 table).
3. Apply the formula for the Phi coefficient and determine the Chi-square
value:
$\chi^2 = N\phi^2$
4. Compare the Chi-square value with the critical value (equivalently,
compare its p-value with alpha). State the conclusion.
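A minimal sketch of this transformation for the language-lab example (N = 50) follows, using $\chi^2 = N\phi^2$. The alpha level of .05 is an assumption, and the p-value helper relies on the fact that a Chi-square statistic with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_from_phi(phi, n):
    """Chi-square statistic of a 2 x 2 table from its Phi coefficient."""
    return n * phi ** 2


def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a Chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


phi = 240 / math.sqrt(345600)  # language-lab example, about 0.41
n = 50
alpha = 0.05                   # assumed alpha level

chi_square = chi_square_from_phi(phi, n)
p_value = chi_square_p_value_df1(chi_square)
print(round(chi_square, 2))    # 8.33
print(p_value < alpha)         # True: reject the null hypothesis at alpha = .05
```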
30. 4. Point biserial correlation
4.1. Definition & Function
4.2. Formula
4.3. Meaning of point-biserial coefficient
31. 4. Point biserial correlation
4.1. Definition & Function
“When one of the variables in the correlation is nominal, the point
biserial correlation is used to determine the relationship between
the levels of the nominal variable and the continuous variable.”
(Hatch & Farhady, 1982, p. 204)
E.g. the correlation between each single test item and the total test
score:
- Nominal variable: answers to a single test item
- Continuous variable: total test score
32. 4. Point biserial correlation
4.1. Definition & Function
- Functions:
o To analyze test items
o To investigate the correlation between a language
behavior and sex (male/female)
o To investigate the correlation between any other nominal
variable and test performance
33. 4. Point biserial correlation
4.2. Formula
a. By hand
$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s}\sqrt{pq}$
$\bar{X}_p$: the mean score on the total test of Ss answering the item right
$\bar{X}_q$: the mean score on the total test of Ss answering the item wrong
$p$: proportion of cases answering the item right
$q$: proportion of cases answering the item wrong
$s$: standard deviation of the total sample on the test
34. 4. Point biserial correlation
4.2. Formula
E.g. the correlation between each single test item and total test score
Table 2. Sample Student Data Matrix (Varma, n.d., p. 4)
35. 4. Point biserial correlation
4.2. Formula
E.g. the correlation between test item 1 and total test score
Students   Item 1   Total test score
Kid A      1        9
Kid B      1        8
Kid C      1        7
Kid D      1        7
Kid E      1        7
Kid F      0        4
Kid G      1        4
Kid H      0        3
Kid I      0        2
$\bar{X}_p = \frac{9+8+7+7+7+4}{6} = 7$
$\bar{X}_q = \frac{4+3+2}{3} = 3$
$p = \frac{6}{9} = .67$ ; $q = \frac{3}{9} = .33$
$\text{Mean} = \frac{9+8+7+7+7+4+4+3+2}{9} = 5.67$
$s = \sqrt{\frac{(9-5.67)^2 + \dots + (2-5.67)^2}{9-1}} = 2.45$
$r_{pbi} = \frac{7-3}{2.45}\sqrt{(.67)(.33)} = .77$
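The hand calculation for item 1 can be checked with the sketch below. It follows the slide's formula, including the sample standard deviation (dividing by n - 1); the function name is illustrative.

```python
import math


def point_biserial(item, totals):
    """r_pbi = (mean_p - mean_q) / s * sqrt(p * q), with s computed using n - 1."""
    right = [t for i, t in zip(item, totals) if i == 1]
    wrong = [t for i, t in zip(item, totals) if i == 0]
    if not right or not wrong:
        raise ValueError("the item needs both right and wrong answers")
    n = len(totals)
    mean_p = sum(right) / len(right)  # mean total score, item answered right
    mean_q = sum(wrong) / len(wrong)  # mean total score, item answered wrong
    p, q = len(right) / n, len(wrong) / n
    mean_all = sum(totals) / n
    s = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / (n - 1))
    return (mean_p - mean_q) / s * math.sqrt(p * q)


item_1 = [1, 1, 1, 1, 1, 0, 1, 0, 0]  # Kid A .. Kid I
totals = [9, 8, 7, 7, 7, 4, 4, 3, 2]
r_pbi = point_biserial(item_1, totals)
print(round(r_pbi, 2))  # 0.77
```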
36. 4. Point biserial correlation
4.2. Formula
Exercise. Find the correlation between test item 4 and total test score.
Students   Item 4   Total test score
Kid A      1        9
Kid B      1        8
Kid C      1        7
Kid D      0        7
Kid E      1        7
Kid F      0        4
Kid G      1        4
Kid H      0        3
Kid I      0        2
Answer:
$\bar{X}_p = 7$ ; $\bar{X}_q = 4$
$p = .56$ ; $q = .44$
$s = 2.45$
$r_{pbi} = .61$
37. 4. Point biserial correlation
4.3. Meaning of point-biserial coefficient
- A high point-biserial coefficient means that students who answer
the item correctly tend to have higher total scores and students who
answer it incorrectly tend to have lower total scores, i.e. the item
discriminates between low-performing examinees and high-performing
examinees.
- Very low or negative point-biserial coefficients computed after
field testing new items can help identify items that are flawed.
38. References
BBC. (n.d.). Variation and classification. Retrieved from
http://www.bbc.co.uk/bitesize/ks3/science/organisms_behaviour_health/variation_classification/revision/3/
Hatch, E. & Farhady, H. (1982). Research design and statistics for applied
linguistics. Rowley, MA: Newbury House.
Lund, A. & Lund, M. (n.d.). Retrieved from
https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
39. References
Nominal measure of correlation. (n.d.). Retrieved from
http://www.harding.edu/sbreezeel/460%20files/statbook/chapter15.pdf
Varma, S. (n.d.). Preliminary item statistics using point-biserial correlation and
p-values. Morgan Hill, CA: Educational Data Systems.
Editor's Notes
Mean: the average; standard deviation: a measure of how much the scores typically differ from the mean