DEFINITION OF TERMS
• Reliability: consistency in measurement
• Reliability Coefficient: the ratio of true score variance on a test to the total variance of test scores
• Measurement Error: all of the factors associated with the process of measuring some variable, other than the variable being measured
• Random Error: error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
• Systematic Error: error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
SOURCES OF ERROR
• Test Construction
  • Variation among items within a test and between tests
• Test Administration
  • Error variance occurring during test administration can cause alterations in attention or motivation
• Test Scoring and Interpretation
  • Objective tests have well-documented reliability
  • Subjective scoring can be a source of error variance
CLASSICAL TEST THEORY
• It assumes that each person has a true score that would be obtained if there were no errors in measurement
• Errors of measurement are random
• True scores will not change with repeated
applications of the same test
• X (observed score) = T (true score) + E (error)
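A minimal simulation sketch of this decomposition, assuming NumPy and fabricated score distributions: with purely random error, the reliability coefficient defined earlier falls out as the ratio of true score variance to total variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 10_000

true_scores = rng.normal(loc=100, scale=15, size=n_people)  # T: error-free scores
error = rng.normal(loc=0, scale=5, size=n_people)           # E: random error
observed = true_scores + error                               # X = T + E

# Reliability coefficient: true score variance / total observed variance.
reliability = true_scores.var() / observed.var()
print(f"Empirical reliability: {reliability:.3f}")  # expected ~ 225/250 = 0.90
```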
DOMAIN SAMPLING THEORY
• Uses a limited number of items to represent a larger
and more complicated construct
• Reliability is the ratio of the variance on the shorter test to the variance of the long-run true score
• The greater the number of items, the greater the
reliability
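A short sketch (plain Python, illustrative numbers) of the general Spearman-Brown prophecy formula, r_new = n*r / (1 + (n - 1)*r), which formalizes the claim that sampling more items from the domain yields greater reliability.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is lengthened n-fold,
    assuming the added items are drawn from the same domain."""
    return (n * r) / (1 + (n - 1) * r)

# Illustrative: a 10-item test with reliability 0.60 gains reliability
# as more items are sampled from the domain.
for factor in (1, 2, 3, 4):
    print(f"{10 * factor:>3} items -> r = {spearman_brown(0.60, factor):.3f}")
```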
ITEM RESPONSE THEORY (LATENT TRAIT THEORY)
• Models the probability that a person with X ability
will be able to perform at a level of Y
• Discrimination: how an item differentiates among
people with higher or lower levels of whatever is
being measured
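A minimal sketch of one common IRT model, the two-parameter logistic (2PL), assuming NumPy; the parameter values are illustrative. Here a is the discrimination parameter and b the item difficulty.

```python
import numpy as np

def prob_correct(theta: np.ndarray, a: float, b: float) -> np.ndarray:
    """2PL model: probability that a person with ability theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

abilities = np.linspace(-2, 2, 5)
# A high-discrimination item (a=2.0) separates people near its difficulty
# (b=0.0) far more sharply than a low-discrimination item (a=0.5).
print(prob_correct(abilities, a=2.0, b=0.0))
print(prob_correct(abilities, a=0.5, b=0.0))
```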
GENERALIZABILITY THEORY
• Universe Score: the expected score across all acceptable observations in the universe; the analog of the true score
• Facets: characteristics of the testing situation, such as the number of items, the amount of training the raters had, and the purpose of the test; these must be similar across administrations
• Generalizability Study: examines how much impact the different facets have on the test score and how well scores generalize across different situations
• Decision Study: examines the usefulness of the test in helping the user to make decisions
TEST-RETEST RELIABILITY
• An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
• It is appropriate when the test measures something
that is relatively stable over time
• Coefficient of Stability: the estimate obtained when the interval between the two administrations is greater than six months
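A minimal sketch, assuming SciPy and fabricated scores, of the test-retest estimate: correlate the pairs of scores from the two administrations.

```python
from scipy.stats import pearsonr

# Fabricated scores for the same five people on two administrations.
first_administration = [85, 92, 78, 95, 88]
second_administration = [83, 94, 80, 93, 85]

r, _ = pearsonr(first_administration, second_administration)
print(f"Test-retest reliability estimate: r = {r:.3f}")
```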
ALTERNATE-FORMS & PARALLEL-FORMS RELIABILITY
• Parallel-Forms Reliability: the extent to which item sampling and other errors have affected test scores on two versions of the same test for which the means and variances of observed test scores are equal
• Alternate-Forms Reliability: an estimate of the extent to which different forms of the same test have been affected by item sampling error or other error
• Coefficient of Equivalence: the degree of the
relationship between various forms of the test
SPLIT-HALF RELIABILITY
• Obtained by correlating pairs of scores from equivalent halves of a test administered once
• Also known as odd-even reliability
• Spearman-Brown Formula: estimates the correlation between the two halves as if each half had been the length of the whole test; it can also be used to estimate the effect of shortening or lengthening a test on its reliability and to determine the number of items needed to attain a desired level of reliability
$r_{corrected} = \frac{2r}{1 + r}$, where $r$ is the Pearson's $r$ between the two halves
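A minimal odd-even sketch, assuming NumPy/SciPy and a fabricated people x items score matrix: correlate the two half scores, then apply the correction above.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
# Fabricated scores: 50 people x 20 items sharing a common trait.
items = rng.normal(size=(50, 1)) + rng.normal(scale=0.8, size=(50, 20))

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r_halves, _ = pearsonr(odd_half, even_half)
corrected = (2 * r_halves) / (1 + r_halves)  # Spearman-Brown correction
print(f"Half-test r = {r_halves:.3f}, corrected r = {corrected:.3f}")
```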
• Cronbach's Alpha: used when the two halves have unequal variances (see the sketch following this list):
$\alpha = \frac{2\left(\sigma_x^2 - \sigma_{y_1}^2 - \sigma_{y_2}^2\right)}{\sigma_x^2}$
where $\sigma_x^2$ is the variance of scores on the whole test and $\sigma_{y_1}^2$, $\sigma_{y_2}^2$ are the variances of the two separate halves of the test
• Coefficient Omega: used to estimate the extent to which all items measure the same underlying trait
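A minimal sketch of the unequal-variance split-half alpha above, assuming NumPy and fabricated half scores y1 and y2 (x = y1 + y2 is the whole-test score).

```python
import numpy as np

def split_half_alpha(y1: np.ndarray, y2: np.ndarray) -> float:
    """Cronbach's alpha for two halves with (possibly) unequal variances:
    alpha = 2 * (var(x) - var(y1) - var(y2)) / var(x), where x = y1 + y2."""
    x = y1 + y2
    return 2 * (x.var(ddof=1) - y1.var(ddof=1) - y2.var(ddof=1)) / x.var(ddof=1)

rng = np.random.default_rng(2)
trait = rng.normal(size=200)                        # shared true score
y1 = trait + rng.normal(scale=0.7, size=200)        # first half
y2 = 1.5 * trait + rng.normal(scale=0.9, size=200)  # second half, larger variance
print(f"Split-half alpha: {split_half_alpha(y1, y2):.3f}")
```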
INTER-ITEM CONSISTENCY
• The degree of correlation among all items on a scale
• Can be used to measure the degree of homogeneity
• Kuder-Richardson Formula (KR-20): measures the inter-item consistency of dichotomous items:
$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)$
where $k$ is the number of items, $p$ is the proportion of test takers passing each item, $q = 1 - p$, and $\sigma^2$ is the variance of total test scores
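A minimal KR-20 sketch, assuming NumPy and a fabricated matrix of 0/1 responses.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 for a people x items matrix of dichotomous (0/1) responses."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                     # proportion passing each item
    q = 1 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

rng = np.random.default_rng(3)
ability = rng.normal(size=(100, 1))
# Fabricated responses: higher ability -> higher probability of passing.
responses = (rng.random((100, 12)) < 1 / (1 + np.exp(-ability))).astype(int)
print(f"KR-20: {kr20(responses):.3f}")
```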
• Coefficient Alpha: measures the inter-item consistency of nondichotomous items; the mean of all possible split-half correlations (see the sketch after this list):
$\alpha = \frac{N}{N-1}\left(\frac{S^2 - \sum S_i^2}{S^2}\right)$
where $N$ is the number of items, $S^2$ is the variance of total test scores, and $S_i^2$ is the variance of item $i$
• Average Proportional Distance: focuses on the average difference between item scores; a value below 0.20 indicates excellent internal consistency
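A minimal coefficient-alpha sketch for a people x items matrix of nondichotomous scores, assuming NumPy and fabricated data; it implements the formula above.

```python
import numpy as np

def coefficient_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a people x items matrix of item scores."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (n_items / (n_items - 1)) * (total_var - item_vars) / total_var

rng = np.random.default_rng(4)
trait = rng.normal(size=(150, 1))
scores = trait + rng.normal(scale=0.8, size=(150, 8))  # 8 Likert-like items
print(f"Coefficient alpha: {coefficient_alpha(scores):.3f}")
```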
INTER-SCORER RELIABILITY
• The degree of agreement between scorers with
regard to particular measures
• Fleiss Kappa: the proportion of actual agreement out of the potential agreement following correction for chance agreement, computed across multiple raters
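A minimal sketch using statsmodels' fleiss_kappa with fabricated ratings: six essays scored on a 1-3 scale by four raters.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Fabricated ratings: rows are 6 essays, columns are 4 raters (scores 1-3).
ratings = np.array([
    [1, 1, 1, 2],
    [2, 2, 2, 2],
    [3, 3, 2, 3],
    [1, 1, 2, 1],
    [2, 3, 3, 3],
    [1, 1, 1, 1],
])

# Convert to a subjects x categories count table, then compute Fleiss' kappa.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```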
STANDARD ERROR OF MEASUREMENT
• Provides a measure of precision of an observed test
score.
• The higher the test reliability is, the lower the
standard error of measurement
$SEM = \sigma\sqrt{1 - r}$, where $\sigma$ is the standard deviation of test scores and $r$ is the reliability coefficient
• The standard error of measurement is used to construct confidence intervals around specific observed scores, indicating the probability that the true score lies within a range of scores
• Z-scores are commonly used to set the confidence level
• Reporting:
"Given that the assessee obtained a score of ____, there are two out of three chances that the assessee's true score falls between ____ and ____"
• The "two out of three chances" corresponds to an interval of ±1 SEM around the observed score (about a 68% confidence interval), as in the sketch below
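A minimal sketch of the SEM and a ±1 SEM ("two out of three chances") confidence interval; the standard deviation, reliability, and observed score are illustrative.

```python
import math

sd = 15.0           # standard deviation of test scores (illustrative)
reliability = 0.91  # reliability coefficient (illustrative)
observed = 106.0    # assessee's observed score (illustrative)

sem = sd * math.sqrt(1 - reliability)        # SEM = sigma * sqrt(1 - r)
low, high = observed - sem, observed + sem   # +/- 1 SEM: ~68% confidence band
print(f"SEM = {sem:.2f}; two out of three chances the true score "
      f"falls between {low:.1f} and {high:.1f}")
```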
IMPROVING RELIABILITY
• Improve the quality of test items
• Ensure adequate sampling of the content domain
• Lengthen the assessment
• Develop a scoring plan
• Ensure validity