The test production process
+ Item analysis: Classical Test Theory (CTT) vs Item-Response Theory (IRT)
Approaches to language testing
+ Essay-translation
+ Structuralist
+ Integrative
+ Communicative
Techniques of language testing: Item types
(1) Multiple choice and other selection types
(2) Candidate-supplied response item types
(3) Non-item-based task types
Bloom’s taxonomy and testing
Test production process - Approaches to language testing - Techniques of language testing - Bloom's taxonomy
1. Phạm Phúc Khánh Minh
2. Nguyễn Trần Hoài Phương
3. Nguyễn Ngọc Phương Thành
4. Võ Thị Thanh Thư
5. Đỗ Thị Bạch Vân
6. Ngô Thảo Vy
TESOL 2014B
1. The test production process
2. Approaches to language testing
3. Techniques of language testing: Item types
4. Bloom’s taxonomy and testing
1.1. Classical Test Theory (CTT) vs Item-Response Theory (IRT)
CTT
• Measured at test level
• Applies only to the students who took that test
IRT
• Measured at item level
• Provides sample-free measurement
1.2. Advantages offered by Latent Trait Theory
Sample-Free Item Calibration
Classical Test Theory
• The estimated item difficulty varies with the average ability of the particular sample of examinees observed
• -> Item analysis is sample-bound
Item-Response Theory
• The item difficulty scale is independent of the ability differences of any particular sample of examinees
• -> Item analysis is sample-free
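To illustrate why classical item analysis is sample-bound, the sketch below assumes the Rasch model and computes the expected facility value (proportion correct) of one item of fixed difficulty in two hypothetical samples. The function names and ability values are illustrative assumptions, not part of the source.

```python
import math

def rasch_p(theta, b):
    """Rasch probability that a person of ability theta (logits) passes an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_facility(abilities, b):
    """Expected classical facility value (proportion correct) of one item in a sample."""
    return sum(rasch_p(theta, b) for theta in abilities) / len(abilities)

item_difficulty = 0.0                    # one item, fixed on the logit scale
weak_sample = [-2.0, -1.5, -1.0, -0.5]   # hypothetical low-ability examinees
strong_sample = [0.5, 1.0, 1.5, 2.0]     # hypothetical high-ability examinees

p_weak = expected_facility(weak_sample, item_difficulty)
p_strong = expected_facility(strong_sample, item_difficulty)
print(f"facility in weak sample:   {p_weak:.2f}")    # about 0.24
print(f"facility in strong sample: {p_strong:.2f}")  # about 0.76
```

The same item looks hard in one sample and easy in the other, although its Rasch difficulty (0 logits) has not changed.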
Test-Free Person Measurement
Classical Test Theory
• Ability measurement depends on the unique clustering of items in a particular test
Item-Response Theory
• The abilities of persons who took different tests can be compared
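A minimal sketch of test-free person measurement, assuming the Rasch model and two hypothetical tests whose items are already calibrated on one logit scale. The maximum-likelihood ability for a raw score is found by Newton-Raphson; to keep the example exact it feeds in expected (fractional) raw scores, whereas real raw scores are whole numbers. Function names and item values are assumptions for illustration.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ability_from_score(score, difficulties, tol=1e-10):
    """Maximum-likelihood Rasch ability for a raw score on items of known difficulty.

    Solves sum(P(theta, b_i)) == score by Newton-Raphson. Zero and perfect scores
    have no finite estimate, so they are rejected.
    """
    n = len(difficulties)
    if not 0 < score < n:
        raise ValueError("no finite ability estimate for a zero or perfect score")
    theta = math.log(score / (n - score))          # start from the raw-score logit
    for _ in range(100):
        probs = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1 - p) for p in probs)
        step = (score - sum(probs)) / info
        step = max(-1.0, min(1.0, step))           # damp large early steps
        theta += step
        if abs(step) < tol:
            break
    return theta

easy_test = [-2.0, -1.0, 0.0]   # hypothetical calibrated difficulties (logits)
hard_test = [0.0, 1.0, 2.0]
true_ability = 0.5

# expected raw scores differ between the two tests...
score_easy = sum(rasch_p(true_ability, b) for b in easy_test)
score_hard = sum(rasch_p(true_ability, b) for b in hard_test)
# ...but both convert back to the same ability on the common scale
theta_easy = ability_from_score(score_easy, easy_test)
theta_hard = ability_from_score(score_hard, hard_test)
print(f"{score_easy:.2f} {score_hard:.2f} -> {theta_easy:.3f} {theta_hard:.3f}")
```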
Multiple Reliability Estimation
Classical Test Theory
• One global reliability estimate is applied to every examinee, although the reliability of ability estimates varies from person to person; a single global figure should not be used to evaluate the accuracy of every individual's score
Item-Response Theory
• Reliability estimation goes beyond a single global estimate for a given test to a confidence estimate (standard error) associated with every possible person and item score on that test
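A minimal sketch, assuming the Rasch model: the standard error attached to an ability estimate is the reciprocal square root of the test information at that ability, so one test measures persons near its centre more precisely than persons far from it. The item difficulties and function names are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Standard error of a Rasch ability estimate at theta: 1 / sqrt(test information)."""
    info = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)

test_items = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical calibrated difficulties

for theta in (-3.0, 0.0, 3.0):
    print(f"ability {theta:+.1f} logits: SE = {standard_error(theta, test_items):.2f}")
```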
Identification of Guessers and Other Deviant Respondents
Classical Test Theory
• Misfitting persons cannot be identified
Item-Response Theory
• Misfitting persons (e.g. guessers) can be identified
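The source does not name a fit statistic. One common Rasch person-fit index, assumed here, is the outfit mean-square: the average squared standardized residual across items. The sketch compares two hypothetical persons with the same raw score; items and patterns are illustrative.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_mean_square(responses, theta, difficulties):
    """Outfit (unweighted) mean-square: the average squared standardized residual.

    Values near 1 fit the model; values well above 1 flag unexpected patterns,
    such as a guesser who misses easy items but gets hard ones right.
    """
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

items = [-1.5, -0.5, 0.5, 1.5]   # hypothetical difficulties, easiest first
theta = 0.0                       # ML ability for a raw score of 2 on these items
expected_pattern = [1, 1, 0, 0]   # passes the easy items, fails the hard ones
guesser_pattern = [0, 0, 1, 1]    # same raw score, but the reverse pattern

print(outfit_mean_square(expected_pattern, theta, items))  # well below 1
print(outfit_mean_square(guesser_pattern, theta, items))   # well above 1
```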
Reconciliation of Norm-Referenced and Criterion-Referenced Testing
Classical Test Theory
• Unable to reconcile norm-referenced and criterion-referenced testing within one measurement framework
Item-Response Theory
• Able to reconcile norm-referenced and criterion-referenced testing within one measurement framework
Test Equating Facility
Classical Test Theory
• Equating requires all the test forms to be administered to the same large sample of examinees
• -> Time-consuming
Item-Response Theory
• No need to administer all the test forms to the same large sample of examinees
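The source does not say how IRT avoids the common large sample. One widely used Rasch approach, assumed here, is common-item (anchor) linking: each form contains a few shared anchor items, separate calibrations of the forms differ only by a constant shift on the logit scale, and that shift is estimated as the mean difference between the anchors' two calibrations. All values and names below are hypothetical.

```python
def linking_constant(anchors_on_a, anchors_on_b):
    """Mean difference between the anchor items' difficulties on form A's and form B's scales."""
    diffs = [a - b for a, b in zip(anchors_on_a, anchors_on_b)]
    return sum(diffs) / len(diffs)

# hypothetical Rasch calibrations of the same three anchor items,
# estimated separately from two different groups of examinees
anchors_a = [-0.8, 0.1, 0.9]    # calibrated with form A's group
anchors_b = [-1.3, -0.4, 0.4]   # calibrated with form B's group

shift = linking_constant(anchors_a, anchors_b)

# move every other form B item (and any form B person measure) onto form A's scale
form_b_items = [-1.0, 0.2, 1.5]
form_b_on_a = [d + shift for d in form_b_items]
print(round(shift, 2), [round(d, 2) for d in form_b_on_a])
```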
Test Tailoring Facility
A tailored test can provide much greater decision accuracy than a standardized test: fewer students are wrongly admitted to, or wrongly rejected from, university or intensive English study.
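The source does not describe the tailoring procedure itself. The sketch below is a simplified computer-adaptive test under stated assumptions: a Rasch-calibrated bank, selection of the unused item whose difficulty is closest to the current ability estimate (the most informative item under the Rasch model), and a fixed 1-logit step until the candidate has both a right and a wrong answer. The bank, the simulated candidate and all function names are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, current):
    """Update the ability estimate after each response.

    Until the candidate has at least one right and one wrong answer there is no
    finite ML estimate, so the estimate simply moves 1 logit up or down.
    """
    score, n = sum(responses), len(responses)
    if score == 0:
        return current - 1.0
    if score == n:
        return current + 1.0
    theta = 0.0
    for _ in range(50):                              # Newton-Raphson for the ML estimate
        probs = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, (score - sum(probs)) / info))
        theta += step
        if abs(step) < 1e-10:
            break
    return theta

def tailored_test(bank, answer, length):
    """Give `length` items, each time choosing the unused item nearest the current
    ability estimate (the most informative item under the Rasch model)."""
    theta, used, responses = 0.0, [], []
    for _ in range(length):
        item = min((i for i in bank if i not in used), key=lambda i: abs(bank[i] - theta))
        used.append(item)
        responses.append(answer(item))
        theta = estimate_theta(responses, [bank[i] for i in used], theta)
    return used, theta

# hypothetical calibrated bank and a simulated candidate who passes every item easier than 1.5 logits
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0, "q6": 3.0}
administered, ability = tailored_test(bank, lambda item: int(bank[item] < 1.5), length=4)
print(administered, round(ability, 2))
```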
Item Banking Facility
Calibrated items are stored in an item bank on a common metric of difficulty
This permits the construction of tests of known reliability and validity by selecting appropriate subsets of items from the bank, without further need for field trials
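A minimal sketch of the item-banking idea, assuming Rasch calibrations stored on one logit scale: a test is assembled by picking the items closest to a target ability, and its standard error at that ability is predicted from the stored difficulties alone. The bank contents, the target and the function names are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# hypothetical item bank: item id -> difficulty calibrated on one common logit scale
bank = {"A1": -2.1, "A2": -1.4, "A3": -0.9, "A4": -0.3, "A5": 0.0,
        "A6": 0.4, "A7": 0.8, "A8": 1.3, "A9": 1.9, "A10": 2.6}

def build_test(bank, target, k):
    """Select the k banked items whose difficulties are closest to a target ability."""
    return sorted(bank, key=lambda item: abs(bank[item] - target))[:k]

def predicted_se(items, bank, theta):
    """Standard error the assembled test should give at ability theta, computed from
    the stored calibrations alone, i.e. without a new field trial."""
    info = sum(rasch_p(theta, bank[i]) * (1.0 - rasch_p(theta, bank[i])) for i in items)
    return 1.0 / math.sqrt(info)

targeted = build_test(bank, target=0.6, k=4)
off_target = ["A1", "A2", "A9", "A10"]
print(targeted, round(predicted_se(targeted, bank, 0.6), 2),
      round(predicted_se(off_target, bank, 0.6), 2))
```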
The Study of Item and Test Bias
Classical Test Theory
• The amount and direction of bias for a given item or person is seldom quantified
Item-Response Theory
• Able to quantify the amount and direction of bias for any given item or person
• => Test bias can be neutralized by removing biased items or by including items biased in the opposite direction
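The source does not specify how bias is quantified. One common approach, assumed here, is to calibrate the items separately in two groups (a reference and a focal group) and compare each item's difficulty after centring each calibration at zero: the sign of the contrast gives the direction of bias and its size the amount. The groups, items and values are hypothetical.

```python
def dif_contrasts(reference, focal):
    """Amount and direction of bias per item, from separate Rasch calibrations in two groups.

    Each group's difficulties are centred at zero first, so that an overall
    difference in ability between the groups is not mistaken for item bias.
    A positive contrast means the item is harder for the focal group.
    """
    ref_mean = sum(reference.values()) / len(reference)
    foc_mean = sum(focal.values()) / len(focal)
    return {item: (focal[item] - foc_mean) - (reference[item] - ref_mean) for item in reference}

# hypothetical calibrations (logits) of the same four items in two groups of examinees
reference_group = {"i1": -1.0, "i2": 0.0, "i3": 1.0, "i4": 0.0}
focal_group = {"i1": -1.0, "i2": 0.0, "i3": 1.0, "i4": 0.8}

contrasts = dif_contrasts(reference_group, focal_group)
for item, c in contrasts.items():
    print(f"{item}: {c:+.2f} logits")
```

Here i4 stands out as harder for the focal group; it is the item to remove, or to balance with an item biased by a similar amount the other way.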
Elimination of Boundary Effects in Program Evaluation
Classical Test Theory
• Subject to the problem of boundary (ceiling and floor) effects
Item-Response Theory
• If a person gets all items correct or all items incorrect, that person's ability cannot be estimated => items of greater or lesser difficulty are sought until the ability can be estimated
• If an item is missed by all persons or answered correctly by all persons, its difficulty cannot be estimated => persons of greater or lesser ability are sought until at least one person passes and one person fails the item, and its difficulty can then be calibrated
• Samples differing in size, dispersion and central tendency are all expressed on the same interval scale
• => Boundary effects are removed
1.3 Competing Latent Trait Models
The Rasch One-Parameter Model is preferred
by teachers and language testers
Introduction to the Rasch One-Parameter Model
The Rasch Model is probabilistic in nature: the persons
and items are not only graded for ability and difficulty,
but are judged according to the probability of their
response patterns given the observed person ability and
item difficulty.
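The probabilistic judgement described above can be sketched as follows, assuming the standard Rasch formula P(correct) = exp(theta - b) / (1 + exp(theta - b)) for a person of ability theta and an item of difficulty b, and local independence of items. Two response patterns with the same raw score get very different probabilities. The item values and function names are illustrative.

```python
import math

def rasch_p(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def pattern_probability(responses, theta, difficulties):
    """Probability of a whole 0/1 response pattern, assuming items are answered independently
    given ability (local independence)."""
    prob = 1.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1.0 - p
    return prob

items = [-1.0, 0.0, 1.0]   # hypothetical difficulties, easiest first
ability = 0.0
print(round(pattern_probability([1, 1, 0], ability, items), 3))  # expected pattern
print(round(pattern_probability([0, 1, 1], ability, items), 3))  # same score, unexpected
```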
Computation of Item Difficulty and Person Ability
By computer: BICAL (Mead, Wright, and Bell, 1979); BILOG II (Mislevy and Bock, 1984)
By hand: PROX (Wright and Stone, 1979), in 5 steps
Step 1: Edit the Binary Response Matrix
Eliminate every person or item for which all responses are correct or all responses are incorrect
Step 2: Calculate Initial Item Difficulty Calibrations
Find the logit incorrect value for each possible number correct and set the mean of the vector of logit difficulty values at zero
Step 3: Calculate the Initial Person Measures
Find the logit correct value, rather than the logit incorrect value, for each possible raw score
Step 4: Calculate the Expansion Factors (one for the item calibrations, reflecting the spread of person abilities, and one for the person measures, reflecting the spread of item difficulties)
Step 5: Calculate the Standard Errors Associated with These Estimates
The standard error for each of the final item difficulty calibrations
The standard error for each of the final person measures
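The five PROX steps can be sketched in Python as below. This is a minimal sketch that assumes the normal-approximation formulas usually given for PROX: logit incorrect ln((N - s)/s) for an item answered correctly by s of N persons; logit correct ln(r/(L - r)) for a raw score r on L items; expansion factors X = sqrt((1 + U/2.89)/(1 - UV/8.35)) and Y = sqrt((1 + V/2.89)/(1 - UV/8.35)), where U and V are the variances of the initial item and person logits; and standard errors Y * sqrt(N/(s(N - s))) and X * sqrt(L/(r(L - r))). The function `prox`, its returned dictionary and the data are hypothetical; check the details against Wright and Stone (1979) before relying on it.

```python
import math

def prox(matrix):
    """Rasch calibration by the PROX normal approximation (a sketch after Wright and Stone, 1979).

    matrix: one list of 0/1 item scores per person. Returns kept item and person
    indices, final difficulties and abilities (logits), and their standard errors.
    """
    persons = list(range(len(matrix)))
    items = list(range(len(matrix[0])))

    # Step 1: edit the matrix; repeat, because dropping an extreme person can
    # make an item extreme (and vice versa)
    while True:
        keep_p = [p for p in persons if 0 < sum(matrix[p][i] for i in items) < len(items)]
        keep_i = [i for i in items if 0 < sum(matrix[p][i] for p in keep_p) < len(keep_p)]
        if keep_p == persons and keep_i == items:
            break
        persons, items = keep_p, keep_i
    N, L = len(persons), len(items)
    if N < 2 or L < 2:
        raise ValueError("too few persons or items left after editing")

    # Step 2: initial item calibrations = logit incorrect, centred at zero
    s = [sum(matrix[p][i] for p in persons) for i in items]    # number correct per item
    x = [math.log((N - si) / si) for si in s]
    x_mean = sum(x) / L
    d0 = [xi - x_mean for xi in x]
    U = sum(d * d for d in d0) / (L - 1)                        # spread of item difficulties

    # Step 3: initial person measures = logit correct of each raw score
    r = [sum(matrix[p][i] for i in items) for p in persons]    # raw score per person
    y = [math.log(rp / (L - rp)) for rp in r]
    y_mean = sum(y) / N
    V = sum((yp - y_mean) ** 2 for yp in y) / (N - 1)           # spread of person abilities

    # Step 4: expansion factors (2.89 = 1.7 ** 2; 8.35 is 2.89 ** 2 rounded)
    denom = 1 - U * V / 8.35
    if denom <= 0:
        raise ValueError("PROX breaks down: test and sample are too widely spread")
    X = math.sqrt((1 + U / 2.89) / denom)   # person expansion factor (due to test width)
    Y = math.sqrt((1 + V / 2.89) / denom)   # item expansion factor (due to sample spread)

    # Final estimates (expanded initial values) and Step 5: their standard errors
    return {
        "items": items,
        "persons": persons,
        "difficulty": [Y * d for d in d0],
        "ability": [X * yp for yp in y],
        "difficulty_se": [Y * math.sqrt(N / (si * (N - si))) for si in s],
        "ability_se": [X * math.sqrt(L / (rp * (L - rp))) for rp in r],
    }

data = [   # hypothetical responses: 8 persons x 4 items
    [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0],
    [0, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0],
]
result = prox(data)
print([round(d, 2) for d in result["difficulty"]])
print([round(b, 2) for b in result["ability"]])
```

Person measures come out on the same scale as the item calibrations, whose mean is fixed at zero.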
2. Approaches to language testing
+ The essay-translation approach
+ The structuralist approach
+ The integrative approach
+ The communicative approach
2.1 The essay-translation approach
The pre-scientific stage of language testing
Requires no special skill or expertise in testing
Tests: + Essay writing, translation and grammatical analysis
+ A heavy literary and cultural bias
2.2 The structuralist approach
Language learning is seen as the systematic acquisition of a set of habits:
+ Based on structural linguistics
+ Tests separate elements of the target language (phonology, vocabulary and grammar)
TESTS
Words and sentences are completely divorced from any context
Listening, speaking, reading and writing skills are tested separately from one another
2.3 The integrative approach
o Concerned with meaning and the total communicative effect of discourse
o Assesses learners’ ability to use two or more skills simultaneously
o Types of integrative tests:
+ Cloze testing and dictation
+ Oral interview and composition writing
+ Translation (considered unreliable)
2.3.1 CLOZE TESTING
Based on the Gestalt theory of “closure”
Measures the reader’s ability to decode “interrupted” messages by making the most acceptable substitutions
The more blanks the text contains, the more reliable the cloze test is likely to prove
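A cloze passage is commonly built by deleting words at a fixed ratio (every nth word) after an intact lead-in. The sketch below assumes that method; the function name and the defaults (n = 7, a 20-word lead-in) are illustrative assumptions. A smaller n gives more blanks from the same text.

```python
def make_cloze(text, n=7, lead_in_words=20):
    """Fixed-ratio cloze: leave a lead-in intact, then delete every nth word.

    Returns the gapped text and the answer key. Trailing punctuation of a
    deleted word stays in the text.
    """
    gapped, key = [], []
    for i, word in enumerate(text.split()):
        position = i - lead_in_words + 1        # 1, 2, 3, ... after the lead-in
        if position > 0 and position % n == 0:
            core = word.rstrip(".,;:!?")
            key.append(core)
            gapped.append(f"({len(key)}) ______" + word[len(core):])
        else:
            gapped.append(word)
    return " ".join(gapped), key

passage = "one two three four five six seven eight nine ten."
text, answers = make_cloze(passage, n=4, lead_in_words=2)
print(text)     # one two three four five (1) ______ seven eight nine (2) ______.
print(answers)  # ['six', 'ten']
```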
In a cloze test:
Scoring: either the exact-word (correct answer) method or the acceptable-answer method
Misspellings should not be penalised
Grammatical errors should be penalised
The subject matter should be neutral in content and in the language variety used
Provide a lead-in (leave the opening of the text intact before the first gap)
Cloze testing:
Good indicator of general linguistic ability
Requires linguistic knowledge, textual knowledge and knowledge of the world
Used in achievement, proficiency, classroom, placement and diagnostic tests
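The two cloze scoring methods can be sketched as follows: the exact-word method accepts only the deleted word, while the acceptable-answer method also accepts answers the markers have agreed on. The function name and the example answers are hypothetical. Tolerating misspellings, as recommended for cloze scoring, would need a further rule (for example a marker's judgement), which this sketch leaves out.

```python
def score_cloze(responses, key, acceptable=None):
    """Score a cloze test.

    Exact-word method: only the deleted word counts (acceptable=None).
    Acceptable-answer method: `acceptable` gives, for each gap, a set of other
    answers judged acceptable. Case and surrounding spaces are ignored.
    """
    score = 0
    for gap, (response, exact) in enumerate(zip(responses, key)):
        allowed = {exact.lower()}
        if acceptable is not None:
            allowed |= {a.lower() for a in acceptable[gap]}
        if response.strip().lower() in allowed:
            score += 1
    return score

key = ["dog", "quickly"]
responses = ["Dog ", "fast"]
accepted = [set(), {"fast", "rapidly"}]   # hypothetical list agreed by the markers

print(score_cloze(responses, key))            # exact-word method
print(score_cloze(responses, key, accepted))  # acceptable-answer method
```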
2.3.2 DICTATION
Previously:
• Seen as solely measuring Ss’ listening comprehension skills
Recently:
• Seen as also involving auditory discrimination, auditory memory span, spelling, the recognition of sound segments and overall textual comprehension
CHARACTERISTICS
o No reliable way of assessing the relative importance of the different abilities required
o Tends to measure lower-order language skills rather than higher-order skills
o Ss may focus too much on individual sounds rather than on the meaning of the text
o Limited memory span: Ss do not retain everything they hear
TIPS:
Read through the whole dictation passage first
Dictate (once or twice) in meaningful units of sufficient length
rather than reading out word by word
Read the whole passage once more at slightly lower than normal
speed
2.4 The communicative approach
Primarily focuses on how language is used in communication
Tasks are as close as possible to those facing Ss in real life
Judges the effectiveness of the communication rather than formal linguistic accuracy
Emphasizes language “use” (how people use language for different purposes) rather than language “usage” (the formal patterns of language)
Tests of a communicative nature:
Native speakers (NS) may score lower than non-native speakers (NNS)
The assessment of language skills in isolation may have only very limited relevance to real life
Communicative tests must of necessity reflect the culture of a particular country
Communicative tests should be based on precise and detailed specifications of the needs of learners
Qualitative judgements are superior to quantitative assessments
3. ITEM TYPES
Selection items: involve the candidate in choosing a response from the various options offered.
Candidate-supplied items: demand that the candidate supplies the response, e.g. short-answer items, open cloze items.
3.1 SELECTION ITEMS
Advantages of selection items:
familiar to nearly all candidates in all places
independent of writing ability
easy and quick to mark
capable of being objectively scored
economical of the candidate's time, so that many can be
attempted in a short period and a range of objectives
covered, adding to the reliability of the test.
Disadvantages of selection items:
tests of recognition rather than production
limited in the range of what they can test
incapable of letting a candidate express a wide range
of abilities
dependent, in many cases, on reading ability
affected by guesswork
very difficult and time consuming to write
successfully
capable of leading to poor classroom practice, if
teaching focuses too intensively on preparation for
tackling this sort of test item.
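To make "affected by guesswork" concrete: a candidate who guesses blindly on k-option items can expect to get 1/k of them right. A classical correction for guessing (formula scoring), which the source does not mention and which is included here only as an illustrative assumption, subtracts W/(k - 1) from the number right. Function names are hypothetical.

```python
def chance_score(n_items, n_options):
    """Expected number of items answered correctly by blind guessing on every item."""
    return n_items / n_options

def corrected_score(right, wrong, n_options):
    """Classical correction for guessing: R - W / (k - 1). Omitted items are not penalized."""
    return right - wrong / (n_options - 1)

print(chance_score(40, 4))          # four-option multiple choice
print(chance_score(40, 2))          # true/false
print(corrected_score(30, 8, 4))    # 30 right, 8 wrong, 2 omitted
```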
3.1.3. True/false items
Test takers have to decide whether a statement is true or false, normally in relation to a reading or listening text.
3.1.4. Gap-filling (cloze passage) with multiple-choice options
Words are deleted from a text, creating gaps which the candidate has to fill, normally with either one or two words.
3.1.5. Gap-filling with selection from a bank
Consists of a text with gaps accompanied by a 'bank' containing all the correct words to insert in the text, plus several extra words which will not be used.
3.1.6. Gap-filling at paragraph level
Consists of a text with six paragraph-length gaps. Seven paragraphs are given from which to fill the gaps, so one paragraph is not used.
3.1.7. Matching
Elements from two separate lists or sets of options have to be brought together.
3.1.8. Multiple matching
A number of questions or sentence-completion items are set, generally based on a reading text. The responses are provided in the form of a bank of words or phrases, each of which can be used an unlimited number of times.
3.1.9. Extra word error detection
In this type of task there is one extra,
incorrect, word in most of the lines of a text.
3.2 Candidate-supplied items
Advantages of candidate-supplied items:
are easier to write
allow for a wider sample of content
minimize the effect of guessing
allow for creativity in language use
measure higher-order as well as lower-order skills
have a more positive effect on classroom practice
can provide a degree of marking objectivity similar to that of selection items
Disadvantages of candidate-supplied items:
There are often several acceptable responses rather than only one unambiguously correct response.
They are time-consuming and difficult to mark, often calling for examiner marking rather than clerical or computerized marking.
3.2.1. Short-answer items:
Consist of a question which can be answered in one word or a short phrase. The exact limits on the length of the answer should be specified.
3.2.2. Sentence completion: In this kind of item, part of a sentence is provided, and the candidate has to use information derived from a text to complete it.
3.2.3. Open gap-filling (cloze): In an open
cloze, the gaps are selected by the item writer,
who focuses on the particular structures to be
tested. The candidate's task is to supply the
word which fills each gap in the text.
3.2.4. Transformation: In this type of item, the candidate is given a sentence, followed by the opening words of another sentence, which the candidate completes so that it gives the same information expressed through a different grammatical structure.
3.2.5. Word formation: In this type of item, one word is deleted from a sentence, and a related form of the word is given to the candidate as a prompt.
3.2.6. Transformation cloze:
Consists of a text with a word missing in each line; a different grammatical form of the required word is supplied.
The candidate has both to find the location of the missing word and to supply it in its correct form.
3.2.7. Note expansion
In this item type the lexical components of
each sentence are supplied in a reduced form
which resembles notes.
The candidate's task is to supply the correct
grammatical form, including changes in word
order and the addition of such elements as
prepositions, articles and auxiliary verbs.
3.2.8. Error correction/proofreading:
consists of a text in which a word appears in
an incorrect form in each numbered line. The
candidate has first to identify the incorrect
word, and then write it in its correct form at the
end of the line.
3.2.9. Information transfer: Tasks described in
this way always involve taking information
given in a certain form and presenting it in a
different form.
3.3. NON-ITEM-BASED TASK TYPES
3.3.1. Writing: extended writing questions
Extended writing can be tested in a number of
ways which vary in the degree of control
exercised by the tester over the candidate's
response.
4. Bloom’s taxonomy and testing
+ Definition
+ Old version vs. new version
+ 6 levels of thinking
4.1. Definition
Bloom’s: the name of its creator
Taxonomy: an arrangement of ideas, or a way to group things together
Bloom’s Taxonomy is a classification of the different objectives that educators might set for students.
The development of Bloom’s taxonomy
1948: Benjamin Bloom’s study of classroom activities and goals
1956: Publication of the original Bloom’s Taxonomy
1995: Revision of the original Bloom’s Taxonomy
2001: The final revision of Bloom’s Taxonomy
What’s the Difference?
Original Bloom’s Taxonomy
• Terminology: Used nouns to describe the levels of thinking.
• Structure: One-dimensional, using the Cognitive Process.
• Emphasis: Originally intended for educators and psychologists, although Bloom’s taxonomy was used by many other audiences.
Revised Bloom’s Taxonomy
• Terminology: Uses verbs to describe the levels of thinking.
• Structure: Two-dimensional, using the Knowledge Dimension and how it interacts with the Cognitive Process.
• Emphasis: Placed upon its use as a more authentic tool for curriculum planning, instructional delivery and assessment.
4.3. The levels of thinking
There are six levels of learning
according to Dr. Bloom:
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
Knowledge or Remembering
• Observation and recall of information
• Knowledge of dates, events, places, major
ideas, etc.
• Mastery of subject matter
• Key words: list, define, tell, describe,
identify, show, label, collect, examine,
tabulate, quote, name, who, when, where,
etc.
Application or Applying
• Use information
• Use methods, concepts, theories in new
situations
• Solve problems using required skills or
knowledge
• Key words: apply, demonstrate, calculate,
complete, illustrate, show, solve,
examine, modify, relate, change, classify,
experiment, discover
Synthesis or Creating
• Use old ideas to create new ones
• Generalize from given facts
• Relate knowledge from several areas
• Predict, draw conclusions
• Key words: combine, integrate, modify,
rearrange, substitute, plan, create, design, invent,
what if?, compose, formulate, prepare, generalize,
rewrite
Evaluation or Evaluating
• Compare and discriminate between ideas
• Assess value of theories, presentations
• Make choices based on reasoned argument
• Verify value of evidence
• Recognize subjectivity
• Key words: assess, decide, rank, grade, test, measure,
recommend, convince, select, judge, explain,
discriminate, support, conclude, compare, summarize
According to the original Bloom’s Taxonomy, the lowest orders of thinking are knowledge (remembering something) and comprehension (understanding what something means). These levels were used as building blocks to help teachers scaffold their lessons and build students up to the top level of thinking.
Notice the terminology changes in the comparison above.
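When writing test items, the key-word lists for the levels can be used as a rough first check of which level a question targets. The sketch below copies the key words given for Knowledge, Application, Synthesis and Evaluation (the Comprehension and Analysis lists are not included, and the phrase "what if?" is left out). Because some words belong to several levels, a question can match more than one; the dictionary and function names are hypothetical, and the item writer still has to judge the task as a whole.

```python
import re

# Key words copied from the level descriptions above (Comprehension and Analysis not included)
BLOOM_KEYWORDS = {
    "Knowledge": {"list", "define", "tell", "describe", "identify", "show", "label",
                  "collect", "examine", "tabulate", "quote", "name", "who", "when", "where"},
    "Application": {"apply", "demonstrate", "calculate", "complete", "illustrate", "show",
                    "solve", "examine", "modify", "relate", "change", "classify",
                    "experiment", "discover"},
    "Synthesis": {"combine", "integrate", "modify", "rearrange", "substitute", "plan",
                  "create", "design", "invent", "compose", "formulate", "prepare",
                  "generalize", "rewrite"},
    "Evaluation": {"assess", "decide", "rank", "grade", "test", "measure", "recommend",
                   "convince", "select", "judge", "explain", "discriminate", "support",
                   "conclude", "compare", "summarize"},
}

def bloom_levels(question):
    """Return the levels whose key words occur in a test question, in taxonomy order."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [level for level, keys in BLOOM_KEYWORDS.items() if words & keys]

print(bloom_levels("Define the term closure."))
print(bloom_levels("Design a dictation passage for beginners."))
print(bloom_levels("Examine the passage."))
```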