Test production process - Approaches to language testing - Techniques of language testing - Bloom's taxonomy

1. Phạm Phúc Khánh Minh
2. Nguyễn Trần Hoài Phương
3. Nguyễn Ngọc Phương Thành
4. Võ Thị Thanh Thư
5. Đỗ Thị Bạch Vân
6. Ngô Thảo Vy
TESOL 2014B
1. The test production process
2. Approaches to language testing
3. Techniques of language testing: Item types
4. Bloom’s taxonomy and testing

1. The test production process

Item analysis
• Classical Test Theory
• Item-Response Theory
+ One-Parameter (Rasch Model)
+ Two-Parameter
+ Three-Parameter

1.1. Classical Test Theory (CTT) vs. Item-Response Theory (IRT)
CTT
• Measured at test level
• Results apply only to the students who took that test
IRT
• Measured at item level
• Provides sample-free measurement

1.2. Advantages offered by Latent Trait Theory
Sample-Free Item Calibration
Classical Test Theory
• The estimated item difficulty varies with the average ability of the particular sample of examinees observed
• -> Item analysis is sample-bound
Item-Response Theory
• The item difficulty scale is independent of the ability differences of any particular sample of examinees
• -> Item analysis is sample-free
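To make "sample-bound" concrete, here is a minimal Python sketch of the classical facility value (proportion correct) for one item, computed on two hypothetical samples of examinees. The response data are invented for illustration: the same item looks easy in a stronger sample and hard in a weaker one.

```python
def facility(responses):
    """Classical item difficulty (facility value): proportion of examinees answering correctly.

    responses: list of 1 (correct) / 0 (incorrect) scores on a single item.
    """
    if not responses:
        raise ValueError("no responses")
    return sum(responses) / len(responses)

# Hypothetical scores on the SAME item from two samples of different average ability
stronger_sample = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
weaker_sample = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

print(facility(stronger_sample))  # 0.8
print(facility(weaker_sample))    # 0.3
```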

Test-Free Person Measurement
Classical Test Theory
• Ability measurement depends on the particular set of items that makes up the test
Item-Response Theory
• The abilities of persons who took different tests can be compared
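Under the Rasch model this comparison works because abilities are estimated on the same logit scale as item difficulties. The sketch below is a hedged illustration, not any particular program's method: it assumes the difficulties of two hypothetical five-item tests are already calibrated on a common scale, and finds the maximum-likelihood ability for a raw score by Newton-Raphson iteration. The same raw score of 3 corresponds to a higher ability on the harder test, so raw scores from different tests are not comparable, but the ability estimates are.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood ability for a raw score, given calibrated item difficulties.

    Solves sum_i P_i(theta) = raw_score by Newton-Raphson.
    Not defined for zero or perfect scores.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("ability is not estimable for zero or perfect scores")
    theta = math.log(raw_score / (n - raw_score))  # starting value
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        expected = sum(probs)
        information = sum(p * (1 - p) for p in probs)  # derivative of expected score
        step = (raw_score - expected) / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Two hypothetical tests calibrated on the same logit scale
easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_test = [0.0, 0.5, 1.0, 1.5, 2.0]

print(round(estimate_ability(3, easy_test), 2))
print(round(estimate_ability(3, hard_test), 2))
```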

Multiple Reliability Estimation
Classical Test Theory
• Provides one global reliability estimate for the test, yet ability estimates vary in reliability; that single estimate should not be used to judge the accuracy of every individual's score
Item-Response Theory
• Reliability estimation goes beyond a global estimate for the test, to a confidence estimate associated with every possible person and item score on that test
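In IRT this per-score confidence estimate is usually expressed as a standard error that depends on where the person sits on the scale. A minimal Rasch sketch, assuming a hypothetical five-item test with known difficulties: the standard error is the reciprocal square root of the test information, the sum of P(1 − P) over the items. It is smallest where the items are concentrated and grows for abilities far from them.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response (theta, b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Standard error of an ability estimate: 1 / sqrt(test information at theta)."""
    information = 0.0
    for b in difficulties:
        p = rasch_p(theta, b)
        information += p * (1 - p)
    return 1.0 / math.sqrt(information)

difficulties = [-1.5, -0.75, 0.0, 0.75, 1.5]  # hypothetical calibrated items
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(standard_error(theta, difficulties), 2))
```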

Identification of Guessers and Other Deviant Respondents
Classical Test Theory
• Impossible to identify persons’ misfit
Item-Response Theory
• Possible to identify persons’ misfit
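One common way to flag misfitting persons, such as guessers, is a mean-square fit statistic. The sketch below computes an unweighted (outfit) mean square under the Rasch model. The difficulties, the ability of 0.4 logits and both response patterns are hypothetical. Both patterns have the same raw score, but the one that fails the easy items and passes the hard ones gives a much larger value. Values near 1 are what the model expects; the exact cut-off for "misfit" varies between programs and is not assumed here.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response (theta, b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_mnsq(responses, theta, difficulties):
    """Unweighted (outfit) mean-square fit for one person.

    Average of squared standardized residuals (x - P)^2 / (P(1 - P)).
    """
    z2 = []
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        z2.append((x - p) ** 2 / (p * (1 - p)))
    return sum(z2) / len(z2)

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, easiest to hardest
expected_pattern = [1, 1, 1, 0, 0]          # passes easy items, fails hard ones
guesser_pattern = [0, 0, 1, 1, 1]           # fails easy items, passes hard ones
theta = 0.4                                 # hypothetical ability; both raw scores are 3

print(round(outfit_mnsq(expected_pattern, theta, difficulties), 2))
print(round(outfit_mnsq(guesser_pattern, theta, difficulties), 2))
```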

Reconciliation of Norm-Referenced and Criterion-Referenced Testing
Classical Test Theory
• Unable to reconcile norm-referenced and criterion-referenced testing within one measurement framework
Item-Response Theory
• Able to reconcile norm-referenced and criterion-referenced testing within one measurement framework

Test Equating Facility
Classical Test Theory
• Equating requires all test forms to be administered to the same large sample of examinees
• -> Time-consuming
Item-Response Theory
• No need to administer all test forms to the same large sample of examinees
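With calibrated items, one widely used design links forms through a set of common (anchor) items rather than a common sample. A minimal sketch under that assumption: each form is calibrated separately, the mean difference in anchor-item difficulties gives a shift constant, and Form B's difficulties are moved onto Form A's scale. Item names and values are hypothetical, and real equating studies also check that the anchor items behave consistently across forms.

```python
def linking_shift(form_a, form_b):
    """Mean-shift link: average difficulty difference of the anchor items common to both forms.

    form_a, form_b: dicts mapping item id -> Rasch difficulty (logits) from separate calibrations.
    """
    anchors = form_a.keys() & form_b.keys()
    if not anchors:
        raise ValueError("the forms share no anchor items")
    return sum(form_a[i] - form_b[i] for i in anchors) / len(anchors)

def rescale(form, shift):
    """Express a form's item difficulties on the other form's scale."""
    return {item: d + shift for item, d in form.items()}

# Hypothetical calibrations; a1 and a2 are anchor items appearing on both forms
form_a = {"q1": -1.2, "q2": 0.3, "q3": 0.9, "a1": -0.5, "a2": 0.5}
form_b = {"q7": 0.1, "q8": 1.4, "a1": -0.9, "a2": 0.1}

shift = linking_shift(form_a, form_b)
form_b_on_a = rescale(form_b, shift)
print(round(shift, 2))  # 0.4
print({item: round(d, 2) for item, d in form_b_on_a.items()})
```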

Test Tailoring Facility
A tailored test provides much greater decision accuracy than a standardized test: fewer students are wrongly admitted to, or wrongly rejected from, university or intensive English study.
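Tailored (adaptive) testing relies on choosing, for each candidate, the items that are most informative at the current ability estimate. Under the Rasch model an item's information is P(1 − P), which peaks when item difficulty equals ability. The sketch below shows only that selection step. The bank is hypothetical, and updating the ability estimate after each response (for example with a maximum-likelihood step) is left out.

```python
import math

def item_information(theta, b):
    """Rasch item information P(1 - P); largest when difficulty b equals ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1 - p)

def next_item(theta, bank, administered):
    """Return the unused item that is most informative at the current ability estimate."""
    remaining = {item: b for item, b in bank.items() if item not in administered}
    if not remaining:
        return None
    return max(remaining, key=lambda item: item_information(theta, remaining[item]))

bank = {"i1": -2.0, "i2": -1.0, "i3": 0.0, "i4": 0.5, "i5": 1.5}  # hypothetical difficulties
print(next_item(0.3, bank, set()))   # i4: difficulty 0.5 is closest to 0.3
print(next_item(0.3, bank, {"i4"}))  # i3
```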

Item Banking Facility
Calibrated items are stored in an item bank according to a common metric of difficulty. This permits the construction of tests of known reliability and validity by selecting appropriate subsets of items from the bank, with no further need for field trials.
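As a rough illustration of building a test from a calibrated bank, the sketch below picks the items whose difficulties lie closest to a target point (for example a hypothetical cut score). It then reports the standard error that the Rasch model predicts at that point, before any field trial. The bank contents, target and band width are invented for the example.

```python
import math

def assemble_form(bank, target, width, n_items):
    """Choose up to n_items bank items within target +/- width logits, nearest to the target first."""
    in_band = sorted(
        (abs(b - target), item) for item, b in bank.items() if abs(b - target) <= width
    )
    return [item for _, item in in_band[:n_items]]

def predicted_se(theta, difficulties):
    """Rasch standard error at theta implied by the chosen difficulties: 1 / sqrt(information)."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1 - p)
    return 1.0 / math.sqrt(information)

bank = {"a": -1.5, "b": -0.8, "c": -0.1, "d": 0.2, "e": 0.9, "f": 2.0}  # hypothetical bank
form = assemble_form(bank, target=0.0, width=1.0, n_items=3)
print(form)  # ['c', 'd', 'b']
print(round(predicted_se(0.0, [bank[i] for i in form]), 2))
```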

The Study of Item and Test Bias
Classical Test Theory
• It is uncommon to quantify the amount and direction of bias for any given item or person
Item-Response Theory
• The amount and direction of bias can be quantified for any given item or person
• => Test bias can be neutralized by removing biased items or by including items biased in the opposite direction
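A crude illustration of quantifying item bias (differential item functioning): calibrate the items separately in two groups, centre each set so that the groups' overall ability difference drops out, and compare the two sets. The response data and the 0.5-logit flagging threshold are assumptions made for the example; operational DIF analyses also test whether a contrast is statistically significant.

```python
import math

def centred_difficulties(matrix):
    """PROX-style initial item calibrations for one group: ln(wrong / right), centred at zero.

    matrix: one row per person, one 0/1 column per item; no item may be all right or all wrong.
    """
    n_persons = len(matrix)
    raw = []
    for i in range(len(matrix[0])):
        right = sum(row[i] for row in matrix)
        raw.append(math.log((n_persons - right) / right))
    centre = sum(raw) / len(raw)
    return [d - centre for d in raw]

def dif_contrasts(reference, focal):
    """Focal-minus-reference difficulty per item (positive = harder for the focal group)."""
    ref = centred_difficulties(reference)
    foc = centred_difficulties(focal)
    return [f - r for r, f in zip(ref, foc)]

# Hypothetical responses of two groups to the same three items
reference = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0]]
focal = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

contrasts = dif_contrasts(reference, focal)
flagged = [i for i, c in enumerate(contrasts) if abs(c) >= 0.5]  # assumed threshold in logits
print([round(c, 2) for c in contrasts], flagged)
```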

Elimination of Boundary Effects in Program Evaluation
• The problem of boundary effects:
+ If a person gets all items correct or all items incorrect, that person's ability cannot be estimated => items of greater or lesser difficulty are sought until the ability can be estimated
+ If an item is missed by all persons or answered correctly by all persons, its difficulty cannot be estimated => persons of greater or lesser ability are sought until at least one person passes and one person fails each item, so that the item difficulty can be calibrated
• Sample size, dispersion and central tendency are transformed onto the same interval scale
• => Boundary effects are removed
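Boundary cases can be detected mechanically before calibration. This hypothetical helper scans a 0/1 response matrix (one row per person, one column per item) and reports which persons and items sit at a boundary, together with the kind of extra items or persons needed to estimate them, as described above.

```python
def boundary_cases(matrix):
    """Flag persons and items whose parameters cannot be estimated from this response matrix.

    matrix: list of persons, each a list of 0/1 item scores.
    """
    n_persons, n_items = len(matrix), len(matrix[0])
    flags = {"persons": {}, "items": {}}
    for p, row in enumerate(matrix):
        score = sum(row)
        if score == n_items:
            flags["persons"][p] = "all correct: needs harder items"
        elif score == 0:
            flags["persons"][p] = "all incorrect: needs easier items"
    for i in range(n_items):
        right = sum(row[i] for row in matrix)
        if right == n_persons:
            flags["items"][i] = "passed by everyone: needs less able persons"
        elif right == 0:
            flags["items"][i] = "failed by everyone: needs more able persons"
    return flags

matrix = [[1, 1, 1], [1, 0, 0], [1, 0, 1]]  # hypothetical responses
print(boundary_cases(matrix))
```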

1.3 Competing Latent Trait Models
The Rasch One-Parameter Model is preferred
by teachers and language testers

Sample size constraints:
- The Rasch Model: 100 – 200 persons
- Two-Parameter Model: 200 – 400 persons
- Three-Parameter Model: 1,000 – 2,000 persons

Introduction to the Rasch One-Parameter Model
The Rasch Model is probabilistic in nature: the persons
and items are not only graded for ability and difficulty,
but are judged according to the probability of their
response patterns given the observed person ability and
item difficulty.
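In symbols, the Rasch model gives the probability of a correct response as exp(θ − b) / (1 + exp(θ − b)), where θ is person ability and b is item difficulty, both in logits. Below is a minimal sketch of that formula and of the probability of a whole response pattern, assuming (as the model does) that responses are independent given ability. The item difficulties are hypothetical. Two patterns with the same raw score can have very different probabilities, which is what allows unlikely patterns to be judged.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct answer: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_probability(responses, theta, difficulties):
    """Probability of a whole 0/1 response pattern, assuming local independence."""
    prob = 1.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1 - p
    return prob

difficulties = [-1.0, 0.0, 1.0]  # hypothetical items, easiest to hardest
print(rasch_p(0.0, 0.0))  # 0.5: ability equal to difficulty
print(pattern_probability([1, 1, 0], 0.0, difficulties))  # expected pattern
print(pattern_probability([0, 1, 1], 0.0, difficulties))  # same raw score, unexpected pattern
```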

Computation of Item Difficulty and Person Ability
By computer: BICAL (Mead, Wright, and Bell, 1979); BILOG II (Mislevy and Bock, 1984)
By hand: PROX (Wright and Stone, 1979) – 5 steps
Step 1: Edit the Binary Response Matrix
Every person or item for which all responses are correct or all responses are incorrect is eliminated
Step 2: Calculate Initial Item Difficulty Calibrations
Find the logit incorrect value for each possible number correct and set the mean of the vector of logit difficulty values at zero
Step 3: Calculate the Initial Person Measures
Use logit correct values instead of logit incorrect values
Step 4: Calculate the Expansion Factors

Step 5: Calculate the Standard Errors Associated with These Estimates
The standard error for each of the final item difficulty calibrations
The standard error for each of the final person measures
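The five PROX steps can be sketched in Python as follows. This is a hedged reconstruction, not a transcription of Wright and Stone's worksheet. It uses the expansion factors and standard-error formulas as PROX is usually presented (with the constants 2.89 ≈ 1.7² and 8.35 ≈ 1.7⁴ from the normal approximation to the logistic), uses sample variances, and works item by item rather than through a score-group table. The response matrix is hypothetical.

```python
import math
from statistics import mean, variance

def edit_matrix(matrix):
    """Step 1: repeatedly drop persons and items with all-correct or all-incorrect responses."""
    persons = list(range(len(matrix)))
    items = list(range(len(matrix[0])))
    while True:
        keep_p = [p for p in persons if 0 < sum(matrix[p][i] for i in items) < len(items)]
        keep_i = [i for i in items if 0 < sum(matrix[p][i] for p in keep_p) < len(keep_p)]
        if keep_p == persons and keep_i == items:
            return persons, items
        persons, items = keep_p, keep_i

def prox(matrix):
    """Approximate Rasch calibration by PROX.

    Returns {"items": {item: (difficulty, se)}, "persons": {person: (ability, se)}} in logits.
    """
    persons, items = edit_matrix(matrix)
    if len(persons) < 2 or len(items) < 2:
        raise ValueError("too few persons or items left after editing")
    n, n_items = len(persons), len(items)
    right = [sum(matrix[p][i] for p in persons) for i in items]   # correct answers per item
    scores = [sum(matrix[p][i] for i in items) for p in persons]  # raw score per person

    # Step 2: initial item calibrations = logit incorrect, centred on a mean of zero
    d0 = [math.log((n - s) / s) for s in right]
    centre = mean(d0)
    d0 = [d - centre for d in d0]

    # Step 3: initial person measures = logit correct
    # (mean item difficulty is zero after centring, so no shift is added)
    b0 = [math.log(r / (n_items - r)) for r in scores]

    # Step 4: expansion factors from the spread of items (U) and of persons (V)
    U, V = variance(d0), variance(b0)
    denom = 1 - U * V / 8.35
    if denom <= 0:
        raise ValueError("PROX expansion is undefined for this spread of scores")
    person_x = math.sqrt((1 + U / 2.89) / denom)
    item_y = math.sqrt((1 + V / 2.89) / denom)

    # Final estimates (initial values times expansion factors) and Step 5: standard errors
    items_out = {
        i: (item_y * d, item_y * math.sqrt(n / (s * (n - s))))
        for i, d, s in zip(items, d0, right)
    }
    persons_out = {
        p: (person_x * b, person_x * math.sqrt(n_items / (r * (n_items - r))))
        for p, b, r in zip(persons, b0, scores)
    }
    return {"items": items_out, "persons": persons_out}

# Hypothetical responses: 7 persons x 4 items; person 6 is all-correct and is edited out
matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
result = prox(matrix)
for item, (d, se) in result["items"].items():
    print(f"item {item}: difficulty {d:+.2f}, SE {se:.2f}")
```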

2. Approaches to language testing
• The essay-translation approach
• The structuralist approach
• The integrative approach
• The communicative approach

2.1 The essay-translation approach
• The pre-scientific stage of language testing
• Requires no special skill or expertise in testing
• Tests:
+ Essay writing, translation & grammatical analysis
+ A heavy literary and cultural bias

2.2 The structuralist approach
• Language learning is viewed as the systematic acquisition of a set of habits:
+ Based on structural linguistics
+ Separate elements of the target language (phonology, vocabulary & grammar) are tested
TESTS
• Words and sentences are completely divorced from any context
• Listening, speaking, reading and writing skills are separated from one another

2.3 The integrative approach
• Concerned with meaning and the total communicative effect of discourse
• Assesses learners’ ability to use two or more skills simultaneously
• Types of integrative tests:
+ Cloze testing and dictation
+ Oral interview and composition writing
+ Translation (-> unreliable)

2.3.1 CLOZE TESTING
• Based on the Gestalt theory of “closure”
• Measures the reader’s ability to decode “interrupted” messages by making the most acceptable substitutions
• The more blanks the text contains, the more reliable the cloze test will prove
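One common way to build a cloze passage is fixed-ratio deletion: every n-th word after an intact lead-in is removed. The sketch below assumes that method. The default of every seventh word, the regular expressions used for sentence and punctuation splitting, and the sample passage are illustrative choices, not prescriptions from the source. Lowering n adds blanks to the same text.

```python
import re

def make_cloze(text, n=7, lead_in_sentences=1):
    """Fixed-ratio cloze: keep the lead-in sentence(s) intact, then delete every n-th word.

    Returns (gapped_text, answer_key). Punctuation attached to a deleted word is kept.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lead_in = " ".join(sentences[:lead_in_sentences])
    words = " ".join(sentences[lead_in_sentences:]).split()
    key, out = [], []
    for position, word in enumerate(words, start=1):
        # Separate leading/trailing punctuation from the word itself
        before, core, after = re.fullmatch(r"(\W*)(.*?)(\W*)", word).groups()
        if position % n == 0 and core:
            key.append(core)
            out.append(f"{before}({len(key)})______{after}")
        else:
            out.append(word)
    return lead_in + " " + " ".join(out), key

passage = ("Cloze tests come from Gestalt psychology. "
           "Readers fill each gap using the words around it in the text.")
gapped, key = make_cloze(passage, n=4)
print(gapped)
print(key)  # ['gap', 'around', 'text']
```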

In a cloze test:
• Scoring: either only the exact (correct) word or any acceptable answer may be credited
• Misspellings should not be penalised
• Grammatical errors should be penalised
• The subject matter should be neutral in content and in the language variety used
• Provide a lead-in (an intact stretch of text before the first gap)
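The two scoring methods (exact word vs. acceptable answer) can be sketched as follows. Deciding whether a deviant answer is a harmless misspelling or a grammatical error needs human judgement, so this hypothetical helper leaves that decision to the marker: tolerated misspellings and acceptable alternatives are listed per gap, and everything else scores zero.

```python
def normalise(word):
    """Case- and whitespace-insensitive comparison form of an answer."""
    return word.strip().lower()

def score_cloze(responses, key, acceptable=None):
    """Score cloze responses by the exact-word or acceptable-word method.

    key: the deleted words, in gap order (exact-word method).
    acceptable: optional list (one entry per gap) of extra answers the marker accepts,
    e.g. synonyms, or misspellings judged not to be grammatical errors.
    """
    score = 0
    for gap, answer in enumerate(responses):
        allowed = {normalise(key[gap])}
        if acceptable is not None:
            allowed |= {normalise(a) for a in acceptable[gap]}
        if normalise(answer) in allowed:
            score += 1
    return score

key = ["gap", "around", "text"]
responses = ["space", "around", "pasage"]
acceptable = [{"space", "blank"}, {"near"}, {"passage", "pasage"}]  # marker's hypothetical decisions
print(score_cloze(responses, key))              # 1 (exact-word method)
print(score_cloze(responses, key, acceptable))  # 3 (acceptable-word method)
```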

Cloze testing:
• A good indicator of general linguistic ability
• Requires linguistic knowledge, textual knowledge, and knowledge of the world
• Used in achievement, proficiency, classroom placement and diagnostic tests

2.3.2 DICTATION
Previously
• Thought to measure solely students’ listening comprehension skills
Recently
• Seen as involving auditory discrimination, auditory memory span, spelling, the recognition of sound segments, and overall textual comprehension

CHARACTERISTICS
• There is no reliable way of assessing the relative importance of the different abilities required
• Dictation tends to measure low-order language skills rather than high-order skills
• It focuses too much on individual sounds rather than on the meaning of the text; because memory span is limited, students cannot retain everything they hear

TIPS:
• Read through the whole dictation passage first
• Dictate (once or twice) in meaningful units of sufficient length rather than reading out word by word
• Read the whole passage once more at slightly slower than normal speed

2.4 The communicative approach
• Focuses primarily on how language is used in communication
• Tasks are as close as possible to those facing the students in real life
• Judges the effectiveness of the communication rather than formal linguistic accuracy
• Emphasizes language “use” rather than language “usage”:
+ Use: how people use language for different purposes (the focus of tests of a communicative nature)
+ Usage: the formal patterns of language

Divisibility hypothesis
• Tests measure different language skills
• Test scores give different profiles of a learner’s performance


• Native speakers (NS) can score lower than non-native speakers (NNS)
• The assessment of language skills in isolation may have only very limited relevance to real life
• Communicative tests must of necessity reflect the culture of a particular country
• Communicative tests should be based on precise and detailed specifications of the needs of learners
• Qualitative judgements are superior to quantitative assessments

3. ITEM TYPES
Selection items require the candidate to choose a response from the various options offered.
Candidate-supplied items require the candidate to supply the response, e.g. short answer items, open cloze items.

3.1 SELECTION ITEMS
Advantages of selection items:
• familiar to nearly all candidates in all places
• independent of writing ability
• easy and quick to mark
• capable of being objectively scored
• economical of the candidate's time, so that many items can be attempted in a short period and a range of objectives covered, adding to the reliability of the test.

Disadvantages of selection items:
• tests of recognition rather than production
• limited in the range of what they can test
• incapable of letting a candidate express a wide range of abilities
• dependent, in many cases, on reading ability
• affected by guesswork
• very difficult and time-consuming to write successfully
• capable of leading to poor classroom practice, if teaching focuses too intensively on preparation for tackling this sort of test item.
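Because selection items are affected by guesswork, raw scores are sometimes adjusted with the classical correction-for-guessing formula S = R − W/(k − 1), where R is the number right, W the number wrong (omissions excluded) and k the number of options per item. The formula assumes that every wrong answer is a blind guess among the k options. The figures below are hypothetical.

```python
def corrected_score(right, wrong, options):
    """Correction for guessing: right - wrong / (options - 1); omitted items are ignored."""
    if options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (options - 1)

# 40 four-option multiple-choice items: 30 right, 9 wrong, 1 omitted
print(corrected_score(30, 9, 4))   # 27.0
# 40 true/false items (two options): 30 right, 10 wrong
print(corrected_score(30, 10, 2))  # 20.0
```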

3.1.1. Discrete point multiple choice item
3.1.2. Text-based multiple choice item

3.1.3. True / false item
• Test takers have to decide whether a statement is true or false, normally in relation to a reading or listening text

3.1.4. Gap-filling (cloze passage) with multiple choice options
• Words are deleted from a text, creating gaps which the candidate has to fill, normally with one or two words.

3.1.5. Gap-filling with selection from bank
Consists of a text with gaps accompanied by a 'bank' containing all the correct words to insert in the text, plus several extra words which will not be used.

3.1.6. Gap-filling at paragraph level
Consists of a text with six paragraph-length gaps. A choice of seven paragraphs is given from which to fill the gaps.
3.1.7. Matching
• Elements from two separate lists or sets of options have to be brought together.

3.1.8. Multiple matching
• A number of questions or sentence completion items are set, which are generally based on a reading text. The responses are provided in the form of a bank of words or phrases, each of which can be used an unlimited number of times.

3.1.9. Extra word error detection
• In this type of task there is one extra, incorrect word in most of the lines of a text.

3.2 Candidate-supplied items
Advantages of candidate-supplied items:
• are easier to write
• allow for a wider sample of content
• minimize the effect of guessing
• allow for creativity in language use
• measure higher as well as lower order skills
• have a more positive effect on classroom practice
• can provide a similar degree of marking objectivity to selection items

Disadvantages of candidate-supplied items:
• There are often acceptable alternative responses rather than only one unambiguously correct response.
• They are time-consuming and difficult to mark, often calling for examiner marking rather than clerical or computerized marking.

3.2.1. Short answer item:
consists of a question which can be answered
in one word or a short phrase. The exact limits
on the length of the answer should be
specified

3.2.2. Sentence completion: In this kind of
item part of a sentence is provided, and the
candidate has to use information derived from
a text to complete it.

3.2.3. Open gap-filling (cloze): In an open
cloze, the gaps are selected by the item writer,
who focuses on the particular structures to be
tested. The candidate's task is to supply the
word which fills each gap in the text.

3.2.4. Transformation: In this type of item, the candidate is given a sentence, followed by the opening words of another sentence which gives the same information, but expressed through a different grammatical structure.

3.2.5. Word formation: In this type of item
one word is deleted from a sentence, and a
related form of the word is given to the
candidate as a prompt.

3.2.6. Transformation cloze:
Consists of a text with a word missing in each line; a different grammatical form of the required word is supplied. The candidate has both to find the location of the missing word and to supply it in its correct form.

3.2.7. Note expansion
In this item type the lexical components of
each sentence are supplied in a reduced form
which resembles notes.
The candidate's task is to supply the correct
grammatical form, including changes in word
order and the addition of such elements as
prepositions, articles and auxiliary verbs.


3.2.8. Error correction / proofreading:
• Consists of a text in which a word appears in an incorrect form in each numbered line. The candidate has first to identify the incorrect word, and then write it in its correct form at the end of the line.


3.2.9. Information transfer: Tasks described in
this way always involve taking information
given in a certain form and presenting it in a
different form.

3.3. NON-ITEM-BASED TASK TYPES
3.3.1. Writing: extended writing questions
Extended writing can be tested in a number of
ways which vary in the degree of control
exercised by the tester over the candidate's
response.

• Writing tasks with detailed input
• Writing tasks with titles only

3.3.2. Speaking:
• Presentation
• Use of picture prompts
• Written prompts
• Information gap tasks

4. Bloom’s taxonomy and testing
• Definition
• Old version vs. New version
• 6 levels of thinking

4.1. Definition
• Bloom’s: the name of the creator
• Taxonomy: an arrangement of ideas, or a way to group things together
Bloom’s Taxonomy is a classification of the different objectives that educators might set for students.

The development of Bloom’s taxonomy
• 1948: Benjamin Bloom’s study on classroom activities and goals
• 1956: The publication of the original Bloom’s Taxonomy
• 1995: The revision of the original Bloom’s Taxonomy
• 2001: The final revision of Bloom’s Taxonomy

[Figure: Original (old) Bloom’s Taxonomy]

4.2. Old vs. New Bloom’s Taxonomy

What’s the Difference?
Original Bloom’s Taxonomy
• Terminology: Used nouns to describe the levels of thinking.
• Structure: One-dimensional, using the Cognitive Process.
• Emphasis: Originally intended for educators and psychologists, although Bloom’s taxonomy came to be used by many other audiences.
Revised Bloom’s Taxonomy
• Terminology: Uses verbs to describe the levels of thinking.
• Structure: Two-dimensional, using the Knowledge Dimension and how it interacts with the Cognitive Process.
• Emphasis: Placed upon its use as a more authentic tool for curriculum planning, instructional delivery and assessment.

4.3. The levels of thinking
There are six levels of learning
according to Dr. Bloom:
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation

Knowledge or Remembering
• Observation and recall of information
• Knowledge of dates, events, places, major
ideas, etc.
• Mastery of subject matter
• Key words: list, define, tell, describe,
identify, show, label, collect, examine,
tabulate, quote, name, who, when, where,
etc.

Knowledge/Remembering – Practice
• Write a list of vegetables.

Comprehension or Understanding
• Understanding information
• Grasp the meaning
• Translate knowledge into new context
• Interpret facts, compare, contrast
• Order, group, infer causes
• Predict consequences
• Key words: summarize, describe, interpret, contrast, predict,
associate, distinguish, estimate, differentiate, discuss, extend

Comprehension/Understanding – Practice
• Retell the story of the “Sleeping
Beauty” in your own words.

Application or Applying
• Use information
• Use methods, concepts, theories in new
situations
• Solve problems using required skills or
knowledge
• Key words: apply, demonstrate, calculate,
complete, illustrate, show, solve,
examine, modify, relate, change, classify,
experiment, discover

Application/Applying – Practice
• Make an imaginary story and tell it.

Analysis or Analyzing
• Seeing patterns
• Organization of parts
• Recognition of hidden meanings
• Identification of components
• Key words: analyze, separate, order, explain, connect, classify, arrange, divide, compare, select, infer

Analysis/Analyzing – Practice
• Make a family tree to show
relationships.

Synthesis or Creating
• Use old ideas to create new ones
• Generalize from given facts
• Relate knowledge from several areas
• Predict, draw conclusions
• Key words: combine, integrate, modify,
rearrange, substitute, plan, create, design, invent,
what if?, compose, formulate, prepare, generalize,
rewrite

Synthesis/Creating – Practice
• Design a magazine cover that would
appeal to the students in your class.

Evaluation or Evaluating
• Compare and discriminate between ideas
• Assess value of theories, presentations
• Make choices based on reasoned argument
• Verify value of evidence
• Recognize subjectivity
• Key words: assess, decide, rank, grade, test, measure,
recommend, convince, select, judge, explain,
discriminate, support, conclude, compare, summarize

Evaluation/Evaluating – Practice
• Make a booklet presenting 5 rules for the country that you consider important, and convince others of them.
