Presentation at the EUKS conference in Edinburgh, February 23rd 2019. An introduction to how assessment issues in higher education may be relevant to language teachers in the future.
1. Learning-oriented assessment
in an era of high-stakes and
insecure testing
Mark Carver
mac32@st-andrews.ac.uk
@MQuITE_Ed
@themarkcarver
An audio recording of this presentation is available for download
2. Overview
• Some key principles in assessment design and their impact on
pedagogy
• Predicting a high-stakes and insecure future for testing
• How language teachers may need to respond
• Examples at St Andrews – scalable?
3. Introduction
• Teaching Fellow at University of St Andrews
• MSc TESOL Assessment and Evaluation, Teaching and Researching
• Research Assistant at University of Edinburgh
• @MQuITE_Ed
• Edinburgh Napier University
• Programme evaluation using TESTA
• Lancaster University (PhD)
• Evaluating feedback in ITE
• Teaching experience in UK, Spain, China, Thailand
4. Tests and decision making
• Teachers spend a great deal of time writing tests
• Students spend a great deal of time preparing for tests
• A score is given, and decisions are made
• What are these decisions?
• Who makes them?
5. Uses for tests
• Placement at beginning of language course
• Progress to the next stage/year at school
• Proficiency – Cambridge exams (KET, PET, FCE, CAE,
CPE), IELTS
• School entrance/selection
• University entrance
• Position in the workplace
• Immigration
6. Uses for tests (Bachman and Palmer,
2010)
• Entrance, readiness
• Placement
• Changes in instruction
• Changes in approaches to or strategies of learning
• Achievement/progress (pass/fail)
• Certification
• Selection (e.g., employment, immigration)
• Allocation of resources
7. Assumptions and accountability
• Task is relevant to decision we want to make
• Language produced is not only relevant but can be scored
• Once scored, there is a direct relationship between score and
test taker’s ability
• Direct inference can be made
• The score can be used as a sufficient or necessary condition
• Bachman and Palmer’s (2010) ‘Assessment Use Argument’
8. Problems
• The need for more and more tests
• Grade inflation
• Criteria- vs norm-referencing
• Commercial competition
• Ease of contract cheating
• We assess time/work rather than ability – can be
inequitable for some types of students
9. High-stakes testing and the Diploma
Disease (Stobart, 2008)
• Cheating – moral issues in high stakes tests
• Scores take on a commercial value
• Interest for schools to keep scores high
• Mechanism to run education systems (funding)
• Manipulate society – education becomes the solution to
everything!
• Role of testing in meritocracy
• Fulfils a social and policy role
-> fairness is of major importance
10. How high-stakes are our assessments?
• A 200-word essay used to assess academic writing ability
• A 10-minute conversation to assess speaking ability
• A three-year programme to assess scientific thinking
• Tests such as IELTS must devote vast resources to demonstrating validity
and reliability (100% summative), cumulative coursework can focus more
on learning (~50% formative)
• Competition can create grade inflation and a plurality of tests: PTE using
LSA and re-scoring is potentially a game changer
• Exams as a ‘gold standard’ (e.g. AACSB)
11. A future beyond reliability and fairness
• Usefulness = reliability, construct validity, authenticity,
interactiveness, impact and practicality (Bachman & Palmer, 1996)
• Practicality, reliability, validity, authenticity, washback (Brown, 2003)
• Practicality – easy to construct, administer, grade, interpret, and
cost-effective (place, invigilation, marking)
• Reliability – a similar spread of results will be achieved if the test is
repeated
• Authenticity – the test reflects ‘real’ life, whether communicative or
academic
• Washback – the test promotes teaching and further learning
12. New priorities?
• Validity
• Fairness
• Veracity
• Transparency
• Real world
• Feedback to learners (Race, 2014)
• Scalable (Carless et al., 2017)
• Partnership (NUS, 2015)
13. Let us abandon the goal of manipulating students into doing what
the faculty desires and settle for something more modest. We
can take as a reasonable proximate goal that we at least do
nothing (or as little as possible) to interfere with whatever
tendency students might have to engage in academic activities . .
. instead of trying to get students to do what we want, we look
only for ways of not encouraging them to do what we do not
want. We ask how a college might be organized so as not to
provoke or coerce students into forms of activity that interfere
with what we might want to achieve. (Becker, Geer, and Hughes
1968, p. 138)
14. Discussion
• My prediction is a future where testing is even more important
overall, but we will have less confidence in veracity or
‘whodunnitness’. Overengineered responses to this risk taking
assessment too far away from learning. We will need lots of
low-stakes testing as part of a high-stakes system to satisfy
these needs.
• If I’m right, what might the language teacher of tomorrow need
in their assessment toolkit?
15. Assessment and active learning
• Eric Mazur (Jang et al., 2017)
• Students complete an individual assessment (50%) with no feedback.
Very challenging questions; the mean score should be about 50%
• Students then assigned to groups to answer the same questions
(50%), get feedback and can then re-attempt incorrect answers
with diminishing marks
• So far, no clear answer on splitting marks, giving small
percentages, gamification, or end-loading
• Marks can focus attention, but can also result in an arms race
• My advice is to simplify and de-risk: there should be better ways to
convince students to do their work
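The two-stage scheme above can be sketched as a small grading function. The slides leave the details open, so this assumes (hypothetically) that each question is worth one mark and that re-attempts after feedback earn half the marks of the previous attempt:

```python
# Sketch of a two-stage (Mazur-style) exam score.
# Assumptions not specified in the slides: one mark per question,
# and marks halve on each re-attempt in the group stage.

def question_score(correct_on_attempt):
    """Marks for one group-stage question, halving with each re-attempt.
    correct_on_attempt: 1 = right first time, 2 = right on first
    re-attempt, etc. None = never answered correctly."""
    if correct_on_attempt is None:
        return 0.0
    return 0.5 ** (correct_on_attempt - 1)

def final_mark(individual_correct, group_attempts, n_questions):
    """Weight the individual stage (50%) and group stage (50%) equally."""
    individual = individual_correct / n_questions
    group = sum(question_score(a) for a in group_attempts) / n_questions
    return 0.5 * individual + 0.5 * group

# A student who scores 5/10 alone, then in the group gets 8 questions
# right first time and 2 right on the first re-attempt:
attempts = [1] * 8 + [2] * 2
print(round(final_mark(5, attempts, 10), 2))
```

The point of the sketch is how quickly the "diminishing marks" parameter shapes incentives: a steep decay discourages guessing, a shallow one rewards persistence.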
16. Some optimism: longer-term goals
• Test designers and test takers can often be in opposition as
surface approaches to learning and increased assessment literacy
change how tests are taken
• Tests must continually look at the kind of learning and dispositions
they produce as well as the use made of the test score. The
Stanford-Binet IQ Test is a classic example.
• Is assessment sustainable (Boud, 2000)?
• Is assessment learning-oriented (Carless, 2007)?
• Do you learn something from the act of preparing for and taking the
test?
• Do you learn more than ‘stuff’?
• Does your evaluative judgement improve?
18. An example at St Andrews
• LOA or Assessment as Learning, e.g. five 10% tests with the lowest
mark dropped, making up 40% of the grade
• Increased use of exemplars
• Exemplars and critique of our criteria
• 20-point scale
• Low-stakes opportunities to develop assessment literacy
• Spiralling of assessment types (Bloxham and Boyd, 2007)
• Is anonymous marking working?
• Portfolio approaches
• IELTS ‘necessary but not sufficient’: Interviews for all applicants
• Slowing programme pace?
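The drop-lowest scheme above is simple arithmetic: the best four of five class tests each count 10%, giving 40% of the grade. A minimal sketch, assuming (hypothetically) that the remaining 60% comes from a single other component marked out of 100:

```python
# Sketch of the 'five 10% tests, drop the lowest' scheme: the best four
# of five test scores (each out of 100) contribute 40% of the grade.

def continuous_component(test_scores):
    """Average of the best four of five scores, scaled to 40%."""
    assert len(test_scores) == 5
    best_four = sorted(test_scores)[1:]  # drop the lowest mark
    return 0.40 * (sum(best_four) / 4)

def final_grade(test_scores, other_component):
    """other_component: mark out of 100 for the remaining 60%."""
    return continuous_component(test_scores) + 0.60 * other_component

print(round(final_grade([70, 55, 80, 60, 75], 65), 1))  # the 55 is dropped
```

Dropping the lowest mark keeps each individual test low-stakes (one bad day costs nothing) while the series as a whole remains a substantial part of the grade.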
19. Closing discussion
• At what point does low-stakes LOA or AAL just become good
teaching?
• Do you favour strategies such as gamification and assessment
mapping or a ‘stripped back’ approach?
• How might language schools and larger programmes deal with
the problem of scale?
20. Recommended reading
• Bachman, L. F. and Palmer, A. S. (2010). Language Assessment in
Practice. Oxford: Oxford University Press.
• Boud, D. (2000). ‘Sustainable assessment: rethinking assessment for
the learning society’, Studies in Continuing Education, 22(2), pp.
151–167.
• Carless, D. (2007). ‘Learning-oriented assessment: conceptual bases
and practical implications’, Innovations in Education and Teaching
International, 44(1), pp. 57–66.
• Sadler, D. R. (2010). ‘Beyond feedback: developing student
capability in complex appraisal’, Assessment and Evaluation in
Higher Education, 35(5), pp. 535–550.
• Stobart, G. (2008). Testing Times: The Uses and Abuses of
Assessment. London: Routledge.