• In today’s language classrooms, the term assessment usually evokes images of an end-of-course paper-and-pencil test designed to tell both teachers and students how much material the student doesn’t know or hasn’t yet mastered.
• Assessment, however, includes a broad range of activities and tasks that teachers use to evaluate students’ progress and growth on a daily basis.
To make the use of evaluation, assessment and test procedures more effective, it is necessary to clarify what these concepts are and to explain how they differ from one another.
• Evaluation is all-inclusive and is the widest basis for collecting information in education. It involves looking at all factors that influence the learning process: syllabus, objectives, course design, and materials.
• Assessment is part of evaluation because it is concerned with the student and with what the student does. It refers to the variety of ways of collecting information on a learner’s language ability or achievement.
• A test is a subcategory of assessment. It is a formal, systematic procedure used to gather information about student progress.
• The most common use of language tests is to identify strengths and weaknesses in students’ abilities.
• Information gleaned from tests also assists us in deciding who should be allowed to participate in a particular course or program area.
• Another common use of tests is to provide information about the effectiveness of programs of instruction.
• Placement tests: They assess students’ level of language ability so that they can be placed in an appropriate course or class. This type of test indicates the level at which a student will learn most effectively. The primary aim is to create groups of learners that are homogeneous in level.
• Aptitude tests: They measure capacity or general ability to learn a foreign language (although they are not commonly used these days).
• Diagnostic tests: They identify language areas in which a student needs further help. The information gained from diagnostic tests is crucial for further course activities and for providing students with remediation.
• Progress tests: They measure the progress that students are making toward defined course or program goals. Progress tests are generally teacher-produced because they cover less material and assess fewer objectives.
• Achievement tests: They are similar to progress tests. They are usually administered at the mid-point and end-point of the semester or academic year. The content is generally based on the specific course content or on the course objectives.
• Proficiency tests: They assess the overall language ability of students at varying levels. They tell us how capable a person is in a particular language skill area.
Objective versus subjective tests
• Tests are sometimes distinguished by the manner in which they are scored. An objective test is scored by comparing a student’s responses with an established set of acceptable/correct responses on an answer key. With objectively scored tests, the scorer does not require particular knowledge or training in the examined area.
• In contrast, a subjective test, such as writing an essay, requires scoring by opinion or personal judgment, so the human element is very important.
• Even experienced scorers need moderated training sessions to ensure inter-rater reliability.
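Inter-rater reliability for subjectively scored tasks can be checked numerically. The sketch below is one possible illustration, not a prescribed procedure: it computes simple percent agreement and Cohen's kappa (agreement corrected for chance) for two raters who scored the same essays. The function names and the sample band scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of performances.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band
    # if each rated independently with their own band frequencies.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical band throughout
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-4) given by two raters to the same eight essays.
rater_1 = [3, 2, 4, 3, 1, 2, 4, 3]
rater_2 = [3, 2, 3, 3, 1, 2, 4, 2]

print("Percent agreement:", percent_agreement(rater_1, rater_2))
print("Cohen's kappa:", round(cohens_kappa(rater_1, rater_2), 3))
```

A low value after a moderation session would suggest that the raters are still interpreting the scoring criteria differently.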
Criterion-referenced tests versus standardized tests
• Criterion-referenced tests are usually developed to measure mastery of well-defined instructional objectives specific to a particular course or program. Their purpose is to measure how much learning has occurred. Students’ performance is compared only to the amount or percentage of material learned.
• Standardized tests are designed to measure global language abilities. Students’ scores are interpreted relative to all other students who take the exam. Their purpose is to spread students out along a continuum of scores so that those with low abilities in a certain skill are at one end of the normal distribution and those with high scores are at the other end, with the majority of the students falling between the extremes.
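The two ways of interpreting a score can be contrasted with a small calculation. The sketch below is illustrative only: `percent_mastered` reports a result against the material itself, while `percentile_rank` reports it relative to the other test takers, using the midpoint convention for tied scores (one common convention among several). The function names and the sample scores are hypothetical.

```python
def percent_mastered(objectives_mastered, objectives_total):
    """Criterion-referenced view: percentage of the material the student has learned."""
    if objectives_total <= 0:
        raise ValueError("objectives_total must be positive.")
    return 100.0 * objectives_mastered / objectives_total


def percentile_rank(score, all_scores):
    """Relative view: position of a score among all test takers.

    Ties count as half below and half above (midpoint convention).
    """
    if not all_scores:
        raise ValueError("all_scores must not be empty.")
    below = sum(s < score for s in all_scores)
    equal = sum(s == score for s in all_scores)
    return 100.0 * (below + 0.5 * equal) / len(all_scores)


# Hypothetical data: one student mastered 18 of 20 course objectives.
print("Percent of objectives mastered:", percent_mastered(18, 20))

# Hypothetical data: the same student scored 70 on an exam taken by ten students.
cohort_scores = [42, 55, 61, 61, 68, 70, 74, 79, 85, 93]
print("Percentile rank in cohort:", percentile_rank(70, cohort_scores))
```

The first figure does not change when other students do better or worse; the second one does.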
Summative versus formative tests
• Tests or tasks administered at the end of the course to determine if students have achieved the objectives set out in the curriculum are called summative assessments. They are often used to decide which students move on to a higher level.
• Formative assessments, however, are carried out with the aim of using the results to improve instruction, so they are given during the course and feedback is provided to students.
High-stakes versus low-stakes tests
• High-stakes tests are those in which the results are likely to have a major impact on the lives of a large number of individuals or on large programs.
• Low-stakes tests are those in which the results have a relatively minor impact on the lives of the individual or on small programs. In-class progress tests or short quizzes are examples of low-stakes tests.
Validity
A valid test:
- Measures exactly what it proposes to measure.
- Involves performance that samples the test’s criterion.
- Offers useful, meaningful information about a test-taker’s abilities.
- Is supported by an argument.
Types of validity:
- Criterion-related validity
- Construct-related validity
- Consequential validity (impact)
- Content-related validity
Introduction
• Objectives include four distinct components: Audience, Behavior, Condition and Degree.
• Objectives must be both observable and measurable to be effective.
• Words like "understand" and "learn" are generally not acceptable in written objectives because they are difficult to measure.
• Written objectives are a vital part of instructional design because they provide the roadmap for designing and delivering curriculum.
• Throughout the design and development of curriculum, the content to be delivered should be compared with the objectives identified for the program. This process, called performance agreement, ensures that the final product meets the overall goal of instruction identified in the first-level objectives.
Audience
- Describes the intended learner or end user of the instruction.
- Often the audience is identified only in the first-level objective, to avoid redundancy.
Behavior
- Describes learner capability.
- Must be observable and measurable (you will define the measurement elsewhere in the goal).
- If it is a skill, it should be a real-world skill.
- The “behavior” can include demonstration of knowledge or skills in any of the domains of learning: cognitive, psychomotor, affective, or interpersonal.
Condition
- Equipment or tools that may (or may not) be utilized in completion of the behavior.
- Environmental conditions may also be included.
Degree
- States the standard for acceptable performance (time, accuracy, proportion, quality, etc.).
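The Audience, Behavior, Condition and Degree components can be treated as a simple checklist when drafting an objective. The sketch below is a hypothetical helper, not part of any established tool: it stores the four components, flags missing ones, and flags behaviors that start with the hard-to-measure verbs named earlier ("understand", "learn"). The class name, the verb list and the sample objective are illustrative assumptions.

```python
from dataclasses import dataclass

# Verbs flagged as difficult to measure; this short list is only a starting point.
VAGUE_VERBS = {"understand", "learn"}


@dataclass
class Objective:
    audience: str   # who the learner is
    behavior: str   # observable, measurable capability
    condition: str  # tools, equipment or environment
    degree: str     # standard for acceptable performance

    def problems(self):
        """Return a list of issues found in the drafted objective."""
        issues = []
        for name in ("audience", "behavior", "condition", "degree"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        words = self.behavior.split()
        if words and words[0].lower() in VAGUE_VERBS:
            issues.append(f"vague verb: {words[0].lower()}")
        return issues

    def sentence(self):
        """Assemble the components into one written objective."""
        return f"{self.condition}, {self.audience} will {self.behavior} {self.degree}."


# Hypothetical objective for a writing course.
draft = Objective(
    audience="the student",
    behavior="write a 150-word persuasive paragraph",
    condition="Given a writing prompt and a dictionary",
    degree="with no more than three grammatical errors",
)
print(draft.sentence())
print(draft.problems())
```

An objective such as "understand the past tense" with no condition or degree would be reported with three issues.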
The common mistakes have been grouped into four categories as follows:
- General examination characteristics.
- Item characteristics.
- Test validity concerns.
- Administrative and scoring issues.
General examination characteristics
• Too difficult or too easy
• Insufficient number of items
• Redundancy of test type
• Lack of confidence measure
• Negative washback through non-occurrent forms
Item characteristics
• Tricky questions
• Redundant wording
• Divergence cues
• Convergence cues
• Option number
Test validity concerns
• Mixed content
• Wrong medium
• Common knowledge
• Syllabus mismatch
• Content matching
Administrative and scoring issues
• Lack of cheating control
• Inadequate instruction
• Administrative inequities
• Lack of piloting
• Subjectivity of scoring
Traditional assessment
• Pencil-and-paper test.
• Answer the question.
• Choose or produce a correct grammatical form or vocabulary item.
• Good for checking reading and listening comprehension ability.
Alternative assessment
• Reveals what students can do with language.
• Is scored differently.
• Students can evaluate their own learning and learn from the evaluation process.
• Gives instructors a way to connect assessment with review of learning strategies.
- They are built around topics of interest to the students.
- They replicate real-world communication contexts and situations.
- They require students to produce a quality product or performance.
- The evaluation criteria and standards are known to the student.
- They involve multi-stage tasks and real problems that require creative use of language rather than simple repetition.
- They involve interaction between the assessor and the person assessed.
- They allow for self-evaluation.
Rubrics provide measurement of quality of performance on the basis of established criteria.
There are four main types of rubrics:
• Holistic rubrics
• Analytic rubrics
• Primary trait rubrics
• Multi-trait rubrics
In holistic evaluation, raters make judgments by forming an overall impression of a performance and matching it to the best fit from among the descriptions on the scale.
Advantages:
• They are often written generically and can be used with many tasks.
• They emphasize what learners can do, rather than what they cannot do.
• They save time by minimizing the number of decisions raters must make.
• Trained raters tend to apply them consistently, resulting in more reliable measurement.
• They are easily understood by younger learners.
Disadvantages:
• They do not provide specific feedback to test takers about the strengths and weaknesses of their performance.
• Performances may meet criteria in two or more categories, making it difficult to select the one best description. (If this occurs frequently, the rubric may be poorly written.)
Analytic scales are usually associated with generic rubrics and tend to focus on broad dimensions of writing or speaking performance. These dimensions may be the same as those found in a generic, holistic scale, but they are presented in separate categories and rated individually. Points may be assigned for performance on each of the dimensions and a total score calculated.
Advantages:
• They provide useful feedback to learners on areas of strength and weakness.
• Their dimensions can be weighted to reflect relative importance.
• They can show learners that they have made progress over time in some or all dimensions when the same rubric categories are used repeatedly.
Disadvantages:
• They take more time to create and use.
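The idea of rating each dimension separately, weighting the dimensions, and calculating a total can be sketched as follows. This is a minimal illustration under assumed values: the dimension names, the weights, the 0-4 band range and the conversion to a percentage are all hypothetical choices, not a standard scheme.

```python
def analytic_score(ratings, weights, max_points):
    """Combine per-dimension ratings into a weighted total, as a percentage.

    ratings:    points awarded on each dimension (0..max_points)
    weights:    relative importance of each dimension
    max_points: highest band available on every dimension
    """
    if set(ratings) != set(weights):
        raise ValueError("Ratings and weights must cover the same dimensions.")
    for dimension, points in ratings.items():
        if not 0 <= points <= max_points:
            raise ValueError(f"Rating for {dimension} is outside 0..{max_points}.")
    weighted_sum = sum(ratings[d] * weights[d] for d in ratings)
    best_possible = max_points * sum(weights.values())
    return 100.0 * weighted_sum / best_possible


# Hypothetical writing rubric: content counts double.
weights = {"content": 2, "organization": 1, "vocabulary": 1, "grammar": 1}
ratings = {"content": 3, "organization": 4, "vocabulary": 2, "grammar": 3}

print("Weighted total (%):", analytic_score(ratings, weights, max_points=4))
for dimension, points in ratings.items():
    print(f"  {dimension}: {points}/4")
```

Reporting the per-dimension ratings alongside the total keeps the feedback on strengths and weaknesses that makes analytic rubrics useful.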
Primary trait scoring would be strictly classified as task-specific, and performance would be evaluated on only one trait, such as "Persuading an audience".
Example primary trait: Persuading an audience
0 Fails to persuade the audience.
1 Attempts to persuade but does not provide sufficient support.
2 Presents a somewhat persuasive argument but without consistent development and support.
3 Develops a persuasive argument that is well developed and supported.
Multiple trait scoring rubrics are based on the concept of primary trait scoring and provide diagnostic feedback to learners about performance on "context-appropriate and task-appropriate criteria" for a specified topic.
• The rubrics are aligned with the task and curriculum.
• Aligned and well-written primary and multiple trait rubrics can ensure construct and content validity of criterion-referenced assessments.
• Feedback is focused on one or more dimensions that are important in the current learning context.
• With a multiple trait rubric, learners receive information about their strengths and weaknesses.
• Primary and multiple trait rubrics are generally written in language that students understand.
• Teachers are able to rate performances quickly.
• Many rubrics of this type have been developed by teachers who are willing to share them online, at conferences, and in materials available for purchase.