1. Does peer grading work? How to implement and improve it?
Comparing instructor and peer assessment in MOOC GdP
Rémi Bachelet, Drissa Zongo, Aline Bourelle
Download this slideshow: http://goo.gl/GiFvXb
2. Massive evaluation in MOOCs: Peer assessment vs. Quizzes
• Quizzes
– Massive scale, but
• inability to process, grade and provide feedback for complex and open-ended student assignments
• no critical thinking
• Peer assessment
– Evaluating rich assignments on a massive scale: Possible? Accurate?
– Major learning benefits expected:
• student autonomy, teaching paradigm shift
• higher levels of learning in Bloom's taxonomy
3. 4 Research questions
1. How to train MOOC students to grade their peers and
provide constructive feedback?
– Qualitative/experience testing
2. Is peer grading as accurate as instructor grading? Superior?
– Quantitative data/hypothesis testing
3. Which grading algorithm is best?
– Quantitative data/hypothesis testing
4. How many peer grades are required to provide an accurate
final grade?
– Quantitative data/hypothesis testing
4. “Fundamentals of project management” MOOC /
MOOC GdP, session n°2
• Dataset: 1011 to 831 assignments submitted each week, for
5 weeks
– 4650 assignments total.
• Variety of assignments
– (next slide)
• Both instructor and peer grading were available
– 3-5 peer grades and one instructor/AT grade
6. Q1: How to train students to grade their peers and
provide constructive feedback?
• Generic peer evaluation training:
– Major requirement of the advanced track
– 2+ videos
• rationale and importance of peer assessment
• how to write motivating and constructive feedback
• guidelines on how to use the platform for peer grading
• Specific peer evaluation training:
– Specific resources for each assignment
• benchmark assignment, tutorial video
• interactive grading rubric
• discussion thread (1649 total posts)
7. Q2: Is individual peer grading as accurate as
instructor grading?
• Share of grades within ±5% and ±10% of the “real” grade
– Instructors => Suchaut, B. (2008) => 39% and 65%
– Our MOOC students => our data => 36% and 60%
… but this is individual student grading
Does averaging several peer grades perform better than using a
single one?
– Our MOOC students => average of 3-5 grades => 56% and 82%
The average grade given by MOOC students is more accurate than an
instructor’s
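
The ±5% / ±10% similarity measure above can be sketched as follows. This is a minimal Python illustration, assuming grades on a 0-100 scale so that ±5% means ±5 points; the function name `share_within` and all the sample grades are hypothetical and are not taken from the MOOC GdP dataset.

```python
def share_within(grades, reference_grades, tolerance):
    """Fraction of grades lying within +/- tolerance points of the reference grade."""
    pairs = list(zip(grades, reference_grades))
    hits = sum(1 for grade, ref in pairs if abs(grade - ref) <= tolerance)
    return hits / len(pairs)


# Hypothetical data (assumed 0-100 scale): one reference grade per assignment.
reference = [70, 55, 90, 40]
single_peer = [78, 52, 70, 49]    # one individual peer grade per assignment
peer_average = [73, 57, 84, 44]   # average of several peer grades per assignment

for tol in (5, 10):
    print(tol,
          share_within(single_peer, reference, tol),
          share_within(peer_average, reference, tol))
```

Running the same function on individual grades and on averaged grades gives the two pairs of percentages that the slide compares.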
8. Q3: best algorithm: average or median?
“Error functions”: difference between the instructor grade and either
the average or the median of the student grades.
The average is slightly more accurate than the median
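
A possible reading of these error functions, sketched in Python with the standard library only. The function name `aggregation_errors`, the sample grades and the use of the mean absolute error as the comparison criterion are assumptions made for illustration; the slides do not state how the errors were summarised.

```python
from statistics import mean, median


def aggregation_errors(peer_grades, instructor_grade):
    """Signed errors of the mean and of the median of peer grades."""
    return (mean(peer_grades) - instructor_grade,
            median(peer_grades) - instructor_grade)


# Hypothetical assignments: (peer grades, instructor grade).
assignments = [([60, 70, 95], 72), ([40, 50, 55, 45], 50), ([80, 85, 30], 70)]

errors = [aggregation_errors(peers, ref) for peers, ref in assignments]
mae_mean = mean([abs(err_mean) for err_mean, _ in errors])
mae_median = mean([abs(err_median) for _, err_median in errors])
print(mae_mean, mae_median)
```

On this made-up sample the mean has the smaller average absolute error, but the ordering depends entirely on the data.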
9. Q4: How many peer grades to correctly estimate
“best grade”?
Peer grading quickly outperforms instructor grading (with as few as
two peers)
Best “return” with 3-4 peer grades
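
One way to explore the "return" of each additional peer grade is to average the error over every subset of k peer grades. The sketch below is an illustration under assumptions: the function name `mean_abs_error_by_k`, the sample grades and the exhaustive-subset approach are hypothetical and are not described in the slides.

```python
from itertools import combinations
from statistics import mean


def mean_abs_error_by_k(peer_grades, reference_grade):
    """For each k, average |mean(subset) - reference| over all subsets of k grades."""
    result = {}
    for k in range(1, len(peer_grades) + 1):
        errors = [abs(mean(subset) - reference_grade)
                  for subset in combinations(peer_grades, k)]
        result[k] = mean(errors)
    return result


# Hypothetical assignment with four peer grades and a reference grade of 75.
curve = mean_abs_error_by_k([60, 80, 70, 90], 75)
for k, err in curve.items():
    print(k, round(err, 2))
```

Plotting such a curve, averaged over many assignments, shows where extra peer grades stop reducing the error noticeably.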
10. Improving peer evaluation monitoring and grades
processing in MOOC GdP 4 and 5
• Estimate the quality of grades issued by peers
• Act on this information:
– dedicated VBA/Excel application => feedback on whether each
grade was correct, high or low
– reward accurate grading
– track whether peer grading improved with time during the course
• Add self-evaluation: best source for learning
• New system, developed for Canvas in association with Unow
• Students were asked to take a fresh look at their own work and grade it
after (1) having evaluated at least 3 other students’ assignments and
(2) getting feedback on their own assignment from other students.
11. Conclusions
• Peer evaluation displays promising potential
• Not easy to implement on a massive, open scale
– Assignments = careful work, beta testing (100 hours)
– New assignments/case study for each session
– Dedicated data processing, develop team expertise
– Carefully set up:
• Deadline reminders, targeted messages
• How each student gets feedback
• Rewards for accurate grading
• Monitoring: manual grading is still required (10-1%)
12. Recommendations for researchers
• Look closely at the distribution of peer grades before hypothesis testing
• How many assignments should a student be required to grade? We
recommend 4
– accounting for peers who drop out of the process
– time to work on self-assessment.
• What algorithm should be preferred?
– average if grading data has been correctly checked and filtered.
– otherwise, median is more robust (just remove outliers and get more evaluations).
• When to switch from automatic peer grading to manual instructor grading?
1. fewer than 2 peer grades
2. non-consensus (i.e. standard deviation of peer grades > 20)
3. presence of a “0” grade
… GdP4: 10%, 9% and 1.6% of assignments 1, 2 and 3 were graded manually.
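
The three switching criteria above can be sketched as a single check. This is a minimal Python illustration, assuming grades on a 0-100 scale and the sample standard deviation; the slides do not say whether the sample or the population formula was used, and the function and constant names are hypothetical.

```python
from statistics import stdev

MIN_PEER_GRADES = 2   # criterion 1: fewer than 2 peer grades
MAX_STD_DEV = 20      # criterion 2: non-consensus threshold (assumed 0-100 scale)


def needs_manual_grading(peer_grades):
    """Return True when an assignment should go to an instructor."""
    if len(peer_grades) < MIN_PEER_GRADES:
        return True
    if 0 in peer_grades:                      # criterion 3: presence of a "0" grade
        return True
    # Sample standard deviation: an assumption, see the note above.
    return stdev(peer_grades) > MAX_STD_DEV


print(needs_manual_grading([70]))           # too few grades
print(needs_manual_grading([70, 0, 80]))    # contains a 0
print(needs_manual_grading([30, 90, 60]))   # peers disagree strongly
print(needs_manual_grading([70, 75, 80]))   # consensus
```

Checking the grade count first also guarantees that `stdev` always receives at least two values.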
13. Limitations of this study
• Develop theoretical framework & literature review
• Data processing: implement non-parametric testing
14. « Does peer grading work? How to implement and improve it? ».
European MOOCs Stakeholders Summit 2015, May 2015, Research Track
https://goo.gl/3QCXDG
Peer Grading Research Track - Auditorium 4, Tuesday, 10am
18. Q2: What data pre-processing is to be used?
Methodology: histograms and density
19. Q2: Do grades follow a normal distribution?
Test of normality
Methodology
Test of normality: Shapiro-Wilk test.
shapiro.test(data)
- H0: normal distribution
- H1: not a normal distribution
Results
Alpha threshold = 0.05
if p-value > 0.05 => H0 is not rejected
if p-value < 0.05 => H0 is rejected in favour of H1
p-value < 2.2e-16 < 0.05
=> Not a normal distribution
20. Q3: Similarity between peer grades and teacher grades? (1/2)
Methodology
Scatter plot and line (D): y = x
21. Q3: Similarity between peer grades and teacher grades? (2/2)
Methodology
Kendall correlation: cor.test(EP, Pairs, method="kendall")
Pearson correlation: cor.test(EP, Pairs)
Hypotheses:
- H0: the correlation is null
- H1: the correlation is not null
Threshold: 0.05
if p-value > 0.05 => H0 is not rejected
if p-value < 0.05 => H0 is rejected in favour of H1
Results
Correlation (EP, Mean(peer grades)):
                 correlation    p-value
Pearson (cor)    0.77251        < 2.2e-16
Kendall (tau)    0.6336516      < 2.2e-16
p-value < 0.05 => there is a correlation
correlation > 0.5 => strong correlation
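
The slides compute both correlations with R's cor.test. As a hedged illustration of what the two coefficients measure, here is a standard-library Python sketch that computes the Pearson coefficient and Kendall's tau-b without p-values. The variable names and sample grades are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied on both variables: excluded from every count
        elif dx == 0:
            ties_x += 1           # tied on x only
        elif dy == 0:
            ties_y += 1           # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical data: instructor grades and mean peer grades for four assignments.
instructor_grades = [50, 60, 70, 80]
peer_means = [55, 58, 75, 72]
print(pearson(instructor_grades, peer_means))
print(kendall_tau_b(instructor_grades, peer_means))
```

Kendall's coefficient depends only on the ordering of the grades, which makes it a reasonable companion to Pearson when the grades are not normally distributed.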
22. Q4: best algorithm: average or median?
Study of the « error function »
ErreurMoy = Mean(peer grades) - Instructor Team grades
ErreurMed = Median(peer grades) - Instructor Team grades
Study of the errors introduced
ErreurMoy < ErreurMed
The mean (average) performs best
23. Q4: best algorithm: average or median?
Study of the difference between the two errors
Ecart = |ErreurMoy| - |ErreurMed|
Median: -0.7500   Mean: -0.9867
=> |Median Error| > |Mean Error|
Coefficient of skewness: -0.2145285 < 0
=> longer tail toward negative values
The median introduces slightly more error than the average
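
The difference between the two absolute errors and its summary statistics can be sketched as follows, in standard-library Python. The error values are hypothetical, and the skewness uses the moment-based coefficient m3 / m2^1.5, which is an assumption: the slides do not say which skewness formula was used.

```python
from statistics import mean, median


def abs_error_gap(mean_errors, median_errors):
    """Ecart = |ErreurMoy| - |ErreurMed| for each assignment."""
    return [abs(e_mean) - abs(e_med)
            for e_mean, e_med in zip(mean_errors, median_errors)]


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (population moments; an assumed formula)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical signed errors for four assignments.
erreur_moy = [3, -2.5, -5, 1]
erreur_med = [-2, -2.5, 10, 4]

ecart = abs_error_gap(erreur_moy, erreur_med)
print(median(ecart), mean(ecart), skewness(ecart))
```

A negative mean or median of Ecart indicates that the mean of peer grades sits closer to the instructor grade than the median does on that sample.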