eMOOCs2015 Does peer grading work?

Does peer grading work? How to
implement and improve it?
Comparing instructor and peer assessment in
MOOC GdP
Rémi Bachelet, Drissa Zongo, Aline Bourelle
Download this slideshow : http://goo.gl/GiFvXb

Massive evaluation in MOOCs : Peer assessment vs.
Quizzes
• Quizzes
– Massive scale, but
• inability to process, grade and provide feedback for complex and open-ended
student assignments
• no critical thinking
• Peer assessment
– Evaluating rich assignments on a massive scale – Possible?
Accurate?
– Major learning benefits expected,
• student autonomy, teaching paradigm shift
• in Bloom's taxonomy, higher levels of learning

4 Research questions
1. How to train MOOC students to grade their peers and
provide constructive feedback?
– Qualitative/experience testing
2. Is peer grading as accurate as instructor grading? Superior?
– Quantitative data/hypothesis testing
3. Which grading algorithm is best?
– Quantitative data/hypothesis testing:
4. How many peer grades are required to provide an accurate
final grade?
– Quantitative data/hypothesis testing

“Fundamentals of project management” MOOC /
MOOC GdP, session n°2
• Dataset: 1011 to 831 assignments submitted each week, for
5 weeks
– 4650 assignments total.
• Variety of assignments
– (next slide)
• Both instructor and peer grading were available
– 3-5 peer grades and one instructor/AT grade

Q1: How to train students to grade their peers and
provide constructive feedback?
• Generic peer Evaluation training:
– Major requirement of the advanced track
– 2+ videos
• rationale and importance of peer assessment
• how to write motivating and constructive feedback
• guidelines on how to use the platform for peer grading
• Specific peer Evaluation training:
– Specific resources for each assignment
• benchmark assignment, tutorial video
• interactive grading rubric
• discussion thread (1649 total posts)

Q2: Is individual peer grading as accurate as
instructor grading?
• Share of grades within ±5% and ±10% of the “real” grade
– Instructors => Suchaut, B. (2008) => 39% and 65%
– Our MOOC students => our data => 36% and 60%
… but this is individual student grading
Will the average of several peer grades perform better than a
single peer grade?
– Our MOOC students => average of 3-5 grades => 56% and 82%
The average grade given by MOOC students is more accurate than
instructors' grading
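The similarity measure above can be sketched in a few lines of Python. This is only an illustration: the function name `share_within`, the 100-point scale (so that ±5% means ±5 points) and all the grades are assumptions, not data from the MOOC.

```python
def share_within(grades, references, tolerance):
    """Fraction of grades lying within `tolerance` points of their reference grade."""
    pairs = list(zip(grades, references))
    hits = sum(1 for grade, ref in pairs if abs(grade - ref) <= tolerance)
    return hits / len(pairs)

# Hypothetical grades on an assumed 100-point scale (not taken from the study).
reference = [70, 55, 90, 40]      # the "real" grade of four assignments
single_peer = [78, 52, 75, 44]    # one peer grade per assignment
peer_mean = [72, 57, 84, 43]      # average of several peer grades per assignment

for label, grades in (("single peer", single_peer), ("peer average", peer_mean)):
    print(label, share_within(grades, reference, 5), share_within(grades, reference, 10))
```

Running the same function on single grades and on averaged grades gives the two pairs of percentages that the slide compares.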

Q3: best algorithm: average or median?
“Error functions”: difference between the instructor grade and either
the average or the median of the students' grades.
Average slightly more accurate than median
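A minimal sketch of such an error function, assuming the error is taken as an absolute difference and averaged over assignments (the slides do not say how the differences were summarised). The name `mean_abs_error` and the grades are hypothetical and say nothing about the MOOC data.

```python
from statistics import mean, median

def mean_abs_error(peer_grades_per_assignment, instructor_grades, aggregate):
    """Average absolute gap between the aggregated peer grade and the instructor grade."""
    errors = [
        abs(aggregate(peers) - instructor)
        for peers, instructor in zip(peer_grades_per_assignment, instructor_grades)
    ]
    return mean(errors)

# Hypothetical grades: three assignments, each graded by several peers.
peer_grades = [[60, 70, 80], [50, 55, 90], [30, 40, 35, 45]]
instructor_grades = [72, 62, 38]

error_with_mean = mean_abs_error(peer_grades, instructor_grades, mean)
error_with_median = mean_abs_error(peer_grades, instructor_grades, median)
print(error_with_mean, error_with_median)
```

Passing the aggregation function as an argument makes it easy to compare other candidates, for example a trimmed mean, on the same data.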

Q4: How many peer grades to correctly estimate
“best grade”?
Peer grading quickly performs better than an instructor's grading
(from two peers onward)
Best “return” with 3-4 peer grades
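One way to explore this question is to average every possible subset of k peer grades and measure how far that average falls from a reference grade. The sketch below does this for a single assignment. The function name `error_by_panel_size`, the use of absolute error and the grades are assumptions, and the slides do not describe the exact procedure used.

```python
from itertools import combinations
from statistics import mean

def error_by_panel_size(peer_grades, reference_grade):
    """Map k to the mean absolute error of the average of every k-grade subset."""
    result = {}
    for k in range(1, len(peer_grades) + 1):
        errors = [
            abs(mean(subset) - reference_grade)
            for subset in combinations(peer_grades, k)
        ]
        result[k] = mean(errors)
    return result

# Hypothetical assignment: four peer grades and one reference grade.
errors = error_by_panel_size([60, 70, 80, 74], 70)
for k, err in errors.items():
    print(k, round(err, 2))
```

Averaging these per-assignment curves over a whole dataset would show where adding one more grader stops paying off.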

Improving peer evaluation monitoring and grades
processing in MOOC GdP 4 and 5
• Estimate the quality of grades issued by peers
• Act on this information:
– dedicated VBA/Excel application => feedback on whether each
grade was correct, high or low
– .. reward accurate grading
– track whether peer grading improved with time during the course
• Add self-evaluation: best source for learning
• New system, developed for Canvas in association with Unow
• Students were asked to take a fresh look at their own work and grade it
after (1) having evaluated at least 3 other students' assignments and
(2) getting feedback on their own assignment from other students.

Conclusions
• Peer evaluation displays promising potential
• Not easy to implement on a massive, open scale
– Assignments = careful work, beta testing (100 hours)
– New assignments/case study for each session
– Dedicated data processing, develop team expertise
– Carefully set up:
• Deadline reminders, targeted messages,
• How each student gets feedback
• Rewards for accurate grading
• Monitoring: manual grading is still required (10-1%)

Recommendations for researchers
• Look closely at peer grades distribution before hypothesis testing
• How many assignments should a student be required to grade? We
recommend 4
– accounting for peers who drop out of the process
– time to work on self-assessment.
• What algorithm should be preferred?
– average if grading data has been correctly checked and filtered.
– otherwise, median is more robust (just remove outliers and get more evaluations).
• When to switch from automatic peer grading to manual instructor grading?
1. fewer than 2 peer grades
2. non-consensus (i.e. standard deviation of peer grades > 20)
3. presence of a “0” grade
… GdP4: 10%, 9% and 1.6% of assignments 1, 2 and 3 were graded manually.
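The three switching rules can be expressed as a small filter. This is a sketch under assumptions: grades are taken to be on a 100-point scale, the sample standard deviation is used (the slides do not say which estimator was applied), and the name `needs_manual_grading` is hypothetical.

```python
from statistics import stdev

def needs_manual_grading(peer_grades, max_std=20.0):
    """Return True when an assignment should be sent to an instructor."""
    if len(peer_grades) < 2:            # rule 1: fewer than 2 peer grades
        return True
    if stdev(peer_grades) > max_std:    # rule 2: non-consensus among peers
        return True
    return 0 in peer_grades             # rule 3: presence of a "0" grade

# Hypothetical sets of peer grades.
print(needs_manual_grading([80]))          # only one grade
print(needs_manual_grading([20, 60, 90]))  # peers disagree strongly
print(needs_manual_grading([0, 5, 10]))    # contains a zero
print(needs_manual_grading([70, 75, 80]))  # consensus, kept automatic
```

Checking the grade count first avoids calling `stdev` on fewer than two values, which would raise an error.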

Limitations of this study
• Develop theoretical framework & literature review
• Data processing: implement non-parametric testing

« Does peer grading work? How to implement and improve it? ».
European MOOCs Stakeholders Summit 2015, May 2015, Research Track
https://goo.gl/3QCXDG
Peer Grading Research Track - Auditorium 4, Tuesday, 10am

Thanks for listening!
• Twitter : @R_Bachelet, Googleplus : +Rémi Bachelet

• My contributions on MOOCs
• MOOC GdP
– Enroll : gestiondeprojet.pm
– English version of courses in 2015-2016
– Twitter : #MOOCGdP

Year 2013/2014
ANNEX
A glimpse at the stats

Q2: What data pre-processing is to be used?
Methodology: histograms and density
[Figure: histograms and density]

Q2: Do grades follow a normal distribution?
Test of normality
Methodology
Shapiro-Wilk normality test: shapiro.test(data)
- H0: normal distribution
- H1: not a normal distribution
Results
Alpha threshold = 0.05
if p-value > 0.05 => H0 is not rejected
if p-value < 0.05 => H1
p-value < 2.2e-16 < 0.05
=> not a normal distribution

Q3: Similarity between peer grades and teacher grades? (1/2)
Methodology
Scatter plot
&
Line (D): y=x

Q3: Similarity between peer grades and teacher grades? (2/2)
Methodology
Kendall correlation: cor.test(EP, Pairs, method="kendall")
Pearson correlation: cor.test(EP, Pairs)
Hypothesis:
- H0: the correlation is null
- H1: the correlation is not null
Threshold: 0.05
if p-value > 0.05 => H0 is not rejected
if p-value < 0.05 => H1
p-value < 0.05 => there is a correlation
correlation > 0.5 => strong correlation
Results: correlation (EP, Mean(peer grades))
                 correlation    p-value
Pearson (cor)    0.77251        < 2.2e-16
Kendall (tau)    0.6336516      < 2.2e-16
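For readers without R at hand, the two coefficients can be computed from first principles. The sketch below is not a replacement for `cor.test`: it returns only the coefficients, not the p-values. The function names and the grades are hypothetical, and the Kendall variant shown is tau-b, which corrects for ties.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), counting concordant and discordant pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total = n * (n - 1) / 2
    return (concordant - discordant) / sqrt((total - ties_x) * (total - ties_y))

# Hypothetical grades: instructor team versus mean of peer grades.
instructor = [40, 55, 70, 90, 62]
peer_mean = [43, 57, 72, 84, 55]
print(round(pearson(instructor, peer_mean), 3), round(kendall_tau_b(instructor, peer_mean), 3))
```

Kendall's tau depends only on the ordering of the grades, which makes it a reasonable companion to Pearson when the grades are not normally distributed.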

Q4: best algorithm: average or median?
Study of the « error function »
ErreurMoy = Mean(peer grades) - Instructor Team grades
ErreurMed = Median(peer grades) - Instructor Team grades
Study of the errors introduced
ErreurMoy < ErreurMed
=> the mean (average) is the better choice
