1. Quantifying reflection:
Creating a gold-standard for evaluating
automated reflection detection
Thomas Ullmann, Fridolin Wild, Peter Scott,
Knowledge Media Institute, The Open University
2. Outline
• A model for reflection
• Related work on the quantification of reflection
• Methodology
• Data collected
• Results and discussion
• Outlook
4. State of the art
in quantifying reflection
| Reference | Scale | Unit of analysis | Findings |
|---|---|---|---|
| Dyment & O’Connell (2011) | Depth of reflection | Studies (writings) | Meta-review: five studies found low, four medium, and two high levels of reflection |
| Wong et al. (1995) | Depth of reflection: habitual to critical | 45 students | Content analysis and interviews: 76% reflectors, 11% critical reflectors |
| Wald et al. (2012) | Reflective to non-reflective | 93 writings | 2nd-year students, self-selected best of their reflective field notes: 30% critically reflective, 11% transformative reflective |
| Plack et al. (2005) | Frequencies of elements and depth of reflection | 43 journals | 43% reflection, 42% critical reflection; for frequencies see next slide |
| Hatton & Smith (1995) | Units of reflection; dialogic versus descriptive | ‘Units’ (in writings of 60 students) | After instruction: 30% dialogic reflection; 19 reflective units on average per 8-12 pages |
| Ross (1989) | Depth of reflection | 134 papers of 25 students | 22% highly reflective, 34% moderately reflective |
| Williams et al. (2002) | Action classification | 56 student journals | 23% verify learning, 36% new understanding, 39% future behaviour |
7. Summary: Related work
• More research on level than on elements
• Wide range for ‘level of depth’
• Measurements at the student or writing/journal level
• Mostly in the context of instructed reflective writing
• Typically: Mapping from evidence to depth/breadth
=> No re-usable instrument
to measure reflection
8. The dimensions of reflection
Ullmann, Wild, Scott (2012): Comparing automatically detected reflective texts with human judgements. http://ceur-ws.org/Vol-931/paper8.pdf
• Documentation of insights, plans, and intentions.
• Switch of point of view.
• Argumentation and reasoning.
• Identification of a conflict.
• Awareness building over affective factors.
• Explication of self-awareness, e.g., inner monologues, description of feelings (see the sketch after this list).
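One way to make the scheme concrete is to write it down as a small label table for annotation. A minimal Python sketch; the expansions of the abbreviations SA, CA, TP, OD used on the next slide are my reading of the model, not spelled out on the slides:

```python
# Sketch of the annotation scheme as a label table. The expansions of
# SA/CA/TP/OD are assumptions based on the model, not quoted from the slides.
REFLECTION_LABELS = {
    "SA": "Self-awareness: inner monologues, feelings, identification of a conflict",
    "CA": "Critical analysis: argumentation and reasoning",
    "TP": "Taking perspectives: switch of point of view",
    "OD": "Outcome documentation: insights, plans, and intentions",
    "NONE": "No reflective element",
}
```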
9. Example accounts (anonymised)
| Dim: Type | Example |
|---|---|
| SA: Identification of a conflict | “[Victor] and [Morgan], you are right that I should have applied better my own learning instead of using the Uni ones.” |
| CA: Reasoning | “I imagine this is probably in order to have a focus and provide enough detail rather than skim over the whole area.” |
| TP: Switch point of view | “When I am doing FRT work, I often think about how the parents view me when they know I haven’t got children!” |
| OD: Documentation of an insight | “After I saw how this lifted her mood and eased her anxiety, I will remember that what we can view sometimes to be small can actually make a significant difference.” |
| OD: Intention | “I would like to be involved in helping with the site, too - although I’m a novice! I imagine this is probably in order to have a focus and provide enough detail rather than skim over the whole area.” |
| OD: New understanding | “This has helped me reflect on my own life and experiences whilst allowing me to empathise with others in their own circumstances; I feel proud of what I have achieved so far as the work/life/study balance is always difficult to navigate, but I’m lucky that I have a supportive family to help.” |
| None | “Bye the way, Audacity is also run under the CC Attribution.” |
10. Methodology:
creating a gold standard
Pipeline:
• Corpus selection: OU LMS forum posts; 4 subjects, 2 years; mid-range-length postings
• Sanitize: de-identification
• Chunking (for cues): at sentence level
• Sample: 1,000 random sentences (500 personal, 500 non-personal); see the sketch after this list
• Batching: expand grid, 10 batches
• Crowdsourcing: control questions; 5 raters each
• ‘Spam’ filtering: justification valid; ‘gold questions’ passed
• Objectification: ‘majority vote’; interrater reliability
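A minimal sketch of the corpus-preparation steps (length filter, sentence chunking, random sampling) in Python, assuming NLTK as the sentence splitter; the character bounds come from the speaker notes at the end, while the function name and seed are illustrative:

```python
# Sketch of corpus selection, chunking, and sampling as described above.
# Assumes posts are already de-identified strings; NLTK is one possible
# sentence splitter, not necessarily the authors' tooling.
import random
import nltk

nltk.download("punkt", quiet=True)  # sentence-tokenizer models

def prepare_sample(posts, n_sentences=1000, seed=0):
    # Corpus selection: keep mid-range-length postings
    # (1,500-3,500 characters, per the speaker notes).
    mid_range = [p for p in posts if 1500 <= len(p) <= 3500]

    # Chunking: split each post into sentences, the unit of analysis.
    sentences = [s for post in mid_range for s in nltk.sent_tokenize(post)]

    # Sampling: draw 1,000 random sentences (the 500/500 split into
    # personal and non-personal sentences is omitted here).
    random.seed(seed)
    return random.sample(sentences, n_sentences)
```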
11. Crowdsourcing
• CrowdFlower: the ‘virtual pedestrian area’
• Pre-tests showed:
– Really simple questions are needed for HITs
– But: quick answer options increase spam
– Short texts are easier than long texts
(less spam, lower cost)
– Answer options must be shuffled to avoid position artefacts (sketched below)
• Check: a larger-than-usual number of raters (5+) to see
how reliable the judgements are
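To illustrate the shuffling point, a minimal sketch of building one rating task with a randomised answer order; the category names and the dictionary layout are illustrative, not CrowdFlower’s actual job format:

```python
# Illustrative only: randomise the order of answer options per task so
# raters cannot develop a click-position habit; not CrowdFlower's real API.
import random

OPTIONS = ["self-awareness", "conflict", "reasoning",
           "point of view", "insight/plan/intention", "none"]

def build_task(sentence, rng=random):
    options = OPTIONS[:]          # copy so the master list stays ordered
    rng.shuffle(options)          # shuffle answers to avoid artefacts
    return {"sentence": sentence, "options": options}
```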
18. Interrater Reliability
– Raw data
• Baseline (control questions): Krippendorff’s α = 0.43
• Control questions + survey data: α = 0.32
• Survey data: α = 0.22
– ‘Objectified’ data: majority vote (sketched below)
• At least 3 of the raters agree:
– Survey data: α = 0.36 (623 of 1,000 sentences)
• At least 4 agree:
– Survey data: α = 0.58 (301 sentences)
• All 5 agree:
– α = 0.98 (with outliers; 107 of 1,000 sentences)
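A hedged sketch of both computations: majority-vote objectification and Krippendorff’s α, here via the open-source `krippendorff` package (`pip install krippendorff`); the slides do not say which implementation was used, and the example ratings below are made up.

```python
# Sketch of majority-vote objectification and Krippendorff's alpha.
# Uses the open-source `krippendorff` package; the data is invented.
from collections import Counter
import numpy as np
import krippendorff

def majority_vote(labels, threshold=3):
    """Keep a sentence's label only if at least `threshold` raters agree."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= threshold else None

# One row per rater, one column per sentence; entries are category codes
# (use np.nan where a rater left an item unanswered).
ratings = np.array([
    [0, 1, 2, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 1],
    [0, 2, 2, 2, 0],
    [0, 1, 2, 2, 0],
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.2f}")
```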
19. Discussion
• Agreement of all 5 of course increases IRR
– to 0.98 unfiltered
– when omitting ‘over-answering’: to 1.0
– But: reduces the data to single-category sentences
• Agreement of 3 deemed good enough,
since the questions were single choice,
whereas multiple answers can be correct
• Sentences are a reduction, but allow
zooming in on markers
• Context: forum texts
• Personal vs. non-personal sentences
Speaker notes:
• Majority vote; the examples are from the result data set.
• The number of posts results from filtering by length (1,500-3,500 characters, yielding ~1,600 texts) out of about 16,000 texts, which means most texts are actually shorter than 1,500 characters (‘me, too’, ‘yes’, …).
• After filtering: 623 sentences. Not clear whether …
• Application-scenario question: if we know the expected frequency in a large body of texts, can we use it to spot differences between courses? (el1/el2 = e-learning, with 2 the year before; swl2 = social work; sci = science)
• Alpha okay, but not great; therefore an analysis of whether the values change across the 3 batches.
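The application-scenario question in the notes can be made concrete with a goodness-of-fit test: compare a course’s observed label counts against the counts expected from a reference frequency. A sketch with invented numbers, assuming SciPy:

```python
# Sketch of the application scenario from the notes: does one course's
# share of reflective sentences differ from a reference frequency taken
# from a large body of texts? All numbers here are made up.
from scipy.stats import chisquare

observed = [140, 860]     # reflective vs. non-reflective sentences in a course
reference_rate = 0.10     # assumed frequency from the large reference corpus
n = sum(observed)
expected = [n * reference_rate, n * (1 - reference_rate)]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p:.4f}")
```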