4. What's wrong with the gold standard?
● algorithmic performance is measured on test sets vetted by
human experts → never perfectly correct
● gold standards are created assuming that for each annotated
instance there is a single right answer → doesn’t account for
alternative interpretations or for unclear sentences
● gold standard quality is measured by inter-annotator agreement
→ what happens if disagreeing annotators are both right?
The fallacy of the “one truth” assumption that pervades
computational semantics
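As a minimal sketch of what "inter-annotator agreement" usually means in practice, the snippet below computes Cohen's kappa for two annotators. The label lists are hypothetical and only illustrate the calculation; they are not taken from any dataset in this talk. Note that the score only says how often the two annotators coincide beyond chance. It cannot tell whether two disagreeing annotators are both right.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators always use the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations, for illustration only.
annotator_a = ["TREAT", "TREAT", "NONE", "TREAT", "NONE", "NONE"]
annotator_b = ["TREAT", "NONE", "NONE", "TREAT", "TREAT", "NONE"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```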
6. One Truth?
Does each sentence express the TREAT relation?
ANTIBIOTICS are the first line treatment for indications of TYPHUS.
→ agreement 95%
Patients with TYPHUS who were given ANTIBIOTICS exhibited
several side-effects.
→ agreement 80%
With ANTIBIOTICS in short supply, DDT was used during World War
II to control the insect vectors of TYPHUS.
→ agreement 50%
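The slide does not say how these percentages were computed. One plausible reading, sketched here as an assumption, is the share of workers who gave the most common answer for a sentence. The vote counts below are invented so that the numbers match the slide; they are not the original data.

```python
from collections import Counter


def majority_agreement(votes):
    """Share of votes that match the most common answer."""
    if not votes:
        raise ValueError("no votes given")
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count / len(votes)


# Hypothetical votes from 20 workers per sentence (assumed, not from the talk).
votes_per_sentence = {
    "first line treatment": ["yes"] * 19 + ["no"] * 1,
    "side-effects": ["yes"] * 16 + ["no"] * 4,
    "DDT / insect vectors": ["yes"] * 10 + ["no"] * 10,
}
agreement = {s: majority_agreement(v) for s, v in votes_per_sentence.items()}
for sentence, score in agreement.items():
    print(f"{sentence}: {score:.0%}")
```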
7. One Truth?
Disagreement can reflect lack of clarity in a sentence
8. What is the relation between the highlighted terms?
GADOLINIUM agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis it presents a risk of nephrogenic
systemic FIBROSIS.
One Interpretation?
Disagreement can indicate alternative interpretations
of relations
cause or side effect?
9. Does each sentence express the TREAT relation?
ANTIBIOTICS are the first line treatment for indications of TYPHUS.
QUININE is not a reliable cure for MALARIA.
Disagreement can indicate low-quality workers
One Quality?
15. CrowdTruth
Annotator disagreement is signal, not noise.
It is indicative of the variation in human
semantic interpretation of signs
It can indicate ambiguity, vagueness,
similarity, over-generality, etc.,
as well as quality
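One way to keep disagreement as signal rather than collapsing it into a single label is sketched below. Each worker's choices are summed into a per-sentence vector, and each relation is scored by the cosine between that vector and the relation's unit vector. The relation set, the worker choices, and the function names are all hypothetical; this is an illustration of the idea, not necessarily how CrowdTruth is implemented.

```python
import math

# Hypothetical label set, assumed for this sketch.
RELATIONS = ["treat", "cause", "side_effect", "none"]


def sentence_vector(worker_choices):
    """Count how many workers picked each relation for one sentence."""
    return [sum(rel in choice for choice in worker_choices) for rel in RELATIONS]


def relation_score(vector, relation):
    """Cosine between the sentence vector and one relation's unit vector."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        return 0.0
    return vector[RELATIONS.index(relation)] / norm


# Invented choices for an ambiguous sentence; a worker may pick several relations.
choices = [
    {"cause"},
    {"side_effect"},
    {"cause", "side_effect"},
    {"cause"},
    {"side_effect"},
    {"none"},
]
vec = sentence_vector(choices)
scores = {rel: relation_score(vec, rel) for rel in RELATIONS}
for rel, score in scores.items():
    print(f"{rel}: {score:.2f}")
```

In this made-up example "cause" and "side_effect" receive the same score, so the split between workers is preserved as evidence of two plausible readings instead of being discarded by a majority vote.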
16. CrowdTruth
is the response to the current reality of
cognitive computing systems - driven by data
analytics & elevated by interpretation.
It supports the need to bring human
semantics, representing the dynamics of
opinions and perspectives, into
machine-readable form.