CrowdTruth @ VU Faculty Colloquium (June 2015)
TRANSCRIPT
Web & Media Group
http://lora-aroyo.org @laroyo
CrowdTruth 7 Myths about Human Annotation
Bulgaria The Netherlands
Sofia 1997
2001
2006
2012 sabbatical @IBM Research
2011
Open Domain Question-Answering Machine – Rich Natural Language Questions
Won a 2-game Jeopardy match against all-time winners
Watson Education @ VU
• Intro on Cognitive Computing & Watson
  • Lecture to 1st year bachelor IMM & CS
• Watson & Social Web
  • Lecture to Master Information Science
• Watson & Crowdsourcing
  • 2 day course at Big Data in Society Summer School
  • 9-10 July, 2015 (@VU)
• Watson for Industry
  • 2 day professional course @IBM Amsterdam
  • End September 2015
Human Annotation
Central in Machine Learning Training & Evaluation
Fallacy of Universal Truth The Experts Know Best
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters
Choose one:
Which is the mood most appropriate for each song?
One Truth?
Who is the Expert?
Goal:
(Lee and Hu 2012)
• One truth: data collection efforts assume one correct interpretation for every example
• All examples are created equal: ground truth treats all examples the same – either match the correct result or not
• Detailed guidelines help: if examples cause disagreement, add instructions to limit interpretations
• Disagreement is bad: increase quality of annotation data by reducing disagreement among the annotators
• One is enough: most of the annotated examples are evaluated by one person
• Experts are better: annotators with domain knowledge provide better annotations
• Once done, forever valid: annotations are not updated; new data not aligned with old
7 Myths
These myths directly influence the practice of collecting human-annotated data; they need to be
revised with a new theory of truth (CrowdTruth)
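The "one truth" and "all examples are created equal" myths can be made concrete with a small sketch. The Python below is illustrative only: the song ids and votes are made-up data, and `majority_label` and `vote_distribution` are hypothetical helper names, not part of any CrowdTruth tooling. It shows two songs that receive the same majority label even though annotators agree fully on one and split on the other.

```python
from collections import Counter

# Hypothetical crowd judgments: song id -> mood cluster chosen by each annotator.
judgments = {
    "song_a": ["Cluster 1", "Cluster 1", "Cluster 1", "Cluster 1", "Cluster 1"],
    "song_b": ["Cluster 1", "Cluster 1", "Cluster 5", "Cluster 5", "Cluster 1"],
}


def majority_label(votes):
    """Collapse the votes into a single label: the most frequent choice."""
    return Counter(votes).most_common(1)[0][0]


def vote_distribution(votes):
    """Fraction of annotators behind each choice; keeps the disagreement visible."""
    counts = Counter(votes)
    total = len(votes)
    return {label: count / total for label, count in counts.items()}


for song, votes in judgments.items():
    print(song, majority_label(votes), vote_distribution(votes))
```

Both songs end up with the same single label, so a ground truth built from majority votes treats them identically. The distribution keeps the information that one of them is contested.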
human disagreement & vagueness of expression
are part of the human semantics
disagreement is beautiful …
diversity of opinion independent perspectives
multitude of contexts gives the big picture
“we treat human brains as processors in a distributed system each performing a small part
of a massive computation”
Human Computation
Luis von Ahn
crowd annotator annotation
example
annotation choices
Knowlton, J.Q. (1966). On the Definition of "Picture". AV Communication Review. 14 (2), 157–183.
[Slide figure: the mood-cluster answer options (Cluster 1 to Cluster 5, Other) shown as the annotation choices]
Triangle of disagreement
• annotator disagreement is signal, not noise.
• it is indicative of the variation in human semantic interpretation of signs
• it can indicate ambiguity, vagueness, similarity & quality
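One way to read disagreement as signal is to score it per example and per annotator. The sketch below is an assumption-laden illustration, not the CrowdTruth metrics: it uses normalized entropy as a stand-in for example ambiguity and average pairwise agreement as a stand-in for annotator quality. The data, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical data: example id -> {annotator id: chosen label}.
annotations = {
    "s1": {"w1": "A", "w2": "A", "w3": "A"},
    "s2": {"w1": "A", "w2": "B", "w3": "C"},
    "s3": {"w1": "B", "w2": "B", "w3": "A"},
}
N_CHOICES = 3  # assumed size of the label set (A, B, C)


def normalized_entropy(labels, n_choices):
    """0.0 means full agreement; 1.0 means votes spread evenly over all choices."""
    counts = Counter(labels)
    total = len(labels)
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(n_choices)


def annotator_agreement(annotations):
    """Per annotator: average share of the other annotators who chose the same label."""
    shares = {}
    for votes in annotations.values():
        for worker, label in votes.items():
            others = [l for w, l in votes.items() if w != worker]
            if not others:
                continue  # a lone annotator has nobody to agree with
            same = sum(1 for l in others if l == label)
            shares.setdefault(worker, []).append(same / len(others))
    return {worker: sum(s) / len(s) for worker, s in shares.items()}


ambiguity = {
    example: normalized_entropy(list(votes.values()), N_CHOICES)
    for example, votes in annotations.items()
}
agreement = annotator_agreement(annotations)
print(ambiguity)
print(agreement)
```

A high ambiguity score flags an example that may be vague or open to several readings. A low agreement score flags an annotator who often differs from the rest, which may indicate low quality or simply a different perspective, so neither score should be read in isolation.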
Results from Crowdsourcing Medical Relations in Text
CrowdTruth.org
Crowd-Watson team 2013
CrowdTruth team is growing, 2014
The Crew 2015
https://www.youtube.com/watch?v=CyAI_lVUdzM
To be AND not to be: quantum intelligence?
Lora Aroyo & Chris Welty
http://lora-aroyo.org