IBM’s Watson: Is the World’s Trivia Champion the Future of Search?
Josh Dreller, VP Media Technology & Analytics, Fuor Digital (@fuordigital)
Two Types of Innovation
• Incremental – improving current projects and advancing them in a linear fashion
• Grand Challenges – pushing the limits of science
– Must be Difficult (has to be a challenge)
– Must be Inspiring
– Irresistible Vision – “just has to be done”
Previous IBM Grand Challenges
Open Question Answering
• The way normal humans communicate
• Natural Language – very ambiguous, but at the heart of human intelligence
Last night I shot an elephant in my pajamas. How it got into my pajamas I’ll never know.
The Ultimate Advancement: Computers that can communicate with humans in Natural Language
What Can Search Learn From Watson?
• The only way to push forward is to take huge leaps and look for self-imposed challenges even if we can’t prove out the business case right now
• What kind of Grand Challenges could Search create?
– A non-spammable Search Engine?
– No need for Search Engine Optimization?
– ??
The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open Domain
Complex Language
High Precision
Accurate Confidence
High Speed
$800: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
$200: If you're standing, it's the direction you should look to check out the wainscoting.
$1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north
Jeopardy Reaction
• Not too inspired at first
• Weren’t interested in a circus sideshow
• In 2009, IBM set up a mock studio at their New York research facility
• Sparring matches with ex-Jeopardy winners
• Eventually saw the potential and thought it was something special
A Very Simple Question for a Computer
(ln(12,546,798 * π)) ^ 2 / 34,567.46 = ?
= .00885
Greater than or less than 1?
50/50 Shot
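The arithmetic on this slide can be checked directly. The sketch below assumes that the slide's "In" is the natural logarithm ln and that "P" is π, so the expression reads (ln(12,546,798 * π)) ^ 2 / 34,567.46. That reading is an assumption, chosen because it reproduces the slide's stated result of roughly .00885.

```python
import math


def slide_expression() -> float:
    """Evaluate the assumed reading: (ln(12,546,798 * pi)) ^ 2 / 34,567.46."""
    inner = math.log(12_546_798 * math.pi)  # math.log is the natural logarithm
    return inner ** 2 / 34_567.46


value = slide_expression()
print(f"{value:.5f}")
print("less than 1" if value < 1 else "not less than 1")
```

A computer gets this in microseconds, while a person asked only "greater than or less than 1?" is left guessing, which is the slide's point.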
Real Language is Real Hard
• Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules
• Human Language– Ambiguous, contextual and implicit– Grounded only in human cognition– Seemingly infinite number of ways to express the same meaning
The Opposite of Current Computer Language
• Questions not designed for a computer to answer
– Slang
– Crafty questions
– Shorthand
– Rhyme
– Regionalism
– Anagrams
• Complex Language!
Structured vs. Unstructured Data
Where was Einstein born?
Structured:
Person | Born In
A. Einstein | Ulm
Unstructured:
One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein’s birthplace.
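A minimal sketch of why the two sources differ for a machine. The table becomes a direct lookup, while the sentence never uses the words "born in", so a keyword pattern finds nothing even though a human reads the answer off immediately. The function names and the regular expression are hypothetical, chosen only for illustration.

```python
import re

# Structured source: the slide's Person / Born In table as a lookup.
BORN_IN = {"A. Einstein": "Ulm"}

# Unstructured source: the slide's sentence.
SENTENCE = (
    "One day, from among his city views of Ulm, Otto chose a watercolor "
    "to send to Albert Einstein as a remembrance of Einstein's birthplace."
)


def structured_answer(person):
    """Answer 'Where was X born?' from the table; None if X is not listed."""
    return BORN_IN.get(person)


def naive_text_answer(text):
    """Hypothetical keyword approach: only matches a literal 'born in <Place>'."""
    match = re.search(r"born in (\w+)", text)
    return match.group(1) if match else None


print(structured_answer("A. Einstein"))
print(naive_text_answer(SENTENCE))
```

Getting "Ulm" out of the sentence requires linking "birthplace" to "born" and "city views of Ulm" to the place, which is the kind of language understanding the talk is about.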
Common Sense Knowledge Base
• An ontology of classes and individuals
• Parts and materials of objects
• Properties of objects (such as color and size)
• Functions and uses of objects
• Locations of objects and layouts of locations
• Locations of actions and events
• Durations of actions and events
• Preconditions of actions and events
• Effects (post conditions) of actions and events
• Subjects and objects of actions
• Behaviors of devices
• Stereotypical situations or scripts
• Human goals and needs
• Emotions
• Plans and strategies
• Story themes
• Contexts
Can a can CanCan?
What Can Search Learn From Watson?
• We need to focus on what computers aren’t good at, not what they are good at
• Keywords and Links are not savvy enough. Natural language is key to a next generation search engine
• Most of human knowledge is kept in unstructured data sources or based on common sense context
The DeepQA Project
• Dr. David Ferrucci
• 25-30 full time researchers from many disciplines
• 2007-2011
• Millions of dollars
• Post Jeopardy implications
Speed Results
• Deployed Watson on 2,880 IBM POWER 750 computer cores
• Went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best.
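As a rough worked check of these figures, the sketch below computes the implied speedup and the parallel efficiency. It assumes the 2-hour and 3-second timings describe the same workload and treats the "single CPU" as one core. Both are simplifications, not claims from the slides.

```python
SINGLE_CPU_SECONDS = 2 * 60 * 60  # 2 hours per question (from the slide)
PARALLEL_SECONDS = 3              # average per question on the cluster (from the slide)
CORES = 2_880                     # cores Watson was deployed on (from the slide)


def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def efficiency(speedup_factor, cores):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup_factor / cores


factor = speedup(SINGLE_CPU_SECONDS, PARALLEL_SECONDS)
print(f"speedup: {factor:.0f}x")
print(f"efficiency: {efficiency(factor, CORES):.0%}")
```

Under those assumptions, a speedup this close to the core count suggests a workload that splits into largely independent pieces.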
Example Question
IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
• Question Analysis
– Keywords: 1698, comet, paramour, pink, …
– AnswerType(comet discoverer)
– Date(1698)
– Took(discoverer, ship)
– Called(ship, Paramour Pink)
– …
• Primary Search: Related Content (Structured & Unstructured)
– Wilhelm Tempel
– HMS Paramour
– Isaac Newton
– Halley’s Comet
– Pink Panther
– Christiaan Huygens
– Peter Sellers
– Edmond Halley
– …
• Candidate Answer Generation
• Evidence Retrieval
• Evidence Scoring (Spatial, Temporal, Lexical, Taxonomic, …)
[0.58 0 -1.3 … 0.97]
[0.71 1 13.4 … 0.72]
[0.12 0 2.0 … 0.40]
[0.84 1 10.6 … 0.21]
[0.33 0 6.3 … 0.83]
[0.21 1 11.1 … 0.92]
[0.91 0 -8.2 … 0.61]
[0.91 0 -1.7 … 0.60]
• Merging & Ranking
1) Edmond Halley (0.85)
2) Christiaan Huygens (0.20)
3) Peter Sellers (0.05)
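The scoring and ranking steps in this example can be sketched as follows. Each candidate gets a vector of evidence scores, and the scores are combined into one confidence used for ranking. Every number, weight, and function name below is hypothetical, because the slide elides its real values, and the logistic combination is a stand-in, since the slides do not say how Watson merges scores.

```python
import math

# Hypothetical evidence scores in [0, 1] along three of the slide's dimensions.
EVIDENCE = {
    "Edmond Halley": {"temporal": 0.9, "lexical": 0.8, "taxonomic": 0.9},
    "Christiaan Huygens": {"temporal": 0.6, "lexical": 0.1, "taxonomic": 0.5},
    "Peter Sellers": {"temporal": 0.0, "lexical": 0.3, "taxonomic": 0.0},
}

# Hypothetical weights and bias; a real system would learn these from data.
WEIGHTS = {"temporal": 2.0, "lexical": 2.0, "taxonomic": 2.0}
BIAS = -3.0


def confidence(features):
    """Combine evidence scores into a single value in (0, 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def merge_and_rank(evidence):
    """Return (candidate, confidence) pairs, most confident first."""
    scored = [(name, confidence(features)) for name, features in evidence.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, conf in merge_and_rank(EVIDENCE):
    print(f"{name}: {conf:.2f}")
```

The toy numbers are chosen so the ordering matches the slide, with Edmond Halley first. The confidence values themselves are not meant to reproduce the slide's 0.85, 0.20, and 0.05.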
Confidence is Key
• Watson only rings in if it can reach a statistically significant confidence in time
• Some questions take longer than others
• Some questions can only be answered with less confidence than others
• Watson manages risk in betting based on confidence
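A minimal sketch of the ring-in rule, using the three candidate confidences from the Halley example earlier in the deck. The threshold value and the wagering rule are hypothetical. The slides state only that Watson rings in above some confidence level and bets based on confidence.

```python
BUZZ_THRESHOLD = 0.5  # hypothetical cut-off; the slides do not give a number

# Candidate confidences from the deck's Halley example.
CANDIDATES = [
    ("Edmond Halley", 0.85),
    ("Christiaan Huygens", 0.20),
    ("Peter Sellers", 0.05),
]


def decide(candidates, threshold=BUZZ_THRESHOLD):
    """Return the top answer if its confidence clears the threshold, else None (stay silent)."""
    if not candidates:
        return None
    answer, conf = max(candidates, key=lambda pair: pair[1])
    return answer if conf >= threshold else None


def wager(bankroll, confidence, max_fraction=0.5):
    """Hypothetical betting rule: risk a share of the bankroll that grows with confidence."""
    return round(bankroll * max_fraction * confidence)


print(decide(CANDIDATES))
print(decide([("Christiaan Huygens", 0.20)]))
print(wager(1000, 0.5))
```

Raising the threshold trades fewer attempted answers for higher precision, which is the trade-off the deck's precision charts describe.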
Embarrassingly Parallel Computing
• Def: “Little or no effort is required to separate the problem into a number of parallel tasks”
• Works on many algorithms at once and comes back together with confidence scores.
– different from distributed computing problems (such as Google’s MapReduce) that require communication between tasks, especially communication of intermediate results.
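The pattern described here can be sketched with Python's standard `concurrent.futures` module. Several independent scorers run at once on the same candidate and never talk to each other, and their results are gathered at the end. The three scorer functions are hypothetical stand-ins, and threads are used only to keep the sketch short. A CPU-bound workload would more likely use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical, independent evidence scorers: none needs another's output.
def temporal_score(candidate):
    return 1.0 if candidate == "Edmond Halley" else 0.2


def lexical_score(candidate):
    return min(1.0, len(candidate) / 20)


def taxonomic_score(candidate):
    return 0.0 if candidate == "Peter Sellers" else 0.8


SCORERS = {
    "temporal": temporal_score,
    "lexical": lexical_score,
    "taxonomic": taxonomic_score,
}


def score_in_parallel(candidate):
    """Run every scorer concurrently, then gather the results into one dict."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, candidate) for name, fn in SCORERS.items()}
        return {name: future.result() for name, future in futures.items()}


print(score_in_parallel("Edmond Halley"))
```

Because no task waits on another's intermediate result, adding workers shortens the wall-clock time without changing the code that merges the scores.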
What Can Search Learn From Watson?
• We can’t be bound by the constraints of current technology
• Two-fold process of first coming up with answers, then vetting them with more evidence
• Ideas like Parallel Processing will allow us to jump ahead
Ken Jennings & Brad Rutter
The Best Human Performance: Our Analysis Reveals the Winner’s Cloud
[Figure: scatter plot in which each dot represents an actual historical human Jeopardy! game. Labeled regions: Winning Human Performance; Grand Champion Human Performance; 2007 QA Computer System. Annotations: “Top human players are remarkably good.” and “Computers?”. Axis annotation: More Confident / Less Confident.]
DeepQA: Incremental Progress in Precision and Confidence 6/2007-11/2010
[Figure: Precision (0% to 100%) plotted against % Answered (0% to 100%). Curves labeled Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010. Annotation: “Now Playing in the Winners Cloud”.]
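A chart of precision against % answered can be computed as sketched below. The questions are ranked by the system's confidence, only the most confident fraction is answered, and precision is measured on those. The ten (confidence, correct) pairs are invented toy data, not IBM results, and the function name is hypothetical.

```python
# Toy data: (confidence, was the answer correct?). Invented for illustration only.
RESULTS = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
    (0.60, True), (0.50, False), (0.40, False), (0.30, True), (0.10, False),
]


def precision_at_coverage(results, fraction):
    """Precision when answering only the most confident `fraction` of questions."""
    ranked = sorted(results, key=lambda pair: pair[0], reverse=True)
    answered = ranked[:max(1, round(len(ranked) * fraction))]
    return sum(1 for _, correct in answered if correct) / len(answered)


for fraction in (0.3, 0.5, 1.0):
    print(f"{fraction:.0%} answered: {precision_at_coverage(RESULTS, fraction):.0%} precision")
```

If the confidence estimates are any good, precision falls as more questions are attempted, so each curve slopes downward from left to right.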
Confidence Bar
What Can Search Learn From Watson?
• Confidence bar would be a great addition to SERPs
• We must benchmark what is “good” and then aim higher
• These things take time (and money)
TJ Watson Research Center
Yorktown, NY
Two Games: Aired February 14-16, 2011
The End – Humans Win
Financial Industry
• Generates large amounts of data, growing 70% per year
• Not just numbers, but all info that would influence the business landscape (news, articles, blogs, etc.)
• The recent financial crisis shows the failures that come from not understanding interdependencies
DeepQA in Continuous Evidence-Based Diagnostic Analysis
Considers and synthesizes a broad range of evidence, improving quality and reducing cost
[Figure: confidence chart for candidate diagnoses. Evidence inputs: Symptoms, Family History, Patient History, Medications, Tests/Findings, Notes/Hypotheses, and huge volumes of texts, journals, references, DBs, etc. Diagnosis Models: Symp, FamHist, Meds, Find. Axis: Confidence. Candidate diagnoses: UTI, Diabetes, Influenza, Hypokalemia, Renal failure, Esophagitis. Frame captions: “Most Confident Diagnosis: Influenza”; “Most Confident Diagnosis: UTI”; “Most Confident Diagnosis: Diabetes”; “Most Confident Diagnosis: Diabetes and Esophagitis”.]
What Can Search Learn From Watson?
• Even the most daunting task can be overcome
• It’s not company versus company, it’s stretching human knowledge
• How can search engines help other industries?
Sources
• “What is Watson?” presentation by Adam Lally, IBM Research
• Jeopardy website with videos: http://www.jeopardy.com/minisites/watson/
• NYTimes article: “What Is I.B.M.’s Watson?” http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?_r=2&ref=opinion
• Wired magazine: “IBM’s Watson Supercomputer Wins Practice Jeopardy Round” http://www.wired.com/epicenter/2011/01/ibm-watson-jeopardy/#
• More technical: AI Magazine, “Building Watson: An Overview of the DeepQA Project” http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf
Thank You