IBM’s Watson: Is the World’s Trivia Champion the Future of Search?

Josh Dreller, VP Media Technology & Analytics, Fuor Digital (@fuordigital)

Two Types of Innovation

• Incremental – improving current projects and advancing them in a linear fashion

• Grand Challenges – pushing the limits of science
– Must be Difficult (has to be a challenge)
– Must be Inspiring
– Irresistible Vision – “just has to be done”

Previous IBM Grand Challenges

Open Question Answering
• The way normal humans communicate
• Natural Language – very ambiguous, but at the heart of human intelligence

Last night I shot an elephant in my pajamas. How it got into my pajamas I’ll never know.

The Ultimate Advancement: Computers that can communicate with humans in Natural Language

What Can Search Learn From Watson?

• The only way to push forward is to take huge leaps and look for self-imposed challenges even if we can’t prove out the business case right now

• What kind of Grand Challenges could Search create?
– A non-spammable Search Engine?
– No need for Search Engine Optimization?
– ??

The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions

Broad/Open Domain

Complex Language

High Precision

Accurate Confidence

High Speed

$800: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus

$200: If you're standing, it's the direction you should look to check out the wainscoting.

$1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north

Jeopardy Reaction
• Not too inspired at first
• Weren’t interested in a circus sideshow
• 2009, IBM set up a mock studio at their New York research facility
• Sparring matches with ex-Jeopardy winners
• Eventually saw the potential and thought it was something special

A Very Simple Question for a Computer

(ln(12,546,798 * π)) ^ 2 / 34,567.46 = ?

= .00885

Greater than or less than 1? 50/50 Shot
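The arithmetic on this slide can be checked in a few lines. This is only a sketch, and it rests on an assumption about the garbled expression: "In" is read as the natural logarithm, "P" as π, and the square is applied to the logarithm, because that is the reading that reproduces the quoted result of .00885.

```python
import math

# Assumed reading of the slide: (ln(12,546,798 * pi)) ^ 2 / 34,567.46
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46

print(f"{value:.5f}")   # the slide quotes .00885
print(value < 1)        # no guessing needed for "greater or less than 1"
```

For a computer this is a direct evaluation; a human asked only "greater than or less than 1?" without a calculator is left with something close to a coin flip.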

Real Language is Real Hard

• Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules

• Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways to express the same meaning

The Opposite of Current Computer Language

• Questions not designed for a computer to answer
– Slang
– Crafty questions
– Shorthand
– Rhyme
– Regionalism
– Anagrams

• Complex Language!

Structured vs. Unstructured Data

Where was Einstein born?

Unstructured: One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein’s birthplace.

Structured:
Person        Born In
A. Einstein   Ulm
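A small sketch of why the two forms differ for a machine. The table lookup is trivial, while a keyword-style pattern over the sentence finds nothing, because the sentence never says "born in". The dictionary layout, the relation name `born_in`, and both function names are hypothetical, chosen only for illustration.

```python
import re

# Structured: the fact is stored under an explicit (person, relation) key.
STRUCTURED = {("A. Einstein", "born_in"): "Ulm"}

# Unstructured: the same fact is only implied by the word "birthplace".
UNSTRUCTURED = (
    "One day, from among his city views of Ulm, Otto chose a watercolor "
    "to send to Albert Einstein as a remembrance of Einstein's birthplace."
)


def lookup_structured(person, relation):
    """Direct lookup in the structured table; None if the fact is absent."""
    return STRUCTURED.get((person, relation))


def naive_text_match(text):
    """Keyword-style extraction: look for the literal phrase 'born in <Place>'."""
    match = re.search(r"born in (\w+)", text)
    return match.group(1) if match else None


print(lookup_structured("A. Einstein", "born_in"))  # finds the stored value
print(naive_text_match(UNSTRUCTURED))               # no literal "born in" to match
```

Answering from the sentence requires connecting "birthplace" to "born in" and "Ulm" to the city views, which is the kind of language understanding the slide is pointing at.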

Common Sense Knowledge Base

• An ontology of classes and individuals
• Parts and materials of objects
• Properties of objects (such as color and size)
• Functions and uses of objects
• Locations of objects and layouts of locations
• Locations of actions and events
• Durations of actions and events
• Preconditions of actions and events
• Effects (post conditions) of actions and events
• Subjects and objects of actions
• Behaviors of devices
• Stereotypical situations or scripts
• Human goals and needs
• Emotions
• Plans and strategies
• Story themes
• Contexts

Can a can CanCan?


• We need to focus on what computers aren’t good at, not what they are good at

• Keywords and Links are not savvy enough. Natural language is key to a next generation search engine

• Most of human knowledge is kept in unstructured data sources or based on common sense context

The DeepQA Project

• Dr. David Ferrucci
• 25-30 full-time researchers from many disciplines
• 2007-2011
• Millions of dollars
• Post-Jeopardy implications

Speed Results
• Deployed Watson on 2,880 IBM POWER 750 computer cores
• Went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best.

Example Question

IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE

Question Analysis
Keywords: 1698, comet, paramour, pink, …
AnswerType(comet discoverer)
Date(1698)
Took(discoverer, ship)
Called(ship, Paramour Pink)
…

Primary Search
Related Content (Structured & Unstructured)

Candidate Answer Generation
Wilhelm Tempel, HMS Paramour, Isaac Newton, Halley’s Comet, Pink Panther, Christiaan Huygens, Peter Sellers, Edmond Halley, …

Evidence Retrieval

Evidence Scoring
Spatial, Temporal, Lexical, Taxonomic, …
[0.58 0 -1.3 … 0.97]
[0.71 1 13.4 … 0.72]
[0.12 0 2.0 … 0.40]
[0.84 1 10.6 … 0.21]
[0.33 0 6.3 … 0.83]
[0.21 1 11.1 … 0.92]
[0.91 0 -8.2 … 0.61]
[0.91 0 -1.7 … 0.60]

Merging & Ranking
1) Edmond Halley (0.85)
2) Christiaan Huygens (0.20)
3) Peter Sellers (0.05)
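The last two stages named on this slide, Evidence Scoring and Merging & Ranking, can be sketched as follows: each candidate answer carries a vector of evidence scores, the scores are combined into one confidence, and candidates are sorted by it. Everything numeric here is invented for illustration. The three score dimensions, the weights, and the bias are assumptions, and the logistic combination is only a stand-in for whatever Watson actually learned, so the output will not reproduce the confidences shown on the slide.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    # Hypothetical evidence scores: [answer-type match, date match, passage support]
    scores: list


# Made-up weights and bias; a real system would learn these from training data.
WEIGHTS = [2.0, 1.5, 1.0]
BIAS = -2.5


def confidence(candidate):
    """Combine the evidence scores into one confidence via a logistic function."""
    z = BIAS + sum(w * s for w, s in zip(WEIGHTS, candidate.scores))
    return 1.0 / (1.0 + math.exp(-z))


def merge_and_rank(candidates):
    """Return (answer, confidence) pairs, most confident first."""
    scored = [(c.answer, confidence(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    Candidate("Peter Sellers", [0.0, 0.0, 0.1]),
    Candidate("Edmond Halley", [1.0, 1.0, 0.9]),
    Candidate("Christiaan Huygens", [0.5, 0.5, 0.1]),
]

for answer, conf in merge_and_rank(candidates):
    print(f"{answer}: {conf:.2f}")
```

The point of the design is that no single scorer decides the answer; many weak signals are merged into one number that can be compared across candidates.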

Confidence is Key
• Watson only rings in if it can reach a statistically significant confidence in time
• Some questions take longer than others
• Some questions it can answer only with less confidence than others
• Watson manages risk in betting based on confidence
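A minimal sketch of the two decisions described above: whether to ring in, and how much to bet. The threshold value, the `max_fraction` cap, and the linear scaling of the wager are all assumptions made for illustration; the slides do not describe Watson's actual betting strategy.

```python
def should_buzz(confidence, threshold=0.5):
    """Ring in only when confidence clears a threshold (threshold is hypothetical)."""
    return confidence >= threshold


def wager(confidence, bankroll, max_fraction=0.5):
    """Bet a share of the bankroll that grows with confidence.

    Linear scaling and the max_fraction cap are illustrative assumptions.
    """
    return round(bankroll * max_fraction * confidence)


for conf in (0.85, 0.20, 0.05):
    print(conf, should_buzz(conf), wager(conf, bankroll=10_000))
```

With these assumed settings, only the most confident of the three example answers would trigger a buzz, and the bet shrinks as confidence falls.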

Embarrassingly Parallel Computing
• Def: “Little or no effort is required to separate the problem into a number of parallel tasks”
• Works on many algorithms at once and comes back together with confidence scores.
– different from distributed computing problems (such as Google’s MapReduce) that require communication between tasks, especially communication of intermediate results.
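The definition above can be illustrated with a toy sketch: each candidate answer is scored by an independent task, no task reads another task's intermediate result, and the scores only come together at the end. The passages are short strings written for this example, and the keyword-overlap scorer is a hypothetical stand-in for a real evidence scorer. A thread pool is used to keep the sketch short; CPU-bound scoring would normally use processes or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

QUESTION_KEYWORDS = {"1698", "comet", "paramour", "pink"}

# Toy passages written for this sketch, one per candidate answer.
PASSAGES = {
    "Edmond Halley": "halley sailed the paramour pink in 1698 and studied a comet",
    "Peter Sellers": "peter sellers starred in the pink panther",
    "Christiaan Huygens": "christiaan huygens studied the rings of saturn",
}


def score_candidate(candidate):
    """Independent task: count question keywords found in the candidate's passage."""
    words = set(PASSAGES[candidate].split())
    return len(QUESTION_KEYWORDS & words)


candidates = list(PASSAGES)

# Tasks share nothing and never communicate; results are gathered only at the end.
with ThreadPoolExecutor() as pool:
    scores = dict(zip(candidates, pool.map(score_candidate, candidates)))

print(scores)
```

Because the tasks are independent, adding more workers speeds things up with almost no coordination cost, which is what makes spreading the work across thousands of cores practical.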

What Can Search Learn From Watson?

• We can’t be bound by the constraints of current technology

• Two-fold process: first come up with answers, then vet them with more evidence

• Ideas like Parallel Processing will allow us to jump ahead

Ken Jennings & Brad Rutter

The Best Human Performance: Our Analysis Reveals the Winner’s Cloud

[Chart: Precision (y-axis, 0% to 100%) against % Answered (x-axis, 0% to 100%, running from More Confident to Less Confident). Each dot represents an actual historical human Jeopardy! game. Labeled regions: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System. Annotations: “Top human players are remarkably good.” and “Computers?”]

DeepQA: Incremental Progress in Precision and Confidence 6/2007-11/2010

[Chart: same axes. Curves labeled Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010 and 11/2010. Annotation: “Now Playing in the Winners Cloud”]
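A curve on the Precision vs. % Answered axes can be computed as sketched below: sort the questions by the system's confidence, then ask how precise the system is if it answers only its k most confident questions. The function name and the four-question data set are invented for illustration and are not taken from the charts.

```python
def precision_curve(results):
    """Build (fraction_answered, precision) points from per-question results.

    results: list of (confidence, is_correct) pairs, one per question.
    Point k assumes the system answers only its k most confident questions.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    points = []
    correct = 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        points.append((k / len(ranked), correct / k))
    return points


# Toy data: (confidence, whether the answer was right).
toy_results = [(0.3, True), (0.9, True), (0.6, False), (0.8, True)]

for answered, precision in precision_curve(toy_results):
    print(f"{answered:.0%} answered -> {precision:.0%} precision")
```

A system with well-calibrated confidence shows high precision on the left of the chart, where it answers only what it is sure of, and lower precision as it is forced to answer more.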

Confidence Bar


• Confidence bar would be a great addition to SERPs

• We must benchmark what is “good” and then aim higher

• These things take time (and money)

TJ Watson Research Center, Yorktown, NY
Two Games: Aired February 14-16, 2011

The End – Humans Win

Financial Industry
• Generates large amounts of data, growing 70% per year
• Not just numbers, but all info that would influence the business landscape (news, articles, blogs, etc.)
• The recent financial crisis shows failures caused by a lack of understanding of interdependencies

DeepQA in Continuous Evidence-Based Diagnostic Analysis

Considers and synthesizes a broad range of evidence, improving quality and reducing cost

[Diagram: patient evidence (Symptoms, Tests/Findings, Medications, Family History, Patient History, Notes/Hypotheses) and huge volumes of texts, journals, references, DBs, etc. feed the Diagnosis Models, which assign a confidence to each candidate diagnosis: UTI, Diabetes, Influenza, Hypokalemia, Renal failure, Esophagitis. Most Confident Diagnosis shown across the slide builds: Influenza; UTI; Diabetes; Diabetes and Esophagitis.]


• Even the most daunting task can be overcome

• It’s not company versus company, it’s stretching human knowledge

• How can search engines help other industries?

Sources

• “What is Watson?” presentation by Adam Lally, IBM Research

• Jeopardy website with videos: http://www.jeopardy.com/minisites/watson/

• NYTimes article: “What Is I.B.M.’s Watson?” http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?_r=2&ref=opinion

• Wired magazine: “IBM’s Watson Supercomputer Wins Practice Jeopardy Round” http://www.wired.com/epicenter/2011/01/ibm-watson-jeopardy/

• More technical: AI magazine, “Building Watson: An Overview of the DeepQA Project” http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf

Thank You