Dragon Star Program Course (龙星计划课程): Information Retrieval (信息检索) Course Overview & Background


Page 1: 龙星计划课程 : 信息检索 Course Overview & Background

2008 © ChengXiang Zhai, Dragon Star Lecture at Beijing University, June 21-30, 2008

Dragon Star Program Course (龙星计划课程): Information Retrieval (信息检索) Course Overview & Background

ChengXiang Zhai (翟成祥), Department of Computer Science

Graduate School of Library & Information Science

Institute for Genomic Biology, Statistics

University of Illinois, Urbana-Champaign

http://www-faculty.cs.uiuc.edu/~czhai, [email protected]

Page 2: 龙星计划课程 : 信息检索 Course Overview & Background


Outline

• Course overview

• Essential background

– Probability & statistics

– Basic concepts in information theory

– Natural language processing

Page 3: 龙星计划课程 : 信息检索 Course Overview & Background


Course Overview

Page 4: 龙星计划课程 : 信息检索 Course Overview & Background


Course Objectives

• Introduce the field of information retrieval (IR)

– Foundation: Basic concepts, principles, methods, etc

– Trends: Frontier topics

• Prepare students to do research in IR and/or related fields

– Research methodology (general and IR-specific)

– Research proposal writing

– Research project (to be finished after the lecture period)

Page 5: 龙星计划课程 : 信息检索 Course Overview & Background


Prerequisites

• Proficiency in programming (C++ is needed for assignments)

• Knowledge of basic probability & statistics (would be necessary for understanding algorithms deeply)

• Big plus: knowledge of related areas

– Machine learning

– Natural language processing

– Data mining

– …

Page 6: 龙星计划课程 : 信息检索 Course Overview & Background


Course Management

• Teaching staff

– Instructor: ChengXiang Zhai (UIUC)

– Teaching assistants:

• Hongfei Yan (Peking Univ)

• Bo Peng (Peking Univ)

• Course website: http://net.pku.edu.cn/~course/cs410/

• Course group discussion: http://groups.google.com/group/cs410pku

• Questions: First post the questions on the group discussion forum; if questions are unanswered, bring them to the office hours (first office hour: June 23, 2:30-4:30pm)

Page 7: 龙星计划课程 : 信息检索 Course Overview & Background


Format & Requirements

• Lecture-based:

– Morning lectures: Foundation & Trends

– Afternoon lectures: IR research methodology

– Readings are usually available online

• 2 Assignments (based on morning lectures)

– Coding (C++), experimenting with data, analyzing results, open explorations (~5 hours each)

• Final exam (based on morning lectures): 1:30-4:30pm, June 30.

– Practice questions will be available

Page 8: 龙星计划课程 : 信息检索 Course Overview & Background


Format & Requirements (cont.)

• Course project (Mini-TREC)

– Work in teams

– Phase I: create test collections (~ 3 hours, done within lecture period)

– Phase II: develop algorithms and submit results (done in the summer)

• Research project proposal (based on afternoon lectures)

– Work in teams

– 2-page outline done within lecture period

– Full proposal (5 pages) due later

Page 9: 龙星计划课程 : 信息检索 Course Overview & Background


Coverage of Topics: IR vs. TIM

[Diagram] Information Retrieval (IR) is shown as the core of Text Information Management (TIM), which extends toward multimedia, etc.

IR and TIM will be used interchangeably

Page 10: 龙星计划课程 : 信息检索 Course Overview & Background


What is Text Info. Management?

• TIM is concerned with technologies for managing and exploiting text information effectively and efficiently

• Importance of managing text information

– The most natural way of encoding knowledge

• Think about scientific literature

– The most common type of information

• How much textual information do you produce and consume every day?

– The most basic form of information

• It can be used to describe other media of information

– The most useful form of information!

Page 11: 龙星计划课程 : 信息检索 Course Overview & Background


Text Management Applications

[Diagram] Three kinds of applications: Access (select information), Mining (create knowledge), and Organization (add structure/annotations).

Page 12: 龙星计划课程 : 信息检索 Course Overview & Background


Examples of Text Management Applications

• Search

– Web search engines (Google, Yahoo, …)

– Library systems

– …

• Recommendation

– News filter

– Literature/movie recommender

• Categorization

– Automatically sorting emails

– …

• Mining/Extraction

– Discovering major complaints from email in customer service

– Business intelligence

– Bioinformatics

– …

• Many others…

Page 13: 龙星计划课程 : 信息检索 Course Overview & Background


Elements of Text Info Management Technologies

[Diagram] Text feeds into natural language content analysis, which in turn supports search, filtering, categorization, clustering, summarization, extraction, mining, and visualization. These technologies group into retrieval applications (information access), mining applications (knowledge acquisition), and information organization. The figure highlights the focus of the course.

Page 14: 龙星计划课程 : 信息检索 Course Overview & Background


Text Management and Other Areas

[Diagram] Text management (TM) algorithms and applications connect the user with text storage, drawing on neighboring areas across computer science and information science: compression, probabilistic inference, machine learning, natural language processing, human-computer interaction, software engineering, and the Web.

Page 15: 龙星计划课程 : 信息检索 Course Overview & Background


Related Areas

[Diagram] Information retrieval sits among related areas: library & information science, databases, machine learning / pattern recognition, data mining, natural language processing, statistics / optimization, software engineering / computer systems, and applications such as the Web and bioinformatics. These areas contribute along four dimensions: models, algorithms, applications, and systems.

Page 16: 龙星计划课程 : 信息检索 Course Overview & Background


Publications/Societies (Incomplete)

[Diagram] Representative venues by area:

– Information retrieval: ACM SIGIR, ACM CIKM, TREC

– Databases: ACM SIGMOD, VLDB, PODS, ICDE

– Information/library science: ASIS, JCDL

– Learning/mining: ICML, NIPS, UAI, AAAI, ACM SIGKDD

– NLP: ACL, COLING, EMNLP, ANLP, HLT

– Applications: WWW, ISMB, RECOMB, PSB

– Software/systems: SOSP, OSDI

(Statistics is also shown as a related area.)

Page 17: 龙星计划课程 : 信息检索 Course Overview & Background


Schedule: available at http://net.pku.edu.cn/~course/cs410/

Page 18: 龙星计划课程 : 信息检索 Course Overview & Background


Schedule (Morning Lecture 8:30-11:30: Foundation & Trends; Afternoon Lecture 1:30-2:30: Research Methodology)

6/21 Sat
Morning: Course overview and background (probability, statistics, information theory, NLP). Slides: ppt. Lecture notes: Prob & Stat, Info Theory, NLP. Readings: Bush 45, Rosenfeld's note on estimation, Rosenfeld's note on information theory.
Afternoon: Introduction to IR research. Slides: ppt.

6/22 Sun
Morning: Information Retrieval Overview (Part 1) (basic concepts, history, evaluation). Lecture notes: text retrieval. Readings: Singhal's review, Book Ch. 8, TREC measures.
Afternoon: Prepare yourself for IR research.
Notes: Mini-TREC task specification ready.

6/23 Mon
Morning: Information Retrieval Overview (Part 2) (basic retrieval models, system implementation, applications).
Afternoon: Find a good IR research topic.
Notes: Assignment #1 out.

6/24 Tue
Morning: Statistical Language Models for IR (probabilistic retrieval models, KL-divergence model, special retrieval tasks).
Afternoon: Formulate IR research hypotheses.
Notes: Assignment #2 out.

6/25 Wed
Morning: Modern Retrieval Frameworks (axiomatic, decision-theoretic).
Notes: Final exam practice questions available.

6/26 Thu
Morning: Personalized Search & User Modeling (implicit feedback, explicit feedback, active feedback).
Afternoon: Test/refine IR research hypotheses.
Notes: Proposal team due.

6/27 Fri
Morning: Natural Language Processing for IR (phrase indexing, dependency analysis, sense disambiguation, sentiment retrieval).
Afternoon: Write and publish an IR paper.

6/28 Sat
No class.
Notes: Mini-TREC Phase I task due.

6/29 Sun
Morning: Topic Models for Text Mining (PLSA, LDA, extensions and applications).
Notes: Proposal outline due.

6/30 Mon
Morning: Future of IR, course summary.
Afternoon: Final Exam (1:30-4:30).
Notes: Assignments #1 and #2 due.

7/5 Sat: Research proposal due.

7/?: Mini-TREC data sets ready.

8/?: Mini-TREC Phase II task due.

Page 19: 龙星计划课程 : 信息检索 Course Overview & Background


Essential Background 1:

Probability & Statistics

Page 20: 龙星计划课程 : 信息检索 Course Overview & Background


Prob/Statistics & Text Management

• Probability & statistics provide a principled way to quantify the uncertainties associated with natural language

• Allow us to answer questions like:

– Given that we observe “baseball” three times and “game” once in a news article, how likely is it that the article is about “sports”? (text categorization, information retrieval)

– Given that a user is interested in sports news, how likely would the user use “baseball” in a query? (information retrieval)

Page 21: 龙星计划课程 : 信息检索 Course Overview & Background


Basic Concepts in Probability

• Random experiment: an experiment with uncertain outcome (e.g., tossing a coin, picking a word from text)

• Sample space: all possible outcomes, e.g.,

– Tossing 2 fair coins, S ={HH, HT, TH, TT}

• Event: E ⊆ S; E happens iff the outcome is in E, e.g.,

– E={HH} (all heads)

– E={HH,TT} (same face)

– Impossible event ({}), certain event (S)

• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.

– P(S)=1 (outcome always in S)

– P(A∪B)=P(A)+P(B) if A∩B=∅ (e.g., A=same face, B=different face)

Page 22: 龙星计划课程 : 信息检索 Course Overview & Background


Basic Concepts of Prob. (cont.)

• Conditional probability: P(B|A)=P(A∩B)/P(A)

– P(A∩B) = P(A)P(B|A) = P(B)P(A|B)

– So, P(A|B)=P(B|A)P(A)/P(B) (Bayes’ Rule)

– For independent events, P(A∩B) = P(A)P(B), so P(A|B)=P(A)

• Total probability: If A1, …, An form a partition of S, then

– P(B) = P(B∩S) = P(B∩A1)+…+P(B∩An) (why?)

– So, P(Ai|B) = P(B|Ai)P(Ai)/P(B)

= P(B|Ai)P(Ai)/[P(B|A1)P(A1)+…+P(B|An)P(An)]

– This allows us to compute P(Ai|B) based on P(B|Ai)
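To make the total-probability and Bayes'-rule computation concrete, here is a minimal Python sketch; it is not part of the original slides, and the topic priors and word likelihoods are made-up numbers:

```python
# Hypotheses: topics of a news article, with assumed prior probabilities P(A_i).
priors = {"sport": 0.3, "computer": 0.3, "other": 0.4}
# Assumed likelihoods P(B|A_i) of observing the word "baseball" given each topic.
likelihoods = {"sport": 0.1, "computer": 0.001, "other": 0.005}

# Total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_evidence = sum(likelihoods[t] * priors[t] for t in priors)

# Bayes' rule: P(A_i|B) = P(B|A_i) P(A_i) / P(B)
posteriors = {t: likelihoods[t] * priors[t] / p_evidence for t in priors}

print(posteriors)  # roughly {'sport': 0.93, 'computer': 0.01, 'other': 0.06}
```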

Page 23: 龙星计划课程 : 信息检索 Course Overview & Background


Interpretation of Bayes’ Rule

Hypothesis space: H={H1, …, Hn}; Evidence: E

P(Hi|E) = P(E|Hi) P(Hi) / P(E)

Here P(Hi|E) is the posterior probability of Hi, P(Hi) is the prior probability of Hi, and P(E|Hi) is the likelihood of the data/evidence if Hi is true.

If we want to pick the most likely hypothesis H*, we can drop P(E):

P(Hi|E) ∝ P(E|Hi) P(Hi)

Page 24: 龙星计划课程 : 信息检索 Course Overview & Background


Random Variable

• X: S → ℝ (a “measure” of the outcome)

– E.g., number of heads, all same face?, …

• Events can be defined according to X

– E(X=a) = {si | X(si)=a}

– E(X≥a) = {si | X(si)≥a}

• So, probabilities can be defined on X

– P(X=a) = P(E(X=a))

– P(X≥a) = P(E(X≥a))

• Discrete vs. continuous random variable (think of “partitioning the sample space”)

Page 25: 龙星计划课程 : 信息检索 Course Overview & Background


An Example: Doc Classification

Sample space: S={x1,…, xn}, where each xi is a document represented by its topic and the occurrence (1/0) of the words “the, computer, game, baseball”:

X1: [sport 1 0 1 1]
X2: [sport 1 1 1 1]
X3: [computer 1 1 0 0]
X4: [computer 1 1 1 0]
X5: [other 0 0 1 1]
… …

For 3 topics and four words, n = ?

Events:

Esport = {xi | topic(xi)=“sport”}
Ebaseball = {xi | baseball(xi)=1}
Ebaseball,¬computer = {xi | baseball(xi)=1 & computer(xi)=0}

Conditional probabilities: P(Esport | Ebaseball), P(Ebaseball | Esport), P(Esport | Ebaseball,¬computer), ...

Thinking in terms of random variables: Topic T ∈ {“sport”, “computer”, “other”}, “Baseball” B ∈ {0,1}, …; P(T=“sport”|B=1), P(B=1|T=“sport”), ...

An inference problem: suppose we observe that “baseball” is mentioned; how likely is it that the topic is “sport”?

P(T=“sport”|B=1) ∝ P(B=1|T=“sport”) P(T=“sport”)

But, P(B=1|T=“sport”)=?, P(T=“sport”)=?

Page 26: 龙星计划课程 : 信息检索 Course Overview & Background


Getting to Statistics ...

• P(B=1|T=“sport”)=? (parameter estimation)

– If we see the results of a huge number of random experiments, then we can use the count-based estimate shown below

– But, what if we only see a small sample (e.g., 2)? Is this estimate still reliable?

• In general, statistics has to do with drawing conclusions on the whole population based on observations of a sample (data)

P̂(B=1 | T=“sport”) = count(B=1, T=“sport”) / count(T=“sport”)

Page 27: 龙星计划课程 : 信息检索 Course Overview & Background


Parameter Estimation

• General setting:

– Given a (hypothesized & probabilistic) model that governs the random experiment

– The model gives a probability of any data, p(D|θ), that depends on the parameter θ

– Now, given actual sample data X={x1,…,xn}, what can we say about the value of θ?

• Intuitively, take your best guess of θ: “best” means “best explaining/fitting the data”

• Generally an optimization problem

Page 28: 龙星计划课程 : 信息检索 Course Overview & Background


Maximum Likelihood vs. Bayesian

• Maximum likelihood estimation

– “Best” means “data likelihood reaches maximum”

– Problem: small sample

• Bayesian estimation

– “Best” means being consistent with our “prior” knowledge and explaining data well

– Problem: how to define prior?

Maximum likelihood: θ̂ = argmaxθ P(X|θ)

Bayesian (MAP): θ̂ = argmaxθ P(θ|X) = argmaxθ P(X|θ) P(θ)

Page 29: 龙星计划课程 : 信息检索 Course Overview & Background


Illustration of Bayesian Estimation

Prior: p(θ)

Likelihood: p(X|θ), with X=(x1,…,xN)

Posterior: p(θ|X) ∝ p(X|θ) p(θ)

[Figure] The θ axis marks three points: the prior mode θ0, the ML estimate θml, and the posterior mode, which lies between them.

Page 30: 龙星计划课程 : 信息检索 Course Overview & Background


Maximum Likelihood Estimate

Data: a document d with counts c(w1), …, c(wN), and length |d|
Model: multinomial distribution M with parameters θi = p(wi)
Likelihood: p(d|M)
Maximum likelihood estimator: M = argmaxM p(d|M)

p(d|M) = [ |d|! / (c(w1)! … c(wN)!) ] ∏i θi^c(wi) ∝ ∏i θi^c(wi), where θi = p(wi) and ∑i θi = 1

We’ll tune p(wi) to maximize the log-likelihood:

l(d|M) = log p(d|M) = ∑i c(wi) log θi

Use the Lagrange multiplier approach to enforce the constraint ∑i θi = 1:

l′(d|M) = ∑i c(wi) log θi + λ (∑i θi − 1)

Set partial derivatives to zero: ∂l′/∂θi = c(wi)/θi + λ = 0, so θi = −c(wi)/λ

Since ∑i θi = 1 and ∑i c(wi) = |d|, we get λ = −|d|, which gives the ML estimate

p(wi) = c(wi) / |d|
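A minimal Python sketch of this ML estimate; the toy document is made up for illustration:

```python
from collections import Counter

doc = "the dog chases the boy on the playground".split()

counts = Counter(doc)            # c(w) for every word w in d
length = len(doc)                # |d|

# ML estimate of the unigram (multinomial) model: p(w) = c(w) / |d|
p = {w: c / length for w, c in counts.items()}

print(p["the"])                  # 3/8 = 0.375
print(sum(p.values()))           # 1.0, as a distribution must
```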

Page 31: 龙星计划课程 : 信息检索 Course Overview & Background


What You Should Know

• Probability concepts:

– sample space, event, random variable, conditional probability, multinomial distribution, etc.

• Bayes formula and its interpretation

• Statistics: Know how to compute maximum likelihood estimate

Page 32: 龙星计划课程 : 信息检索 Course Overview & Background


Essential Background 2:

Basic Concepts in Information Theory

Page 33: 龙星计划课程 : 信息检索 Course Overview & Background


Information Theory

• Developed by Shannon in the 40s

• Maximizing the amount of information that can be transmitted over an imperfect communication channel

• Data compression (entropy)

• Transmission rate (channel capacity)

Page 34: 龙星计划课程 : 信息检索 Course Overview & Background


Basic Concepts in Information Theory

• Entropy: Measuring uncertainty of a random variable

• Kullback-Leibler divergence: comparing two distributions

• Mutual Information: measuring the correlation of two random variables

Page 35: 龙星计划课程 : 信息检索 Course Overview & Background


Entropy: Motivation

• Feature selection:

– If we use only a few words to classify docs, what kind of words should we use?

– P(Topic | “computer”=1) vs. P(Topic | “the”=1): which is more random?

• Text compression:

– Some documents (less random) can be compressed more than others (more random)

– Can we quantify the “compressibility”?

• In general, given a random variable X following distribution p(X),

– How do we measure the “randomness” of X?

– How do we design optimal coding for X?

Page 36: 龙星计划课程 : 信息检索 Course Overview & Background


Entropy: Definition

H(X) = H(p) = −∑x p(x) log2 p(x)   (sum over all possible values of X)

Define 0 log 0 = 0; log = log2

Entropy H(X) measures the uncertainty/randomness of random variable X

Example (a coin with P(Head) = p):

H(X) = 1 for a fair coin, p(Head) = 0.5
H(X) between 0 and 1 for a biased coin, e.g., p(Head) = 0.8
H(X) = 0 for a completely biased coin, p(Head) = 1 (or 0)

[Figure] H(X) plotted against P(Head): the curve peaks at 1.0 when P(Head)=0.5 and drops to 0 at P(Head)=0 or 1.
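A minimal Python sketch of the definition, reproducing the three coins in the example:

```python
import math

def entropy(p):
    """Entropy (in bits) of a distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)   # 0 log 0 is defined as 0

print(entropy([0.5, 0.5]))   # fair coin: 1.0
print(entropy([0.8, 0.2]))   # biased coin: ~0.72
print(entropy([1.0, 0.0]))   # completely biased coin: 0.0
```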

Page 37: 龙星计划课程 : 信息检索 Course Overview & Background


Entropy: Properties

• Minimum value of H(X): 0

– What kind of X has the minimum entropy?

• Maximum value of H(X): log M, where M is the number of possible values for X

– What kind of X has the maximum entropy?

• Related to coding:

H(X) = −∑x p(x) log2 p(x) = ∑x p(x) log2 (1/p(x)) = E[ log2 (1/p(x)) ]

“Information” of x = # bits to code x = −log p(x), so H(X) = Ep[ −log p(x) ]

Page 38: 龙星计划课程 : 信息检索 Course Overview & Background


Interpretations of H(X)

• Measures the “amount of information” in X

– Think of each value of X as a “message”

– Think of X as a random experiment (20 questions)

• Minimum average number of bits to compress values of X

– The more random X is, the harder to compress

A fair coin has the maximum information, and is hardest to compress.
A biased coin has some information, and can be compressed to <1 bit on average.
A completely biased coin has no information, and needs only 0 bits.

“Information” of x = # bits to code x = −log p(x), so H(X) = Ep[ −log p(x) ]

Page 39: 龙星计划课程 : 信息检索 Course Overview & Background


Conditional Entropy

• The conditional entropy of a random variable Y given another X, expresses how much extra information one still needs to supply on average to communicate Y given that the other party knows X

• H(Topic| “computer”) vs. H(Topic | “the”)?

H(Y|X) = ∑x p(x) H(Y|X=x)
       = −∑x p(x) ∑y p(y|x) log p(y|x)
       = −∑x ∑y p(x,y) log p(y|x)
       = E[ −log p(Y|X) ]
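A minimal Python sketch computing H(Y|X) from a small joint distribution; the joint probabilities are made-up numbers:

```python
import math

# Assumed joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal p(x)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), with p(y|x) = p(x,y)/p(x)
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)

print(h_y_given_x)   # ~0.72 bits: knowing X removes some, but not all, uncertainty about Y
```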

Page 40: 龙星计划课程 : 信息检索 Course Overview & Background


Cross Entropy H(p,q)

What if we encode X with a code optimized for a wrong distribution q?

Expected # of bits: H(p,q) = Ep[ −log q(x) ] = −∑x p(x) log q(x)

Intuitively, H(p,q) ≥ H(p), and mathematically,

H(p,q) − H(p) = ∑x p(x) [ −log (q(x)/p(x)) ]
            ≥ −log [ ∑x p(x) · q(x)/p(x) ] = −log 1 = 0

By Jensen’s inequality: ∑i pi f(xi) ≥ f(∑i pi xi), where f is a convex function and ∑i pi = 1

Page 41: 龙星计划课程 : 信息检索 Course Overview & Background


Kullback-Leibler Divergence D(p||q)

What if we encode X with a code optimized for a wrong distribution q?

How many bits would we waste?

D(p||q) = H(p,q) − H(p) = ∑x p(x) log ( p(x)/q(x) )

Properties:

- D(p||q) ≥ 0
- D(p||q) ≠ D(q||p) (not symmetric)
- D(p||q) = 0 iff p = q

KL-divergence is often used to measure the distance between two distributions

Interpretation:

- Fix p; then D(p||q) and H(p,q) vary in the same way

- If p is an empirical distribution, minimizing D(p||q) or H(p,q) is equivalent to maximizing likelihood

Relative entropy
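A minimal Python sketch of both quantities; the two coin distributions are chosen only for illustration:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D(p||q) = H(p, q) - H(p) = sum_x p(x) log2(p(x)/q(x))."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # true distribution
q = [0.8, 0.2]   # wrong distribution used to design the code

print(cross_entropy(p, q))   # ~1.32 bits per symbol
print(kl_divergence(p, q))   # ~0.32 bits wasted per symbol
print(kl_divergence(p, p))   # 0.0: no waste when q = p
```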

Page 42: 龙星计划课程 : 信息检索 Course Overview & Background


Cross Entropy, KL-Div, and Likelihood

Data / sample for X: Y = (y1, …, yN)

Empirical distribution: p̃(x) = (1/N) ∑i δ(yi, x), where δ(yi, x)=1 if x=yi and 0 otherwise; equivalently p̃(x) = c(x)/N, where c(x) is the count of x in the sample.

Likelihood: L(Y) = ∏i p(X=yi)

Log-likelihood: log L(Y) = ∑i log p(X=yi) = ∑x c(x) log p(X=x) = N ∑x p̃(x) log p(x)

So (1/N) log L(Y) = −H(p̃, p) = −D(p̃||p) − H(p̃)

Fix the data (and hence p̃); then

argmaxp (1/N) log L(Y) = argminp H(p̃, p) = argminp D(p̃||p) = argminp Perplexity(p), where Perplexity(p) = 2^H(p̃,p)

Criterion for selecting a good model: Perplexity(p)

Page 43: 龙星计划课程 : 信息检索 Course Overview & Background


Mutual Information I(X;Y)

Comparing two distributions: p(x,y) vs p(x)p(y)

I(X;Y) = ∑x,y p(x,y) log [ p(x,y) / (p(x) p(y)) ] = H(X) − H(X|Y) = H(Y) − H(Y|X)

Properties: I(X;Y) ≥ 0; I(X;Y) = I(Y;X); I(X;Y) = 0 iff X and Y are independent

Interpretations:
- Measures how much reduction in uncertainty of X given info. about Y
- Measures correlation between X and Y
- Related to the “channel capacity” in information theory

Examples: I(Topic; “computer”) vs. I(Topic; “the”)?

I(“computer”; “program”) vs. I(“computer”; “baseball”)?
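A minimal Python sketch computing I(X;Y) from the same assumed joint table used in the conditional-entropy sketch above:

```python
import math

# Assumed joint distribution p(x, y).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y)
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
mi = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

print(mi)   # ~0.28 bits; also equals H(Y) - H(Y|X) = 1.0 - 0.72
```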

Page 44: 龙星计划课程 : 信息检索 Course Overview & Background


What You Should Know

• Information theory concepts: entropy, cross entropy, relative entropy, conditional entropy, KL-div., mutual information

– Know their definitions, how to compute them

– Know how to interpret them

– Know their relationships

Page 45: 龙星计划课程 : 信息检索 Course Overview & Background


Essential Background 3:

Natural Language Processing

Page 46: 龙星计划课程 : 信息检索 Course Overview & Background


What is NLP?

[Arabic text example]

How can a computer make sense out of such a string?

- What are the basic units of meaning (words)?
- What is the meaning of each word?
- How are words related with each other?
- What is the “combined meaning” of words?
- What is the “meta-meaning”? (speech act)
- Handling a large chunk of text
- Making sense of everything

These questions correspond to the levels of analysis labeled on the slide: morphology, syntax, semantics, pragmatics, discourse, and inference.

Spanish text example: “La listas actualizadas figuran como Aneio I.”

Page 47: 龙星计划课程 : 信息检索 Course Overview & Background


An Example of NLP

A dog is chasing a boy on the playground
Det Noun Aux Verb Det Noun Prep Det Noun

Lexical analysis (part-of-speech tagging) assigns the word tags; syntactic analysis (parsing) groups them into noun phrases, a complex verb, a prepositional phrase, verb phrases, and finally a sentence.

Semantic analysis produces a logical representation:
Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).

Inference adds, e.g., the rule Scared(x) if Chasing(_,x,_), yielding Scared(b1).

Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…

Page 48: 龙星计划课程 : 信息检索 Course Overview & Background


If we can do this for all the sentences, then …

BAD NEWS:

Unfortunately, we can’t.

General NLP = “AI-Complete”

Page 49: 龙星计划课程 : 信息检索 Course Overview & Background


NLP is Difficult!

• Natural language is designed to make human communication efficient. As a result,

– we omit a lot of “common sense” knowledge, which we assume the hearer/reader possesses

– we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve

• This makes EVERY step in NLP hard

– Ambiguity is a “killer”

– Common sense reasoning is a prerequisite

Page 50: 龙星计划课程 : 信息检索 Course Overview & Background


Examples of Challenges

• Word-level ambiguity: E.g.,

– “design” can be a noun or a verb (Ambiguous POS)

– “root” has multiple meanings (Ambiguous sense)

• Syntactic ambiguity: E.g.,

– “natural language processing” (Modification)

– “A man saw a boy with a telescope.” (PP Attachment)

• Anaphora resolution: “John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)

• Presupposition: “He has quit smoking.” implies that he smoked before.

Page 51: 龙星计划课程 : 信息检索 Course Overview & Background


Despite all the challenges, research in NLP has also made

a lot of progress…

Page 52: 龙星计划课程 : 信息检索 Course Overview & Background


High-level History of NLP

• Early enthusiasm (1950’s): Machine Translation

– Too ambitious

– Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (Dictionary + Encyclopedia)

• Less ambitious applications (late 1960’s & early 1970’s): Limited success, failed to scale up

– Speech recognition

– Dialogue (Eliza)

– Inference and domain knowledge (SHRDLU=“block world”)

• Real world evaluation (late 1970’s – now)

– Story understanding (late 1970’s & early 1980’s)

– Large scale evaluation of speech recognition, text retrieval, information extraction (1980 – now)

– Statistical approaches enjoy more success (first in speech recognition & retrieval, later others)

• Current trend:

– Boundary between statistical and symbolic approaches is disappearing.

– We need to use all the available knowledge

– Application-driven NLP research (bioinformatics, Web, Question answering…)

[Diagram] Statistical language models and robust component techniques support applications (shallow understanding); knowledge representation supports deep understanding in limited domains.

Page 53: 龙星计划课程 : 信息检索 Course Overview & Background


The State of the Art

A dog is chasing a boy on the playground
Det Noun Aux Verb Det Noun Prep Det Noun
(the same parse into noun phrases, verb phrases, and a sentence as before)

POS tagging: ~97% accuracy
Parsing: partial parsing >90% (?)

Semantics: only some aspects
- Entity/relation extraction
- Word sense disambiguation
- Anaphora resolution

Speech act analysis: ???
Inference: ???

Page 54: 龙星计划课程 : 信息检索 Course Overview & Background


Technique Showcase: POS Tagging

Training data (annotated text):
This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N …

POS Tagger: given “This is a new sentence”, consider all possible tag sequences, e.g.,
This/Det is/Det a/Det new/Det sentence/Det
…
This/Det is/Aux a/Det new/Adj sentence/N
…
This/V2 is/V2 a/V2 new/V2 sentence/V2
and pick the one with the highest probability.

Method 1 (independent assignment; amounts to picking the most common tag for each word):
p(w1,…,wk, t1,…,tk) = p(t1|w1)…p(tk|wk) p(w1)…p(wk)

Method 2 (partial dependency between adjacent tags):
p(w1,…,wk, t1,…,tk) = ∏i p(wi|ti) p(ti|ti−1)
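A minimal Python sketch of Method 2, scoring one candidate tag sequence with assumed (made-up) probability tables; a real tagger would estimate these from annotated data and search over all sequences (e.g., with the Viterbi algorithm):

```python
# Assumed (made-up) model parameters.
p_word_given_tag = {
    ("this", "Det"): 0.2, ("is", "Aux"): 0.3, ("a", "Det"): 0.3,
    ("new", "Adj"): 0.05, ("sentence", "N"): 0.01,
}
p_tag_given_prev = {
    ("Det", "<s>"): 0.5, ("Aux", "Det"): 0.2, ("Det", "Aux"): 0.4,
    ("Adj", "Det"): 0.3, ("N", "Adj"): 0.6,
}

def score(words, tags):
    """p(w1..wk, t1..tk) = prod_i p(wi|ti) p(ti|t_{i-1}), with t0 = <s>."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= p_word_given_tag.get((w, t), 1e-6) * p_tag_given_prev.get((t, prev), 1e-6)
        prev = t
    return p

words = ["this", "is", "a", "new", "sentence"]
print(score(words, ["Det", "Aux", "Det", "Adj", "N"]))    # higher
print(score(words, ["Det", "Det", "Det", "Det", "Det"]))  # much lower
```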

Page 55: 龙星计划课程 : 信息检索 Course Overview & Background


Technique Showcase: Parsing

Grammar (each rule carries a probability, e.g., 1.0, 0.3, 0.4, 0.3, …):
S → NP VP
NP → Det BNP
NP → BNP
NP → NP PP
BNP → N
VP → V
VP → Aux V NP
VP → VP PP
PP → P NP

Lexicon (also with probabilities, e.g., 1.0, 0.01, 0.003, …):
V → chasing
Aux → is
N → dog | boy | playground
Det → the | a
P → on

Generate: the grammar licenses two parse trees for “A dog is chasing a boy on the playground”, one attaching the PP “on the playground” to the verb phrase and one attaching it to the noun phrase “a boy” (the slide also shows an NP “roller skates”).

Probability of a tree = product of the probabilities of the rules used to generate it (e.g., 0.000015 for one of the trees). Choose the tree with the highest probability.

Can also be treated as a classification/decision problem…
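A minimal Python sketch of the scoring idea above: the probability of a parse tree is the product of the probabilities of the grammar and lexicon rules it uses. The rule probabilities below are assumed for illustration, not the ones on the slide:

```python
# Assumed (made-up) rule probabilities for a tiny PCFG.
rule_prob = {
    "S -> NP VP": 1.0,
    "NP -> Det BNP": 0.3,
    "BNP -> N": 1.0,
    "Det -> a": 0.4,
    "N -> dog": 0.2,
}

def tree_probability(rules_used):
    """Probability of a parse tree = product of the probabilities of its rules."""
    p = 1.0
    for r in rules_used:
        p *= rule_prob[r]
    return p

# Rules used by (part of) one derivation of "a dog ..."
derivation = ["S -> NP VP", "NP -> Det BNP", "Det -> a", "BNP -> N", "N -> dog"]
print(tree_probability(derivation))   # 1.0 * 0.3 * 0.4 * 1.0 * 0.2 = 0.024
```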

Page 56: 龙星计划课程 : 信息检索 Course Overview & Background


Semantic Analysis Techniques

• Only successful for VERY limited domains or for SOME aspects of semantics

• E.g.,

– Entity extraction (e.g., recognizing a person’s name): Use rules and/or machine learning

– Word sense disambiguation: addressed as a classification problem with supervised learning

– Anaphora resolution …

In general, exploiting machine learning

and statistical language models…

Page 57: 龙星计划课程 : 信息检索 Course Overview & Background


What We Can’t Do

• 100% POS tagging

– “He turned off the highway.” vs. “He turned off the fan.”

• General complete parsing

– “A man saw a boy with a telescope.”

• Deep semantic analysis

– Will we ever be able to precisely define the meaning of “own” in “John owns a restaurant.”?

Robust & general NLP tends to be “shallow”,

while “deep” understanding doesn’t scale up …

Page 58: 龙星计划课程 : 信息检索 Course Overview & Background


Major NLP Applications

• Speech recognition: e.g., Auto telephone call routing

• Text management

– Text retrieval/filtering

– Text classification

– Text summarization

– Text mining

– Query answering

• Language tutoring

– Spelling/grammar correction

• Machine translation

– Cross-language retrieval

– Restricted natural language

• Natural language user interface

Our focus: text management

Page 59: 龙星计划课程 : 信息检索 Course Overview & Background


NLP & Text Management

Better NLP => Better Text Management

Bad NLP => Bad Text Management?

Robust, shallow NLP tends to be more useful than deep but fragile NLP.

Errors in NLP can hurt text management performance…

Page 60: 龙星计划课程 : 信息检索 Course Overview & Background


How Much NLP is Really Needed?

Tasks ordered by their dependency on NLP; tasks near the top are “easier” and have more “workarounds”:

Classification/Retrieval

Summarization/Extraction/Mining

Translation/Dialogue

Question Answering

Page 61: 龙星计划课程 : 信息检索 Course Overview & Background


Workaround Example I: ELIZA

Person: Men are all alike.
ELIZA: In what way?
Person: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
Person: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Person: He says I'm depressed much of the time.
ELIZA: I am sorry to hear that you are depressed.
Person: It's true. I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
Person: I need some help, that much seems certain.
ELIZA: What would it mean to you if you got some help?
Person: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family

ELIZA plays the role of a therapist. How does it work?

Heuristic rules based on keyword matching! E.g.,

I remember X => Do you often think of X?

always => Can you think of a specific example?

It keeps the dialogue going on, but NEVER makes any claim!

A breakdown example:
Person: Failure is the mother of success.
ELIZA: Tell me more about your family

No NLP, but useful. Perhaps we should call this NLP? Statistical NLP often has a similar flavor, with “SOFT” rules LEARNED from data.

Page 62: 龙星计划课程 : 信息检索 Course Overview & Background


Workaround Example II: Statistical Translation

Learn how to translate Chinese to English from many example translations

Intuitions:

- If we have seen all possible translations, then we simply look up

- If we have seen a similar translation, then we can adapt

- If we haven’t seen any example that’s similar, we try to generalize what we’ve seen

[Diagram] Noisy-channel view: an English speaker generates English words E with probability P(E); the “translator” acts as a noisy channel that turns E into Chinese words C with probability P(C|E); the English translation is recovered by asking P(E|C) = ?

All these intuitions are captured through a probabilistic model
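A minimal sketch of the noisy-channel decision rule, argmaxE P(E) P(C|E), with toy (made-up) probability tables standing in for the language model P(E) and the translation model P(C|E):

```python
# Toy candidate English translations with assumed language-model probabilities P(E).
p_e = {"the dog chases the boy": 0.02, "dog chase boy the": 0.0001}

# Assumed translation-model probabilities P(C|E) for one fixed Chinese sentence C.
p_c_given_e = {"the dog chases the boy": 0.3, "dog chase boy the": 0.4}

# Noisy-channel decoding: pick E maximizing P(E) * P(C|E), which is proportional to P(E|C).
best = max(p_e, key=lambda e: p_e[e] * p_c_given_e[e])
print(best)   # "the dog chases the boy": the fluent candidate wins despite lower P(C|E)
```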

Page 63: 龙星计划课程 : 信息检索 Course Overview & Background


So, what NLP techniques are most useful for text management?

Statistical NLP in general, and

statistical language models in particular

The need for high robustness and efficiency implies the dominant use of

simple models (i.e., unigram models)

Page 64: 龙星计划课程 : 信息检索 Course Overview & Background


What You Should Know

• NLP is the basis for text management

– Better NLP enables better text management

– Better NLP is necessary for sophisticated tasks

• But

– Bad NLP doesn’t mean bad text management

– There are often “workarounds” for a task

– Inaccurate NLP can even hurt the performance of a task

• The most effective NLP techniques are often statistical with the help of linguistic knowledge

• The challenge is to bridge the gap between NLP and applications

Page 65: 龙星计划课程 : 信息检索 Course Overview & Background


Roadmap

• Today’s lecture

– Course overview

– Essential background (prob & stat, info theory, NLP)

• Next two lectures: overview of IR

– Basic concepts

– Evaluation

– Brief history

– Basic models

– …