Survey of Approaches to Information Retrieval of Speech Messages
Kenney Ng
Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology
February 16, 1996 (DRAFT)


TRANSCRIPT

Page 1: Survey of Approaches to  Information Retrieval of Speech Messages

Survey of Approaches to Information Retrieval of Speech Messages

Kenney Ng
Spoken Language Systems Group
Laboratory for Computer Science
Massachusetts Institute of Technology

February 16, 1996 (DRAFT)

Presenter: 朱惠銘

Page 2: Survey of Approaches to  Information Retrieval of Speech Messages

Survey of Approaches to Information Retrieval of Speech Messages

Introduction
Information Retrieval
Text Retrieval
Differences between text and speech media
Information Retrieval of Speech Messages

Page 3: Survey of Approaches to  Information Retrieval of Speech Messages

1 Introduction

Process, organize, and analyze the data.
Present the data in a human-usable form.
Find the "interesting" piece of information efficiently.
Increasingly large portions of information are in spoken language: recorded speech messages, radio and television broadcasts.
This motivates the development of automatic methods.

Page 4: Survey of Approaches to  Information Retrieval of Speech Messages

2 Information Retrieval

2.1 Definition
The representation, storage, organization, and accessing of information items.
Return the best matches to the "request" provided by the user.
There is no restriction on the type of documents:
Text Retrieval, Document Retrieval
Image Retrieval, Speech Retrieval
Multi-media Retrieval

Page 5: Survey of Approaches to  Information Retrieval of Speech Messages

2.2 Information Retrieval vs. Database Retrieval

Database Retrieval: returns specific facts (an answer that exactly matches the request).
Information Retrieval: returns documents relevant to the user's request.

Database Retrieval: structured records are well defined.
Information Retrieval: typically not well structured.

Database Retrieval: complete specification of the user's information need.
Information Retrieval: incomplete specification of the user's information need.

Database Retrieval: the user seeks an answer that is a specific fact or piece of information.
Information Retrieval: the user is interested in a general topic or subject area and wants to find out more about it.

Page 6: Survey of Approaches to  Information Retrieval of Speech Messages

2.3 Component Processes

Creating document representations (indexing)

Creating request representations (query formation)

Comparing representations

(retrieval)

Evaluating retrieved documents

(relevance feedback)

Page 7: Survey of Approaches to  Information Retrieval of Speech Messages

2.3 Component Processes (cont.)

Performance
Recall: the fraction of all the relevant documents in the entire collection that are retrieved in response to a query.
Precision: the fraction of the retrieved documents that are relevant.
Average precision: the precision values obtained at each new relevant document in the ranked output for an individual query, averaged.
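These measures can be computed directly from a ranked output list; a minimal Python sketch (the document IDs and relevance judgments below are invented for illustration):

```python
def recall(retrieved, relevant):
    """Fraction of all the relevant documents that were retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def average_precision(ranked, relevant):
    """Average the precision values at each new relevant document in the ranked output."""
    rel, hits, total = set(relevant), 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in rel:
            hits += 1
            total += hits / rank
    return total / len(rel)

ranked = ["d1", "d4", "d2", "d5", "d3"]   # ranked retrieval output
relevant = ["d1", "d2", "d3"]             # relevance judgments
print(recall(ranked[:3], relevant))       # 2/3 of the relevant docs in the top 3
print(precision(ranked[:3], relevant))    # 2/3 of the top 3 are relevant
```

Note that average precision rewards systems that rank relevant documents early, which is why it became a standard single-number summary of a ranked run.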

Page 8: Survey of Approaches to  Information Retrieval of Speech Messages

3 Text Retrieval

3.1 Indexing and Document Representation

3.2 Query Formation
3.3 Matching Query and Document Representations

Page 9: Survey of Approaches to  Information Retrieval of Speech Messages

3.1 Indexing and Document Representation

Terms and Keywords
A list of words extracted from the full-text document.
Construct a stop list to remove the useless (non-content) words.
To handle synonyms, construct a dictionary structure that replaces each word with a representative of its class.
A tradeoff exists between normalization and discrimination in the indexing process.
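A minimal sketch of this indexing step in Python; the stop list and synonym classes here are tiny invented examples, not the ones used in any real system:

```python
STOP_LIST = {"the", "of", "to", "and", "a", "in", "is"}   # invented, illustrative stop list
SYNONYM_CLASSES = {"car": "auto", "automobile": "auto"}   # maps words to a class representative

def index_terms(text):
    """Extract index terms: lowercase words, drop stop words, normalize synonyms."""
    terms = []
    for word in text.lower().split():
        word = word.strip(".,!?\"'")
        if not word or word in STOP_LIST:
            continue
        terms.append(SYNONYM_CLASSES.get(word, word))
    return terms

print(index_terms("The car is in the garage."))  # ['auto', 'garage']
```

The tradeoff mentioned above shows up here directly: a larger stop list and coarser synonym classes normalize more aggressively but leave fewer terms to discriminate between documents.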

Page 10: Survey of Approaches to  Information Retrieval of Speech Messages

Index Term Weighting

Term frequency
The frequency of occurrence of each term in the document.
For term t_k in document d_i, this is written tf(d_i, t_k).

Page 11: Survey of Approaches to  Information Retrieval of Speech Messages

Index Term Weighting

Inverse document frequency
An approach of weighting each term inversely proportional to the number of documents in which the term occurs.
For term t_k:

  idf(t_k) = log(N / n_k)

where N is the total number of documents and n_k is the number of documents with term t_k.

Page 12: Survey of Approaches to  Information Retrieval of Speech Messages

Index Term Weighting

Weights to terms
Terms that occur frequently in particular documents but rarely in the overall collection should receive a large weight:

  w(d_i, t_k) = tf(d_i, t_k) idf(t_k) / sqrt( sum_j [ tf(d_i, t_j) idf(t_j) ]^2 )
              = tf(d_i, t_k) log(N / n_k) / sqrt( sum_j [ tf(d_i, t_j) log(N / n_j) ]^2 )
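A minimal Python sketch of this normalized tf*idf weighting, assuming documents are given as token lists (the toy collection is invented):

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Normalized tf*idf: w(d_i, t_k) = tf * idf / sqrt(sum_j (tf * idf)^2)."""
    N = len(documents)
    df = Counter()                       # n_k: number of documents containing term t_k
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        raw = {t: tf[t] * math.log(N / df[t]) for t in tf}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        weights.append({t: v / norm for t, v in raw.items()})
    return weights

docs = [["speech", "retrieval", "speech"], ["text", "retrieval"], ["speech", "topic"]]
w = tfidf_weights(docs)   # one sparse weight vector per document
```

A term occurring in every document gets idf 0 and thus weight 0, which is exactly the normalization/discrimination tradeoff from the indexing slide.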

Page 13: Survey of Approaches to  Information Retrieval of Speech Messages

3.2 Query Formation

Relevance Feedback
The IR system automatically modifies a query based on user feedback about documents retrieved in an initial run:

  q_new = q_old + sum_{i in rel} d_i - sum_{i in nonrel} d_i
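A minimal sketch of this feedback update in Python, representing queries and documents as sparse term-weight dictionaries; the vectors below are invented, and published variants of this formula also scale each sum by a constant:

```python
def rocchio(q_old, rel_docs, nonrel_docs):
    """q_new = q_old + sum of relevant doc vectors - sum of non-relevant doc vectors."""
    q_new = dict(q_old)
    for doc, sign in [(d, +1) for d in rel_docs] + [(d, -1) for d in nonrel_docs]:
        for term, weight in doc.items():
            q_new[term] = q_new.get(term, 0.0) + sign * weight
    return q_new

q = {"speech": 1.0, "retrieval": 1.0}
rel = [{"speech": 0.5, "message": 0.5}]   # judged relevant in the initial run
nonrel = [{"text": 0.7}]                  # judged non-relevant
print(rocchio(q, rel, nonrel))
```

The updated query moves toward terms from relevant documents ("message" gains weight) and away from terms in non-relevant ones ("text" becomes negative).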

Page 14: Survey of Approaches to  Information Retrieval of Speech Messages

3.2 Query Formation

Extracting from a user request a representation of its content.

The indexing methods are also applicable to query formation.

Page 15: Survey of Approaches to  Information Retrieval of Speech Messages

3.3 Matching Query and Document Representations

Boolean Model, Extended Boolean Model

Vector Space Model
Probabilistic Models

Page 16: Survey of Approaches to  Information Retrieval of Speech Messages

Boolean Model

Document representation Binary value variable

True: the term is present in the document False: the term is absent in the document

The document can be represented in a binary vector

Query Boolean query : AND, OR and NOT

Matching function
Standard rules of Boolean logic: if the document representation satisfies the query expression, then that document matches the query.
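A minimal sketch of Boolean matching in Python; the nested (operator, operand, ...) tuple representation of queries is an assumption for illustration, not one prescribed by the survey:

```python
def matches(doc_terms, query):
    """Evaluate a Boolean query against the set of terms present in a document.

    A query is either a term (string) or a tuple:
    ('AND', q1, q2), ('OR', q1, q2), or ('NOT', q1)."""
    if isinstance(query, str):
        return query in doc_terms          # True iff the term is present
    op = query[0]
    if op == "AND":
        return matches(doc_terms, query[1]) and matches(doc_terms, query[2])
    if op == "OR":
        return matches(doc_terms, query[1]) or matches(doc_terms, query[2])
    if op == "NOT":
        return not matches(doc_terms, query[1])
    raise ValueError(f"unknown operator: {op}")

doc = {"speech", "retrieval", "survey"}
print(matches(doc, ("AND", "speech", ("NOT", "text"))))  # True
```

The decision is strictly binary: a document either satisfies the expression or it does not, which motivates the extended model on the next slide.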

Page 17: Survey of Approaches to  Information Retrieval of Speech Messages

Extended Boolean Model

The retrieval decision of the Boolean Model may be too harsh.

The Extended Boolean Model
For the AND query:

  sim(q_and, d) = 1 - [ ( (1 - d_1)^p + (1 - d_2)^p + ... + (1 - d_K)^p ) / K ]^(1/p)

This is maximal for a document containing all the terms, and decreases as the number of matching terms decreases.

Page 18: Survey of Approaches to  Information Retrieval of Speech Messages

Extended Boolean Model

For the OR query:

  sim(q_or, d) = [ ( d_1^p + d_2^p + ... + d_K^p ) / K ]^(1/p)

This is minimal for a document that contains none of the terms and increases as the number of matching terms increases.

The variable p is a constant in the range 1 ≤ p ≤ ∞ that is determined empirically; it is typically in the range 2 ≤ p ≤ 5.
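Both similarity functions can be sketched directly from the formulas; here d is the list of (normalized) weights d_1..d_K of the query terms in one document, a hypothetical input format:

```python
def sim_and(d, p):
    """Extended Boolean AND: 1 - [((1-d_1)^p + ... + (1-d_K)^p) / K]^(1/p)."""
    K = len(d)
    return 1.0 - (sum((1.0 - x) ** p for x in d) / K) ** (1.0 / p)

def sim_or(d, p):
    """Extended Boolean OR: [(d_1^p + ... + d_K^p) / K]^(1/p)."""
    K = len(d)
    return (sum(x ** p for x in d) / K) ** (1.0 / p)

print(sim_and([1.0, 1.0, 1.0], 2))   # 1.0: all terms fully present
print(sim_or([0.0, 0.0], 2))         # 0.0: no terms present
print(sim_and([1.0, 0.0], 2))        # partial credit, unlike strict Boolean AND
```

Unlike the strict Boolean model, a document matching only some of the AND terms still gets a graded score between 0 and 1.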

Page 19: Survey of Approaches to  Information Retrieval of Speech Messages

Vector Space Model

Documents and queries are represented as vectors in a K-dimensional space, where K is the number of indexing terms:

  sim(q, d) = sum_{k=1}^{K} q_k d_k / ( sqrt( sum_{k=1}^{K} q_k^2 ) sqrt( sum_{k=1}^{K} d_k^2 ) )
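A minimal sketch of this cosine similarity in Python, assuming the query and document are given as dense K-dimensional weight vectors:

```python
import math

def cosine_sim(q, d):
    """sim(q, d) = sum_k q_k d_k / (||q|| * ||d||), over the K indexing terms."""
    dot = sum(qk * dk for qk, dk in zip(q, d))
    nq = math.sqrt(sum(qk * qk for qk in q))
    nd = math.sqrt(sum(dk * dk for dk in d))
    return dot / (nq * nd) if nq and nd else 0.0   # guard against all-zero vectors

print(cosine_sim([1, 0, 1], [1, 0, 1]))  # 1.0: same direction
print(cosine_sim([1, 0, 0], [0, 1, 0]))  # 0.0: no shared terms
```

Because the score depends only on the angle between the vectors, long and short documents with the same term mix score alike.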

Page 20: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models

Bayes' Decision Rule
p(R | d, q) denotes the probability that document d is relevant to query q.
p(R̄ | d, q) denotes the probability that document d is non-relevant to query q.
C_r is the cost of retrieving a non-relevant document.
C_n is the cost of not retrieving a relevant document.
The expected cost of retrieving an extraneous (non-relevant) document is C_r p(R̄ | d, q).
Retrieve document d when

  C_n p(R | d, q) ≥ C_r p(R̄ | d, q)

Page 21: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models (cont.)

How do we compute the posterior probabilities p(R | d, q) and p(R̄ | d, q)?
Based on Bayes' rule:

  p(R | d, q) = p(d | R, q) p(R | q) / p(d | q)
  p(R̄ | d, q) = p(d | R̄, q) p(R̄ | q) / p(d | q)

p(R | q) and p(R̄ | q) are the prior probabilities of relevance and non-relevance of a document.
p(d | R, q) and p(d | R̄, q) are the likelihoods, or class-conditional probabilities.

Page 22: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models (cont.)

  p(R | d, q) / p(R̄ | d, q) = [ p(d | R, q) p(R | q) / p(d | q) ] / [ p(d | R̄, q) p(R̄ | q) / p(d | q) ]
                             = p(d | R, q) p(R | q) / ( p(d | R̄, q) p(R̄ | q) )

Now we have to estimate p(d | R, q) and p(d | R̄, q).

Page 23: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models (cont.)

Assumptions
The document vectors are binary, indicating the presence or absence of each indexing term.
Each term has a binomial distribution.
There are no interactions between the terms.

With K indexing terms, d = (d_1, ..., d_K), d_k ∈ {0, 1}, where d_k indicates whether the k-th term is in the document vector d. Let

  p_k = p(d_k = 1 | R, q),  q_k = p(d_k = 1 | R̄, q)

Then:

  p(d | R, q) = prod_{k=1}^{K} p_k^{d_k} (1 - p_k)^{1 - d_k}
  p(d | R̄, q) = prod_{k=1}^{K} q_k^{d_k} (1 - q_k)^{1 - d_k}

Page 24: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models (cont.)

  sim(q, d) = log [ p(R | d, q) / p(R̄ | d, q) ]
            = log [ p(R | q) p(d | R, q) / ( p(R̄ | q) p(d | R̄, q) ) ]
            = log [ p(R | q) prod_k p_k^{d_k} (1 - p_k)^{1 - d_k} / ( p(R̄ | q) prod_k q_k^{d_k} (1 - q_k)^{1 - d_k} ) ]
            = sum_{k=1}^{K} d_k log [ p_k (1 - q_k) / ( q_k (1 - p_k) ) ] + sum_{k=1}^{K} log [ (1 - p_k) / (1 - q_k) ] + log [ p(R | q) / p(R̄ | q) ]
            = sum_{k=1}^{K} d_k w_k + C

Page 25: Survey of Approaches to  Information Retrieval of Speech Messages

Probabilistic Models (cont.)

  w_k = log [ p_k (1 - q_k) / ( q_k (1 - p_k) ) ]

w_k is the same as the relevance weight of the k-th index term.
Assume p_k is a constant value, 0.5, and q_k is the overall frequency n_k / N:

  w_k = log [ (1/2)(1 - n_k/N) / ( (n_k/N)(1/2) ) ] = log [ (N - n_k) / n_k ] = log(N / n_k) + log(1 - n_k/N)
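Under these assumptions the relevance weight reduces to an idf-like quantity; a minimal Python sketch, where the score function is a hypothetical use of the weights with the constant C dropped:

```python
import math

def relevance_weight(n_k, N, p_k=0.5):
    """w_k = log[p_k (1 - q_k) / (q_k (1 - p_k))] with q_k = n_k / N.

    With p_k fixed at 0.5 this reduces to log((N - n_k) / n_k)."""
    q_k = n_k / N
    return math.log(p_k * (1 - q_k) / (q_k * (1 - p_k)))

def score(doc_terms, query_terms, doc_freq, N):
    """sim(q, d) = sum of d_k * w_k over query terms present in the document."""
    return sum(relevance_weight(doc_freq[t], N) for t in query_terms if t in doc_terms)

# A term in 100 of 1000 documents gets weight log(900/100) = log 9.
print(round(relevance_weight(100, 1000), 3))
```

Rare terms (small n_k) get large positive weights, mirroring the inverse document frequency heuristic from the text retrieval section.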

Page 26: Survey of Approaches to  Information Retrieval of Speech Messages

4 Differences between text and speech media

Speech is a richer and more expressive medium than text (it carries mood and tone).

Robustness of the retrieval models to noise or errors in transcription.

How to accurately extract and represent the contents of a speech message in a form that can be efficiently stored and searched.

Page 27: Survey of Approaches to  Information Retrieval of Speech Messages

5 Information Retrieval of Speech Messages

Speech Message Retrieval
  Large Vocabulary Word Recognition Approach
  Sub-Word Unit Approach
  Word Spotting Approaches

Speech Message Classification and Sorting
  Topic Identification
  Topic Spotting
  Topic Clustering

Page 28: Survey of Approaches to  Information Retrieval of Speech Messages

Large Vocabulary Word Recognition Approach

Suggested by CMU in the Informedia digital video library project.

A user can interact with the text retrieval system to obtain video clips stored in the library that are relevant to his request.

Pipeline: sound track of video → large vocabulary speech recognizer → textual transcript → natural language understanding → full-text information retrieval system

Page 29: Survey of Approaches to  Information Retrieval of Speech Messages

Sub-Word Unit Approach

Syllabic Units
Phonetic Units

Page 30: Survey of Approaches to  Information Retrieval of Speech Messages

Syllabic Units

VCV-features
Sub-word units consisting of a maximal sequence of consonants enclosed between two maximal sequences of vowels.
e.g., INFORMATION has the VCV-features INFO, ORMA, ATIO.
Take a subset of these features as the indexing terms.
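The VCV-feature extraction can be sketched directly; this assumes the simple five-letter vowel set AEIOU applied to spellings, not any pronunciation-based definition:

```python
import re

VOWELS = "AEIOU"

def vcv_features(word):
    """Extract VCV features: each maximal consonant run together with its
    flanking maximal vowel runs."""
    # Split the word into alternating maximal vowel / consonant runs.
    runs = re.findall(f"[{VOWELS}]+|[^{VOWELS}]+", word.upper())
    features = []
    for i in range(1, len(runs) - 1):
        is_consonant_run = runs[i][0] not in VOWELS
        if is_consonant_run:
            # Runs alternate, so the neighbors are maximal vowel runs.
            features.append(runs[i - 1] + runs[i] + runs[i + 1])
    return features

print(vcv_features("INFORMATION"))  # ['INFO', 'ORMA', 'ATIO']
```

Leading and trailing consonant runs (the initial and final N of INFORMATION) contribute no feature, since they lack a vowel run on one side.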

Page 31: Survey of Approaches to  Information Retrieval of Speech Messages

Syllabic Units

Criteria
A unit should occur frequently enough for a reliable acoustic model to be trained for it.
It should not occur so frequently that its ability to discriminate between different messages is poor.

Process
query → VCV-features → tf*idf weights → cosine similarity against the document representations → document with the highest score

Page 32: Survey of Approaches to  Information Retrieval of Speech Messages

Syllabic Units

Major problem
The acoustic confusability of the VCV-features is not taken into account during the selection of indexing features.

Page 33: Survey of Approaches to  Information Retrieval of Speech Messages

Phonetic Units

Uses variable-length phone sequences as indexing features. These features can be viewed as "pseudo-words" and were shown to be useful for detecting or spotting topics in recorded military radio broadcasts.

An automatic procedure based on "digital trees" is used to search the possible phone subsequences.

A Hidden Markov Model (HMM) phone recognizer with 52 monophone models is used to process the speech.

More domain-independent than a word-based system.

Page 34: Survey of Approaches to  Information Retrieval of Speech Messages

Word Spotting Approaches

Word spotting sits between the simple phonetic and the complex large-vocabulary recognition approaches.

Two different ways that word spotting has been used.

1. Small, fixed number of keywords are selected a priori for both recognition and indexing.

2. The speech messages in the collection are processed and stored in a form (e.g. phone lattice) that allows arbitrary keywords to be searched for after they are specified by the user.

Page 35: Survey of Approaches to  Information Retrieval of Speech Messages

Speech Message Classification and Sorting

Topic Identifications (1)
There are K keywords; n_k is the binary value indicating the presence or absence of keyword w_k.
Find the topic T_i which maximizes the score S_i:

  S_i = sum_{k=1}^{K} n_k log [ p(T_i, w_k) / ( p(T_i) p(w_k) ) ]

Page 36: Survey of Approaches to  Information Retrieval of Speech Messages

Speech Message Classification and Sorting

Topic Identifications (1)
If there are 6 topics, with the top-scoring 40 words each, there are 240 keywords in total.
With these keywords used on the text transcriptions of the speech messages, 82.4% classification accuracy is achieved.
A genetic algorithm was then used to reduce the number of keywords down to 126, with a small drop in classification performance to 78.2%.

Page 37: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (2)

The topic-dependent unigram language models
K is the number of keywords in the indexing vocabulary.
n_k is the number of times keyword w_k occurs in the speech message.
p(w_k | T_i) is the unigram or occurrence probability of keyword w_k in the set of class-T_i messages.

  S_i = log prod_{k=0}^{K} p(w_k | T_i)^{n_k} = sum_{k=0}^{K} n_k log p(w_k | T_i)
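A minimal sketch of this unigram topic scoring in Python; the topic models and keyword probabilities below are invented toy values, and smoothing of unseen keywords is ignored:

```python
import math
from collections import Counter

def topic_score(message_words, topic_model):
    """S_i = sum_k n_k * log p(w_k | T_i), over keywords in the indexing vocabulary."""
    counts = Counter(message_words)
    return sum(n * math.log(topic_model[w]) for w, n in counts.items() if w in topic_model)

def classify(message_words, topic_models):
    """Pick the topic whose unigram model gives the highest log likelihood."""
    return max(topic_models, key=lambda t: topic_score(message_words, topic_models[t]))

# Toy topic-dependent unigram models: keyword -> p(w_k | T_i).
models = {
    "weather": {"rain": 0.3, "wind": 0.2, "game": 0.01},
    "sports":  {"rain": 0.05, "wind": 0.05, "game": 0.4},
}
print(classify(["rain", "wind", "rain"], models))  # weather
```

Each occurrence of a keyword contributes its log probability, so frequent topic-typical words dominate the score.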

Page 38: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (2)

Number of words vs. topic classification accuracy:

All 8431 words in the recognition vocabulary: 72.5%
A subset of 4600 words, selected by performing a χ² hypothesis test based on contingency tables to pick the "important" keywords: 74%
A genetic algorithm search then reduced the set to 203 words: 70%

Page 39: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (3)

The length-normalized topic score
N is the total number of words in the speech message.
K is the number of keywords in the indexing vocabulary.
n_k is the number of times keyword w_k occurs in the speech message.
p(w_k | T_i) is the unigram or occurrence probability of keyword w_k in the set of class-T_i messages.

  S_i = (1/N) sum_{k=0}^{K} n_k log p(w_k | T_i)

Page 40: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (3)

With 750 keywords, the classification accuracy is 74.6%.

Page 41: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (4)

The topic model is extended to a mixture of multinomials.
M is the number of multinomial model components.
π_m is the weight of the m-th multinomial component.
K is the number of keywords in the indexing vocabulary.
n_k is the number of times keyword w_k occurs in the speech message.
p_m(w_k | T_i) is the unigram or occurrence probability of keyword w_k in the set of class-T_i messages under component m.

  S_i = log [ sum_{m=1}^{M} π_m prod_{k=0}^{K} p_m(w_k | T_i)^{n_k} ]

Page 42: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Identifications (4)

Experiments indicate that the more complex models do not perform as well as the simple single mixture model.

Page 43: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Spotting (1)

The "usefulness" measure captures how discriminating the word is for the topic:

  u(w_k, T) = p(w_k | T) log [ p(w_k | T) / p(w_k | T̄) ]

p(w_k | T) and p(w_k | T̄) are the probabilities of detecting the keyword in the topic and in unwanted (non-topic) messages.
This measure selects words that occur often in the topic and have high discriminability.

Page 44: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Spotting (2)

Performed by accumulating, over a window of speech (typically 60 seconds), the log likelihood ratios of the detected keywords to produce a topic score for that region of the speech message:

  s = sum_{k=1}^{K} n_k log [ p(w_k | T) / p(w_k | T̄) ]
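A minimal sketch of this windowed scoring in Python; the keyword log likelihood ratios and word timestamps are invented, and the window is slid one detected word at a time (an assumption, since the survey does not specify the step size):

```python
def window_scores(words, times, keyword_llr, window=60.0):
    """Accumulate, over a sliding window of speech (here 60 s), the keyword
    log likelihood ratios log[p(w|T) / p(w|T-bar)] to score each region."""
    scores = []
    for start_i, start_t in enumerate(times):
        s = 0.0
        for w, t in zip(words[start_i:], times[start_i:]):
            if t - start_t > window:
                break
            s += keyword_llr.get(w, 0.0)   # non-keywords contribute nothing
        scores.append(s)
    return scores

# Invented keyword LLRs and word timestamps (seconds).
llr = {"launch": 2.0, "orbit": 1.5, "recipe": -1.0}
words = ["launch", "the", "orbit", "recipe"]
times = [0.0, 10.0, 30.0, 100.0]
print(window_scores(words, times, llr))  # [3.5, 1.5, 1.5, -1.0]
```

Regions rich in topic keywords get high scores, so the topic can be flagged locally even when the whole message is off-topic on average.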

Page 45: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Spotting (2)

Models that try to capture dependencies between the keywords are also examined.
w represents the vector of keywords; α_k is the coefficient of the model:

  log [ p(T | w) / p(T̄ | w) ] = sum_{k=1}^{K} α_k n_k log [ p(w_k | T) / p(w_k | T̄) ] + α_0 log [ p(T) / p(T̄) ]

Their experiments show that using a carefully chosen log-linear model can give topic spotting performance that is better than using the basic model that assumes keyword independence.

Page 46: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Clustering

Try to discover structure or relationships between messages in a collection.

The clustering process: tokenization, similarity computation, clustering.

Page 47: Survey of Approaches to  Information Retrieval of Speech Messages

Topic Clustering (cont.)

Tokenization: come up with a suitable representation of the speech message which can be used in the next two steps.

Similarity computation: every pair of messages must be compared; an N-gram model is used.

Clustering: using hierarchical tree clustering or nearest-neighbor classification.

The approach works well on true transcription texts, with figure of merit (FOM) rates of 90%.
Using speech input is worse than text: FOM drops to 70% using recognition output, unigram language models, and tree-based clustering.
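The three clustering steps above can be sketched as follows; the bigram tokenization, Dice similarity, and nearest-neighbor assignment are illustrative choices consistent with, but not specified by, the text, and the messages and labels are invented:

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Tokenization step: represent a message as a bag of word n-grams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def similarity(a, b):
    """Similarity step: fraction of shared n-grams (Dice coefficient)."""
    shared = sum((a & b).values())          # multiset intersection counts
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

def nearest_neighbor(message, clustered):
    """Clustering step: assign a message to the cluster of its most similar neighbor."""
    rep = ngrams(message)
    best = max(clustered, key=lambda m: similarity(rep, ngrams(m[0])))
    return best[1]

clustered = [(["storm", "winds", "heavy", "rain"], "weather"),
             (["final", "score", "home", "team"], "sports")]
print(nearest_neighbor(["heavy", "rain", "expected"], clustered))  # weather
```

With recognized speech instead of true transcripts, the n-gram overlap degrades with the error rate, which is consistent with the FOM drop reported above.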

Page 48: Survey of Approaches to  Information Retrieval of Speech Messages

Thank you all.