Extractive Spoken Document Summarization - Models and Features

2007/02/08

Yi-Ting Chen (陳怡婷), Department of Computer Science & Information Engineering

National Taiwan Normal University, Taipei, Taiwan


Outline

• Introduction

• Conventional Extractive Summarization Approaches

• Probabilistic Generative Summarization Approaches

• Summarization Compaction

• Evaluation Metrics

• Experimental Results

• Conclusions and Future Work


Introduction (1/3)

• The World Wide Web has led to a renaissance of research on automatic document summarization and has extended it to cover a wider range of new tasks

• Speech is one of the most important sources of information about multimedia content

• However, spoken documents associated with multimedia are unstructured, without titles and paragraphs, and are thus difficult to retrieve and browse

– Spoken documents are merely audio/video signals or a very long sequence of transcribed words, including recognition errors

– It is inconvenient and inefficient for users to browse through each of them from the beginning to the end


Introduction (2/3)

• Spoken document summarization, which aims to generate a summary automatically for spoken documents, is key to better speech understanding and organization

• Extractive vs. Abstractive Summarization

– Extractive summarization selects a number of indicative sentences or paragraphs from the original document and sequences them to form a summary

– Abstractive summarization writes a concise abstract that reflects the key concepts of the document

– Extractive summarization has gained much more attention in the recent past


Introduction (3/3)

[Figure: overall architecture of spoken document summarization. Speech recognition, driven by an acoustic model, a linguistic model, and a large text corpus, converts the speech signal into a speech transcription, while prosodic features are extracted in parallel. Important unit (sentence) extraction followed by sentence compaction produces the summary, as well as records, captions, and indexes. The feature extraction stage supplies statistical features, prosodic features, confidence scores, a language model, lexical features, and word dependency probabilities.]


Background of summarization (Mani and Maybury 1999)

[Timeline, 1950-2000:]

• An early system using a surface-level approach (1958)

• The first entity-level approaches based on syntactic analysis (1961)

• The use of location features (1969)

• The surface-level approach extended to include the use of cue phrases

• The emergence of more extensive entity-level approaches (1972)

• The first discourse-based approaches, based on story grammars (1980)

• A variety of different work (entity-level approaches based on AI, logic and production rules, semantic networks; hybrid approaches)

• The first trainable approach (1995); the first SVD-based approach (1995)

• More natural language generation work begins to focus on text summarization

• Recent work has focused almost exclusively on extracts rather than abstracts; a renewed interest in earlier surface-level approaches

• The emergence of new areas such as multi-document summarization (1997), multilingual summarization, and multimedia summarization (1997)

• Spoken-document summarization emerges alongside text-document summarization


Extraction Based on Sentence Locations/Structures

• Sentence extraction using sentence location information

– Lead (Hajime and Manabu 2000)

– Focusing on the introductory and concluding segments (Hirohata et al. 2005)

– Specific structure in some domains (Maskey et al. 2003)

• E.g., broadcast news programs: sentence position, speaker type, previous-speaker type, next-speaker type, speaker change


Statistical Summarization Approaches (1/8)

• Spoken sentences are ranked and selected based on some similarity measure or significance score

(a) Similarity Measures

– Vector Space Model (VSM) (Ho 2003)

– The document and its sentences are represented in vector form

– The sentences that have the highest relevance scores to the whole document are selected (see the sketch below)

– To summarize more important and diverse concepts in a document:

• Relevance measure (Gong et al. 2001)

• Maximum Marginal Relevance (MMR) (Murray et al. 2005)

[Figure: the sentence vector S_i and the document vector D in a common vector space.]
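
A minimal sketch (not from the slides) of the VSM relevance measure: sentences and the whole document are represented as sparse term-frequency vectors and ranked by cosine similarity to the document vector. The raw term-frequency weighting and the toy tokenized sentences are illustrative assumptions; TF-IDF or any other term weighting could be substituted without changing the structure.

```python
from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(w * v.get(t, 0.0) for t, w in u.items()) / (nu * nv)

def vsm_rank(sentences):
    """sentences: list of token lists; returns indices sorted by relevance to the whole document."""
    doc_vec = Counter(tok for sent in sentences for tok in sent)
    scores = [cosine(Counter(sent), doc_vec) for sent in sentences]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    # hypothetical, already-tokenized sentences of one "spoken document"
    sents = [["news", "summary", "model"], ["weather", "report"], ["news", "summary"]]
    print(vsm_rank(sents))
```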


Statistical Summarization Approaches (2/8)

(a) Similarity Measures

– Relevance measure (Gong et al. 2001): given the candidate sentence set S_1, S_2, ..., S_i, ..., S_M of document D

1. Compute the relevance score between each sentence S_i and the whole document D

2. Select the sentence S_max with the highest relevance score and add it to the summary

3. If the number of sentences in the summary reaches the predefined value, stop; otherwise delete S_max, recompute the weighted term-frequency vector for the document D, and go back to step 1

– Maximum Marginal Relevance, MMR (Murray et al. 2005):

S_MMR(i) = a · Sim(S_i, D) − (1 − a) · Sim(S_i, Summ)

where Summ denotes the sentences already selected into the summary (see the sketch below)
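
The following is a small sketch of the greedy MMR-style selection described above, assuming the same cosine similarity over term-frequency vectors as in the previous sketch; the trade-off weight a and the function and variable names are illustrative choices, not the authors' implementation.

```python
from collections import Counter
import math

def _cos(u, v):
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(w * v.get(t, 0.0) for t, w in u.items()) / (nu * nv)

def mmr_select(sentences, n_select, a=0.7):
    """Greedy MMR-style selection: trade relevance to the document against
    redundancy with the sentences already placed in the summary."""
    doc_vec = Counter(tok for s in sentences for tok in s)
    sent_vecs = [Counter(s) for s in sentences]
    selected, remaining = [], list(range(len(sentences)))
    while remaining and len(selected) < n_select:
        summ_vec = Counter(tok for i in selected for tok in sentences[i])
        def mmr_score(i):
            redundancy = _cos(sent_vecs[i], summ_vec) if selected else 0.0
            return a * _cos(sent_vecs[i], doc_vec) - (1 - a) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```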


Statistical Summarization Approaches (3/8)

(b) SVD-based Methods

– Each sentence can also be represented as a semantic vector

– The sentences carrying more topic or semantic information are selected

– LSA (Gong et al. 2001)

– DIM (Hirohata et al. 2005)

[Figure: the J × M term-sentence matrix A (J content words, M sentences) is decomposed by SVD as A = U Σ V^t, where U is the left singular vector matrix, Σ the singular value matrix, and V^t the right singular vector matrix; row j of U carries the information of word j, and column i of V^t carries the information of sentence i.]

– The weighted word-frequency vector of sentence i, A_i = [a_1i, a_2i, a_3i, ..., a_Ji]^t, is mapped after SVD and dimension reduction to the weighted singular-value vector [σ_1 v_i1, σ_2 v_i2, ..., σ_K v_iK], the reduced-dimension representation of the sentence

– DIM sentence score (see the sketch below):

S_i = sqrt( Σ_{k=1..K} (σ_k v_ik)^2 )
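
A compact sketch of the SVD-based scoring, using numpy's SVD on a raw term-frequency term-sentence matrix and the DIM-style score S_i = sqrt(Σ_k (σ_k v_ik)^2); the term weighting and the choice K=2 are assumptions made only for illustration.

```python
import numpy as np

def svd_sentence_scores(sentences, K=2):
    """sentences: list of token lists; returns a DIM-style score per sentence."""
    vocab = sorted({tok for s in sentences for tok in s})
    index = {t: j for j, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))            # J terms x M sentences
    for i, sent in enumerate(sentences):
        for tok in sent:
            A[index[tok], i] += 1.0                        # raw term-frequency weighting
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)   # A = U * diag(sigma) * Vt
    K = min(K, len(sigma))
    reduced = Vt.T[:, :K] * sigma[:K]                      # row i: [sigma_1*v_i1, ..., sigma_K*v_iK]
    return np.sqrt((reduced ** 2).sum(axis=1))             # S_i = sqrt(sum_k (sigma_k*v_ik)^2)

if __name__ == "__main__":
    sents = [["news", "summary", "model"], ["weather", "report"], ["news", "summary"]]
    print(svd_sentence_scores(sents, K=2))
```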


Statistical Summarization Approaches (4/8)

(b) SVD-based Methods

– Embedded LSA, eLSA (黃耀民 2005)

• Not only the sentences of the document to be summarized but also the document itself are involved in the construction of the latent topic space

• The sentences in the latent topic space that have the highest relevance scores to the whole document are selected

[Figure: the term-by-unit matrix A is built over the columns S_1, S_2, ..., S_M and D (rows w_1, w_2, ..., w_L) and decomposed as A = U Σ V^t.]


Statistical Summarization Approaches (5/8)

(c) Sentence Significance Score (SIG) (Kikuchi et al. 2003)

– Each sentence in the document is represented as a sequence of terms, each of which can simply be given a significance score

– Features such as the confidence score, linguistic score, or prosodic information can also be integrated

– Sentence selection can then be performed based on this score

– E.g., given a sentence S_i = w_1, w_2, ..., w_j, ..., w_J

• Significance score: S_i = (1/J) Σ_{j=1..J} [ L(w_j) + λ_I I(w_j) + λ_C C(w_j) ]

• Linguistic score: L(w_j) = log P(w_j | ..., w_{j-1})

• Term significance: I(w_j) = f_j log(F_A / F_j), where f_j is the frequency of w_j in the document, F_j its frequency in a large corpus, and F_A the corpus size

• Or the simpler sentence significance score (Hirohata et al. 2005), shown in the sketch below:

S_i = (1/J) Σ_{j=1..J} I(w_j)
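
A sketch of the simpler term-significance sentence score S_i = (1/J) Σ_j I(w_j), assuming a background corpus is available as a word-count dictionary; the back-off count of 1 for unseen words and the toy data are illustrative assumptions, and linguistic or confidence scores could be added as further weighted terms.

```python
import math
from collections import Counter

def sig_scores(sentences, corpus_counts, corpus_size):
    """sentences: list of token lists; corpus_counts: dict word -> count in a large
    background corpus; corpus_size: total number of words in that corpus."""
    doc_counts = Counter(tok for s in sentences for tok in s)
    def I(w):                                   # I(w) = f_w * log(F_A / F_w)
        F_w = corpus_counts.get(w, 1)           # unseen words backed off to a count of 1
        return doc_counts[w] * math.log(corpus_size / F_w)
    return [sum(I(w) for w in s) / len(s) if s else 0.0 for s in sentences]

if __name__ == "__main__":
    sents = [["economy", "growth", "report"], ["the", "of", "and"]]
    counts = {"economy": 120, "growth": 300, "report": 800,
              "the": 90000, "of": 80000, "and": 70000}
    print(sig_scores(sents, counts, corpus_size=1_000_000))
```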


Statistical Summarization Approaches (6/8)

(c) Sentence Significance Score (Kong and Lee 2006)

– Sentence: S_i = w_1, w_2, ..., w_j, ..., w_J

S(i) = (1/J) Σ_{j=1..J} [ λ_1 s(w_j) + λ_2 l(w_j) + λ_3 c(w_j) + λ_4 g(w_j) ] + λ_5 b(S_i)

• s(w_j): statistical measure, such as TF/IDF

• l(w_j): linguistic measure, e.g., named entities and POSs

• c(w_j): confidence score

• g(w_j): N-gram score

• b(S_i): calculated from the grammatical structure of the sentence

• The statistical measure can also be evaluated using PLSA (Probabilistic Latent Semantic Analysis)

– Topic significance

– Term entropy


Statistical Summarization Approaches (7/8)

(d) Classification-based Methods

– These methods need a set of training documents (or labeled data) for training the classifiers

– Naïve Bayes Classifier / Bayesian Network Classifier (Kupiec 1995, Koumpis et al. 2005, Maskey et al. 2005)

– Support Vector Machine (SVM) (Zhu and Penn 2005)

– Logistic Regression (Zhu and Penn 2005)

– Gaussian Mixture Models (GMM) (Murray et al. 2005)

– E.g., in the naïve Bayes case, each sentence S_i is represented by a feature vector (x_1, ..., x_V) and assigned to the summary class C or the non-summary class according to

P(S_i ∈ C | x_1, ..., x_V) = [ Π_{v=1..V} p(x_v | C) · P(C) ] / Π_{v=1..V} p(x_v)

(a sketch follows below)
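
To make the classification view concrete, here is a small Kupiec-style naive Bayes sketch over binary sentence features (e.g., position or cue-phrase indicators), trained from labeled sentences and used to rank sentences by P(summary | features); the Laplace smoothing and the toy features are assumptions, not part of the cited systems.

```python
from collections import defaultdict

def train_nb(feature_vectors, labels, smoothing=1.0):
    """feature_vectors: list of tuples of 0/1 features; labels: 1 = summary sentence."""
    counts = {0: defaultdict(float), 1: defaultdict(float)}
    totals = {0: 0.0, 1: 0.0}
    for x, y in zip(feature_vectors, labels):
        totals[y] += 1
        for v, value in enumerate(x):
            counts[y][(v, value)] += 1
    def prob(y, v, value):            # P(x_v = value | class y), Laplace-smoothed
        return (counts[y][(v, value)] + smoothing) / (totals[y] + 2 * smoothing)
    prior = {y: totals[y] / len(labels) for y in (0, 1)}
    def posterior(x):                 # P(summary | x) via Bayes' rule with naive independence
        joint = {}
        for y in (0, 1):
            p = prior[y]
            for v, value in enumerate(x):
                p *= prob(y, v, value)
            joint[y] = p
        return joint[1] / (joint[0] + joint[1])
    return posterior

if __name__ == "__main__":
    X = [(1, 1), (1, 0), (0, 0), (0, 1)]   # hypothetical binary features per sentence
    y = [1, 1, 0, 0]                       # 1 = in the manual summary
    rank = train_nb(X, y)
    print(rank((1, 1)), rank((0, 0)))
```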


Statistical Summarization Approaches (8/8)

(e) Combined Methods (Hirohata et al. 2005)

– Sentence Significance Score (SIG) combined with Location Information

– Latent semantic analysis (LSA) combined with Location Information

– DIM combined with Location Information


Probabilistic Generative Approaches (1/7)

• MAP criterion for sentence selection:

P(S_i | D) = P(D | S_i) P(S_i) / P(D) ∝ P(D | S_i) P(S_i)

where P(D | S_i) is the sentence (generative) model and P(S_i) is the sentence prior

• Sentence prior

– The sentence prior is simply set to uniform here

– Or it may depend on sentence duration/position, correctness of the sentence boundary, confidence score, prosodic information, etc.

• Each sentence of the document can be ranked by this likelihood value


Probabilistic Generative Approaches (2/7)

• Hidden Markov Model, HMM (黃耀民 2005, 陳怡婷 et al. 2005)

– Each sentence S_i of the spoken document is treated as a probabilistic generative model of N-grams, while the spoken document D = w_1 w_2 ... w_j ... w_{L_i} is the observation:

P_HMM(D | S_i) = Π_{w_j ∈ D} [ λ P(w_j | S_i) + (1 − λ) P(w_j | C) ]^{c(w_j, D)}

– P(w_j | S_i): the sentence model, estimated from the sentence itself

– P(w_j | C): the collection model, estimated from a large corpus (in order to give some probability to every term in the vocabulary)

– λ: a weighting parameter

– c(w_j, D): the count of w_j in D

(a sketch of this likelihood follows below)
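
A sketch of the HMM (unigram language model) likelihood above: the sentence model is the MLE estimate from the sentence itself, linearly smoothed with a collection model, and each candidate sentence is scored by the log-likelihood of generating the whole document. The collection model is assumed to be given as a word-probability dictionary, and the floor value for unseen words is an illustrative choice.

```python
import math
from collections import Counter

def hmm_log_likelihood(sentence, document, collection_model, lam=0.5):
    """sentence, document: token lists; collection_model: dict word -> P(w|C)."""
    sent_counts = Counter(sentence)
    sent_len = max(len(sentence), 1)
    logp = 0.0
    for w, c in Counter(document).items():          # product over words in D, exponent c(w, D)
        p_s = sent_counts[w] / sent_len             # P(w|S_i), MLE from the sentence
        p_c = collection_model.get(w, 1e-8)         # P(w|C) from a large corpus (floored)
        logp += c * math.log(lam * p_s + (1 - lam) * p_c)
    return logp
```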


Probabilistic Generative Approaches (3/7)

• Relevance Model, RM (Chen et al. 2006)

– In HMM, the true sentence model might not be accurately estimated by MLE, P(w | S_i) = c(w, S_i) / |S_i|

• Since the sentence consists of only a few terms

– In order to improve the estimation of the sentence model

• Each sentence S has its own associated relevance model R_S, constructed from the subset of documents in the collection that are relevant to the sentence

• The relevance model is then linearly combined with the original sentence model to form a more accurate sentence model:

P̂(w_j | S) = α P(w_j | S) + (1 − α) P(w_j | R_S)

P̂_HMM(D | S_i) = Π_{w_j ∈ D} [ λ P̂(w_j | S_i) + (1 − λ) P(w_j | C) ]^{c(w_j, D)}


Probabilistic Generative Approaches (4/7)

• A schematic diagram of extractive spoken document summarization jointly using the HMM and RM models

[Figure: each sentence S of the spoken documents to be summarized is used, via local feedback, as a query to an IR system over a contemporary/general text news collection; the retrieved documents relevant to S are used to build S's RM model, which is combined with S's HMM model to compute the document likelihood P_HMM(D_i | S) for the documents D_1, D_2, ..., D_i.]


Probabilistic Generative Approaches (5/7)

• Topical Mixture Model, TMM (Chen et al. 2006)

– Build a probabilistic latent topical space

– Measure the likelihood of a sentence generating a given document D = w_1 w_2 ... w_n ... w_N in that space (see the sketch below):

P_TMM(D | S_i) = Π_{w ∈ D} [ Σ_{k=1..K} P(w | T_k) P(T_k | S_i) ]^{n(w, D)}

[Figure: the TMM model for a specific sentence S_i; each word w_n of the document is generated by the K latent topics with probabilities P(w_n | T_1), ..., P(w_n | T_K), weighted by the sentence-specific topic weights P(T_1 | S_i), ..., P(T_K | S_i).]
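
A sketch of the TMM scoring step, assuming the topic-word distributions P(w | T_k) and the sentence-specific topic weights P(T_k | S_i) have already been estimated (e.g., by EM, which is not shown); only the document log-likelihood of the formula above is computed.

```python
import math
from collections import Counter

def tmm_log_likelihood(document, topic_word, sent_topic, floor=1e-12):
    """document: token list; topic_word: list of dicts P(w|T_k);
    sent_topic: list of weights P(T_k|S_i) for one sentence S_i."""
    logp = 0.0
    for w, c in Counter(document).items():
        p_w = sum(p_t * topic_word[k].get(w, 0.0) for k, p_t in enumerate(sent_topic))
        logp += c * math.log(max(p_w, floor))    # exponent n(w, D)
    return logp
```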


Probabilistic Generative Approaches (6/7)

• Word Topical Mixture Model (wTMM) (Chiu and Chen 2007)

– To explore the co-occurrence relationship between words of the language

– Each word w_j of the language is treated as a topical mixture model M_{w_j} for predicting the occurrence of another word w:

P(w | M_{w_j}) = Σ_{k=1..K} P(w | T_k) P(T_k | M_{w_j})

– Each sentence S_i of the spoken document to be summarized is treated as a composite word TMM model for generating the document

– The likelihood of the document D being generated by S_i can be expressed as:

P_wTMM(D | S_i) = Π_{w ∈ D} [ Σ_{w_j ∈ S_i} α_{j,i} Σ_{k=1..K} P(w | T_k) P(T_k | M_{w_j}) ]^{n(w, D)}

where α_{j,i} denotes the weight of the word model M_{w_j} in sentence S_i


Probabilistic Generative Approaches (7/7)

• Word Topical Mixture Model (wTMM)

– Training: for each text news document D_c in a contemporary collection, its title H_c is treated as a composite word TMM for generating the document, and the word models are estimated from

P(D_c | H_c) = Π_{w ∈ D_c} [ Σ_{w_j ∈ H_c} α_{j,c} Σ_{k=1..K} P(w | T_k) P(T_k | M_{w_j}) ]^{n(w, D_c)}

– Summarization: each sentence S_i (among S_1, ..., S_i, ..., S_N) of the spoken document D is scored with the composite-model likelihood of the previous slide,

P_wTMM(D | S_i) = Π_{w ∈ D} [ Σ_{w_j ∈ S_i} α_{j,i} Σ_{k=1..K} P(w | T_k) P(T_k | M_{w_j}) ]^{n(w, D)}

[Figure: the word TMM models (topic weights P(T_k | M_{w_1}), ..., P(T_k | M_{w_V}) over shared topic distributions P(w | T_1), ..., P(w | T_K)) link the text documents and their titles used for training to the sentences of the spoken document to be summarized.]


Comparison of Extractive Summarization Methods

• Literal Term Matching vs. Concept Matching

– Literal Term Matching:

• Extraction using degree of similarity (VSM, MMR)

• Extraction using feature scores (sentence significance score)

• HMM, HMMRM

– Concept Matching:

• Extraction using latent semantic analysis (LSA, DIM)

• TMM, wTMM


Comparison of Extractive Summarization Methods

• Summarizing Speech Without Text Using HMM (Maskey and Hirschberg 2006)

– A position-sensitive HMM with L position states

– L is the number of position bins, giving 2L states in total

– Feature extraction and HMM training

[Figure: chain of position states 1, 2, ..., L.]


Summarization Compaction (1/3)

• Two-Stage Summarization Method (Furui et al. 2004)

– Sentence extraction

– Sentence compaction

• Each word w_j is given a linguistic score L(w_j | ..., w_{j-1}), a significance score I(w_j), a confidence score C(w_j), and a word concatenation score T(w_{j-1}, w_j)

• T(w_{j-1}, w_j) is a measure of the dependency between two words and is obtained with a phrase structure grammar, a stochastic dependency context-free grammar (SDCFG)

• A set of words that maximizes a weighted sum of these scores is selected according to a given compression ratio and connected to create a summary using a two-stage DP technique


Summarization Compaction (2/3)


Summarization Compaction (3/3)

• Using acoustic, prosodic, and semantic information and a dynamic programming search algorithm to find the summarization result (Huang et al. 2005)

• A noisy-channel model for sentence compression (Knight and Marcu 2001)

• A decision-based model for sentence compression (Knight and Marcu 2001)

– Decomposes the rewriting operation into a sequence of actions


Evaluation Metrics (1/4)

• Subjective Evaluation Metrics (direct evaluation)

– Conducted by human subjects

– At different levels

• Objective Evaluation Metrics

– Automatic summaries are evaluated by objective metrics

• Automatic Evaluation

– Summaries are evaluated via an information retrieval (IR) task


Evaluation Metrics (2/4)

• Objective Evaluation Metrics

– Summarization accuracy (Hori et al. 2004)

• All the human summaries are merged into a single word network

• The word accuracy of the automatic summary, measured against the closest word string extracted from the word network, is taken as the summarization accuracy (see the sketch below):

Accuracy = (Len − Sub − Ins − Del) / Len × 100%

(Len: number of words in the reference string; Sub, Ins, Del: substitution, insertion, and deletion errors)

• Problem: the variation between manual summaries can be very large (at both high and low summarization ratios)

[Figure: word network obtained by merging manual summaries, e.g., "<s> The juicy orange fruits in California fruiting in fall </s>".]
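
A sketch of the word-accuracy computation behind the summarization-accuracy metric: with unit edit costs, Sub + Ins + Del equals the Levenshtein distance, so Accuracy = (Len − distance) / Len × 100. The slides align the automatic summary against the closest word string in the merged word network; for simplicity this sketch aligns against a single reference word string.

```python
def word_accuracy(reference, hypothesis):
    """reference, hypothesis: lists of words; returns accuracy in percent."""
    R, H = len(reference), len(hypothesis)
    dp = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        dp[i][0] = i                                   # deletions
    for j in range(H + 1):
        dp[0][j] = j                                   # insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,           # deletion
                           dp[i][j - 1] + 1,           # insertion
                           dp[i - 1][j - 1] + sub)     # match / substitution
    errors = dp[R][H]                                  # Sub + Ins + Del
    return (R - errors) / R * 100.0 if R else 0.0
```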


Evaluation Metrics (3/4)

• Objective Evaluation Metrics

– Sentence recall/precision (Hirohata et al. 2004)

• Sentence recall/precision is commonly used in evaluating sentence-extraction-based text summarization:

R = |S_man ∩ S_sum| / |S_man|,   P = |S_man ∩ S_sum| / |S_sum|,   F = 2RP / (R + P)

(S_man: the sentence set of the manual summary; S_sum: the sentence set of the automatic summary; see the sketch below)

• Since sentence boundaries are not explicitly indicated in input speech, estimated boundaries based on recognition results do not always agree with those in the manual summaries (Kitade et al., 2004)

• F-measure, F-measure/max, F-measure/ave.
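
A direct transcription of the sentence recall/precision/F-measure definitions above, assuming the manual and automatic summaries are given as sets of sentence identifiers (exact boundary matching, which the slide notes is problematic for speech).

```python
def sentence_prf(manual_ids, auto_ids):
    """manual_ids, auto_ids: iterables of sentence identifiers."""
    S_man, S_sum = set(manual_ids), set(auto_ids)
    overlap = len(S_man & S_sum)
    recall = overlap / len(S_man) if S_man else 0.0
    precision = overlap / len(S_sum) if S_sum else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f
```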


Evaluation Metrics (4/4)

• Objective Evaluation Metrics

– ROUGE-N (Lin et al. 2003)

• ROUGE-N is an N-gram recall between an automatic summary and a set of manual summaries (see the sketch below):

ROUGE-N = [ Σ_{S ∈ S_H} Σ_{g_n ∈ S} C_match(g_n) ] / [ Σ_{S ∈ S_H} Σ_{g_n ∈ S} C(g_n) ]

(S_H: the set of manual summaries; g_n: an N-gram; C_match(g_n): the number of N-grams co-occurring in the automatic and manual summaries; C(g_n): the number of N-grams in the manual summaries)

– Cosine Measure (Saggion et al. 2002, Ho 2003)

ACC_D(m%) = (1/H) Σ_{h=1..H} SIM(A_D(m%), E_{h,D}(m%))

SIM(A_D(m%), E_{h,D}(m%)) = V_{A_D(m%)} · V_{E_{h,D}(m%)} / ( |V_{A_D(m%)}| |V_{E_{h,D}(m%)}| )

(A_D(m%): the automatic summary of document D at summarization ratio m%; E_{h,D}(m%): the h-th of the H manual summaries; V: the corresponding term vectors)

[Figure: two word-segmented Chinese example summaries, "昨天 馬英九 訪問 中國大陸" ("Yesterday Ma Ying-jeou visited mainland China") and "昨天 馬英九 結束 訪問 回國" ("Yesterday Ma Ying-jeou ended his visit and returned home"), plotted as the vectors A_D and E_{h,D}.]
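
A sketch of ROUGE-N as defined above: clipped N-gram co-occurrence counts in the numerator and the total N-gram count of the manual summaries in the denominator. Tokenization is assumed to be done already, and n defaults to 2 to match the ROUGE-2 figures reported in the experiments below.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(auto_tokens, manual_token_lists, n=2):
    """auto_tokens: token list of the automatic summary;
    manual_token_lists: list of token lists, one per manual summary."""
    auto_counts = ngrams(auto_tokens, n)
    match, total = 0, 0
    for ref in manual_token_lists:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        match += sum(min(c, auto_counts[g]) for g, c in ref_counts.items())
    return match / total if total else 0.0
```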


Experimental Results (1/2)

• Preliminary Tests on 200 radio broadcast news stories collected in Taiwan (automatic transcripts with 14.17% character error rate)

– Development Set (100)

– Test Set (100)

• ROUGE-2 measure was used to evaluate the performance levels of different models

• Development set results

SD    VSM     LSA     DIM     MMR     SIG     HMM     TMM     HMMRM   wTMM    Random
10%   0.2653  0.2786  0.2935  0.2659  0.2972  0.3084  0.3154  0.3377  0.3493  0.1215
20%   0.3103  0.2904  0.3293  0.3025  0.3344  0.3467  0.3496  0.3617  0.3698  0.1470
30%   0.3331  0.2984  0.3490  0.3503  0.3597  0.3734  0.3600  0.3741  0.3646  0.1702
50%   0.4436  0.4072  0.4488  0.4549  0.4870  0.4768  0.4657  0.4775  0.4633  0.3177
70%   0.5413  0.5019  0.5324  0.5457  0.5632  0.5631  0.5526  0.5642  0.5534  0.4382


Experimental Results (2/2)

• Test set results

SD    VSM     LSA     DIM     MMR     SIG     HMM     TMM     HMMRM   wTMM    Random
10%   0.3073  0.3034  0.3187  0.3073  0.3144  0.2932  0.3210  0.3168  0.3248  0.1204
20%   0.3188  0.2926  0.3148  0.3214  0.3259  0.3191  0.3333  0.3240  0.3324  0.1392
30%   0.3593  0.3286  0.3383  0.3678  0.3428  0.3705  0.3741  0.3758  0.3816  0.1679
50%   0.4485  0.3906  0.4345  0.4501  0.4666  0.4732  0.4605  0.4714  0.4581  0.3163
70%   0.5425  0.4843  0.5259  0.5366  0.5538  0.5595  0.5522  0.5609  0.5424  0.4343

• Test set with non-uniform sentence prior probabilities

SD    HMMRM   Location  AvgEnergy  PitchVariance  EnergyVariance
10%   0.3168  0.5039    0.3182     0.3264         0.3228
20%   0.3240  0.4850    0.3310     0.3269         0.3240
30%   0.3758  0.4780    0.3774     0.3744         0.3703
50%   0.4714  0.4778    0.4725     0.4721         0.4722
70%   0.5609  0.5539    0.5639     0.5618         0.56081


Conclusions and Future Work

• Various (spoken) document summarization approaches and features have been extensively investigated in the past several years

• The probabilistic generative framework seems promising for extractive (spoken) document summarization. We are currently investigating how to:

– Improve the proposed sentence models

– Improve the estimation of the sentence prior

– Take the relevance between sentences into account


Reference (1/3)

• (Mani and Maybury 1999) Inderjeet Mani and Mark T. Maybury, “Advances in Automatic Text Summarization”, The MIT Press, Cambridge, Massachusetts, 1999.

• (Hajime and Manabu 2000) Mochizuki Hajime, Okumura Manabu, “A Comparison of Summarization Methods Based on Task-based Evaluation”, in Proc. 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece, 2000.

• (Hirohata et al. 2005) Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui, “Sentence Extraction-Based Presentation Summarization Techniques and Evaluation Metrics”, in Proc. ICASSP 2005.

• (Maskey et al. 2003) Sameer Raj Maskey, Julia Hirschberg, “Automatic Summarization of Broadcast News Using Structural Features”, in Proc. EUROSPEECH 2003.

• (Ho 2003) Y. Ho, “An Initial Study on Automatic Summarization of Chinese Spoken Documents”, Master's thesis, National Taiwan University, July 2003.

• (Gong et al. 2001) Y. Gong and X. Liu, “Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis”, in Proc. ACM SIGIR Conference on R&D in Information Retrieval, 2001, pp. 19-25.

• (Murray et al. 2005) Gabriel Murray, Steve Renals, Jean Carletta, “Extractive Summarization of Meeting Recordings”, in Proc. Eurospeech 2005.

• (Kikuchi et al. 2003) T. Kikuchi, S. Furui, and C. Hori, “Two-stage Automatic Speech Summarization by Sentence Extraction and Compaction”, in Proc. IEEE and ISCA Workshop on Spontaneous Speech Processing and Recognition, 2003, pp. 207-210.

• (Furui et al. 2004) Sadaoki Furui, Tomonori Kikuchi, Yousuke Shinnaka, Chiori Hori, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE Transactions on Speech and Audio Processing, Vol. 12, No. 4, July 2004.


Reference (2/3)

• (Kong and Lee 2006) Sheng-Yi Kong and Lin-shan Lee, “Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)”, in Proc. the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, May 14-19, 2006.

• (Kupiec 1995) Julian Kupiec, Jan Pedersen and Francine Chen, “A Trainable Document Summarizer”, in Proc. the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995.

• (Koumpis et al. 2005) K. Koumpis, S. Renals, “Automatic Summarization of Voicemail Messages Using Lexical and Prosodic Features”, ACM Transactions on Speech and Language Processing, 2(1), 2005.

• (Zhu and Penn 2005) X. Zhu, G. Penn, “Evaluation of Sentence Selection for Speech Summarization”, in Proc. the 2nd International Conference on Recent Advances in Natural Language Processing (RANLP 2005), pp. 39-45, September 2005.

• (Chen et al. 2006) Yi-Ting Chen, Suhan Yu, Hsin-min Wang, Berlin Chen, “Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models”, in Proc. the Fifth International Symposium on Chinese Spoken Language Processing (ISCSLP 2006), Singapore, December 13-16, 2006.

• (Chen et al. 2006) Berlin Chen, Yao-Ming Yeh, Yao-Min Huang, Yi-Ting Chen, “Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information”, in Proc. the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, May 14-19, 2006.

• (Chiu and Chen 2007) Hsuan-Sheng Chiu, Berlin Chen, “Word Topical Mixture Models for Dynamic Language Model Adaptation”, in Proc. the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Hawaii, USA, April 15-20, 2007.


Reference (3/3)

• (Maskey and Hirschberg 2006) Sameer Maskey, Julia Hirschberg, “Summarizing Speech Without Text Using Hidden Markov Models”, in Proc. HLT-NAACL 2006.

• (Huang et al. 2005) Chien-Lin Huang, Chia-Hsin Hsieh and Chung-Hsien Wu, “Spoken Document Summarization Using Acoustic, Prosodic and Semantic Information”, in Proc. ICME 2005, Amsterdam, The Netherlands, 2005.

• (Knight and Marcu 2001) Kevin Knight, Daniel Marcu, “Summarization Beyond Sentence Extraction: A Probabilistic Approach to Sentence Compression”, Artificial Intelligence, 139(1): 91-107, 2002.

• (Hori et al. 2004) C. Hori, T. Hirao and H. Isozaki, “Evaluation Measures Considering Sentence Concatenation for Automatic Summarization by Sentence or Word Extraction”, in Proc. ACL 2004, pp. 82-88.

• (Kitade et al., 2004) T. Kitade et al., “Automatic Extraction of Key Sentences from CSJ Presentations Using Discourse Markers and Topic Words”, in Proc. Third Spontaneous Speech Science and Technology Workshop, 2004, pp. 111-118.

• (Hirohata et al. 2004) Makoto Hirohata, Yosuke Shinnaka, Koji Iwano and Sadaoki Furui, “Sentence-Extractive Automatic Speech Summarization and Evaluation Techniques”, Speech Communication, in press, available online 5 June 2006.

• (Lin et al. 2003) C.Y. Lin, “ROUGE: Recall-oriented Understudy for Gisting Evaluation”, 2003, http://www.isi.edu/~cyl/ROUGE/.

• (Saggion et al. 2002) Horacio Saggion and Dragomir Radev, “Meta-evaluation of Summaries in a Cross-lingual Environment Using Content-based Metrics”, in Proc. COLING 2002.


Reference (3/3), continued

• (黃耀民 2005) 黃耀民 (Yao-Min Huang), “A Study on Automatic Summarization Based on Sentence Extraction and Its Application to Document Classification” (in Chinese), Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, 2005.

• (陳怡婷 et al. 2005) 陳怡婷 (Yi-Ting Chen), 黃耀民 (Yao-Min Huang), 葉耀明 (Yao-Ming Yeh), 陳柏琳 (Berlin Chen), “Summarization Models for Automatic Summarization of Chinese Spoken Documents” (in Chinese), in Proc. the 10th Conference on Artificial Intelligence and Applications, December 2-3, 2005.