Speaker: 黃宥、陳縕儂, National Taiwan University, Taiwan

Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features. Speaker: 黃宥、陳縕儂, National Taiwan University, Taiwan

TRANSCRIPT

Slide 1: Title
Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features

Slide 2
Speaker: 黃宥、陳縕儂, National Taiwan University, Taiwan

Slide 3: Outline
• Introduction
• Proposed Approach
  • Branching Entropy
  • Feature Extraction
  • Learning Methods
• Experiments & Evaluation
• Conclusion

Slide 4: Introduction (section divider)

Slide 5: Definition
• Key term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantages
  • Indexing and retrieval
  • The relations between key terms and segments of documents

Slide 6: Introduction
(figure only)

Slide 7: Introduction
(Figure; terms shown: acoustic model, language model, hmm, n-gram, phone, hidden Markov model.)

Slide 8: Introduction
(Figure; terms shown: hmm, acoustic model, language model, n-gram, hidden Markov model, phone, bigram.)
Target: extract key terms from course lectures.

Slide 9: Proposed Approach (section divider)

Slide 10: Automatic Key Term Extraction
System flow: archive of spoken documents → ASR (speech signal → ASR transcriptions) → Branching Entropy → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network.

Slides 11-12: Automatic Key Term Extraction
(The same system flow, stepped through.)

Slide 13: Automatic Key Term Extraction - Phrase Identification
First use branching entropy to identify phrases.

Slide 14: Automatic Key Term Extraction - Key Term Extraction
Then learn to extract key terms using several features; the output is a list of key terms ("entropy", "acoustic model", ...).

Slide 15: Automatic Key Term Extraction
(The same two-stage flow.)

Slide 16: Branching Entropy
How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word
(Diagram: the phrase "hidden Markov model" in its contexts; neighboring words include "represent", "is", "can", "of", "in".)

Slide 17: Branching Entropy
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word

Slide 18: Branching Entropy
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word
• "hidden Markov model" is followed by many different words
→ Define branching entropy to decide possible boundaries (here, after "model").

Slide 19: Branching Entropy
• Definition of right branching entropy
  • Probability of each child x_i of a string X: P(x_i | X)
  • Right branching entropy of X: H_r(X) = -Σ_i P(x_i | X) log P(x_i | X)

Slide 20: Branching Entropy
• Decision of the right boundary
  • Find the right boundary located between X and x_i where the branching entropy rises, i.e., H_r(x_i) > H_r(X).
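To make the two slides above concrete, here is a minimal sketch of right branching entropy over word n-grams; the toy corpus, function names, and the exact boundary test are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def successor_counts(sentences, max_n=4):
    """For every n-gram X (up to max_n words), count how often each word x_i follows it."""
    succ = defaultdict(Counter)
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n):
                succ[tuple(sent[i:i + n])][sent[i + n]] += 1
    return succ

def right_branching_entropy(succ, ngram):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X) over the children x_i of X (Slide 19)."""
    counts = succ.get(tuple(ngram), Counter())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Toy corpus: "hidden" and "hidden markov" are almost always followed by the
# same word, while "hidden markov model" is followed by many different words.
corpus = [
    "the hidden markov model is a statistical model".split(),
    "a hidden markov model can represent speech".split(),
    "we use the hidden markov model in asr".split(),
]
succ = successor_counts(corpus)
for phrase in (["hidden"], ["hidden", "markov"], ["hidden", "markov", "model"]):
    print(" ".join(phrase), "->", right_branching_entropy(succ, phrase))
# H_r stays at 0 inside the phrase and jumps after "model", so a right
# boundary is hypothesized there (Slide 20). Left branching entropy is the
# same computation over the reversed word order (Slide 24).
```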
Slides 21-23: Branching Entropy
(The "hidden Markov model" example animated step by step: the branching entropy stays low inside the phrase and rises at its boundaries.)

Slide 24: Branching Entropy
• Decision of the left boundary
  • Compute the left branching entropy on the reversed string (X: "model Markov hidden") and find the left boundary located between X and x_i where it rises.
• Use a PAT tree to implement this.

Slide 25: Branching Entropy
• Implementation in the PAT tree
  • The probability of the children x_i of X and the right branching entropy of X are computed from the counts stored in the tree.
(PAT-tree example: X = "hidden Markov"; x_1 = "hidden Markov model", x_2 = "hidden Markov chain"; other nodes include "state", "distribution", and "variable".)

Slide 26: Automatic Key Term Extraction - Key Term Extraction
Extract some features for each candidate term.

Slide 27: Feature Extraction
• Prosodic features, for the first occurrence of each candidate term:
  Feature Name    | Feature Description
  Duration (I-IV) | normalized duration (max, min, mean, range)
• A speaker tends to use longer duration to emphasize key terms.
• Four values for the duration of the term; the duration of a phone (e.g., phone "a") is normalized by the average duration of that phone.

Slides 28-29: Feature Extraction
• Higher pitch may represent significant information, so add:
  Pitch (I-IV) | F0 (max, min, mean, range)

Slides 30-31: Feature Extraction
• Higher energy emphasizes important information, so add:
  Energy (I-IV) | energy (max, min, mean, range)
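A hedged sketch of how these four-value prosodic features could be computed. The input layout (phone alignments with durations, frame-level F0 and energy values) and all names are assumptions, since the slides do not specify the implementation.

```python
from statistics import mean

def four_stats(values):
    """The (max, min, mean, range) quadruple used for each prosodic feature."""
    return max(values), min(values), mean(values), max(values) - min(values)

def prosodic_features(phones, f0_frames, energy_frames, avg_phone_dur):
    """phones: [(phone_label, duration_sec), ...] for the term's first occurrence.
    Each phone duration is normalized by the average duration of that phone,
    as described on Slide 27."""
    norm_durs = [dur / avg_phone_dur[p] for p, dur in phones]
    return {
        "Duration (I-IV)": four_stats(norm_durs),
        "Pitch (I-IV)": four_stats(f0_frames),       # F0 max/min/mean/range
        "Energy (I-IV)": four_stats(energy_frames),  # energy max/min/mean/range
    }

# Toy example: the term "viterbi" with made-up alignments and frame values.
avg_phone_dur = {"v": 0.06, "i": 0.08, "t": 0.05, "r": 0.06, "b": 0.05}
phones = [("v", 0.09), ("i", 0.12), ("t", 0.06), ("r", 0.07), ("b", 0.05), ("i", 0.10)]
print(prosodic_features(phones, f0_frames=[210.0, 220.0, 205.0],
                        energy_frames=[0.7, 0.9, 0.8], avg_phone_dur=avg_phone_dur))
```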
Slide 32: Feature Extraction
• Lexical features, for each candidate term (some well-known lexical features):
  Feature Name | Feature Description
  TF    | term frequency
  IDF   | inverse document frequency
  TFIDF | tf * idf
  PoS   | the PoS tag

Slide 33: Feature Extraction
• Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
  • Latent topic probability: P(t_j | D_i) = Σ_k P(t_j | T_k) P(T_k | D_i)
  • D_i: documents; T_k: latent topics; t_j: terms
• Key terms tend to focus on limited topics.

Slide 34: Feature Extraction
• The latent topic probability describes a probability distribution over topics: a key term concentrates on a few topics, while a non-key term spreads across many. How to use it?
  LTP (I-III) | Latent Topic Probability (mean, variance, standard deviation)

Slides 35-36: Feature Extraction
• Latent Topic Significance (LTS): the within-topic to out-of-topic ratio, i.e., the term's within-topic frequency divided by its out-of-topic frequency.
  LTS (I-III) | Latent Topic Significance (mean, variance, standard deviation)

Slides 37-38: Feature Extraction
• Latent Topic Entropy (LTE): the entropy of a term's latent topic distribution.
  LTE | term entropy for latent topics: LTE(t_j) = -Σ_k P(T_k | t_j) log P(T_k | t_j)
• A key term concentrates on few topics, so it has a lower LTE; a non-key term has a higher LTE.
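A minimal sketch of these PLSA-based semantic features, assuming the PLSA outputs are already available: topic posteriors P(T_k | t_j) per term and P(T_k | D_i) per document, plus term counts n(t_j, D_i). The definitions follow the slide descriptions and are an interpretation, not the authors' code.

```python
import math
from statistics import mean, pstdev, pvariance

def ltp_features(topic_given_term):
    """LTP (I-III): mean, variance, standard deviation of P(T_k | t_j)."""
    return mean(topic_given_term), pvariance(topic_given_term), pstdev(topic_given_term)

def lts(term_counts, topic_given_doc, k):
    """Latent Topic Significance for topic k (Slides 35-36): within-topic
    frequency of the term divided by its out-of-topic frequency."""
    within = sum(n * p[k] for n, p in zip(term_counts, topic_given_doc))
    out = sum(n * (1.0 - p[k]) for n, p in zip(term_counts, topic_given_doc))
    return within / out

def lte(topic_given_term):
    """Latent Topic Entropy: -sum_k P(T_k|t_j) log P(T_k|t_j). Key terms
    concentrate on few topics, so they get a LOWER LTE (Slides 37-38)."""
    return -sum(p * math.log(p) for p in topic_given_term if p > 0)

# A focused, key-term-like distribution vs. a spread-out one:
print(ltp_features([0.90, 0.05, 0.05]))
print(lte([0.90, 0.05, 0.05]))  # low entropy
print(lte([0.34, 0.33, 0.33]))  # high entropy
```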
Slide 39: Automatic Key Term Extraction - Key Term Extraction
Use learning approaches to extract key terms.

Slide 40: Learning Methods
• Unsupervised learning: K-means Exemplar
  • Transform each term into a vector in LTS (Latent Topic Significance) space.
  • Run K-means.
  • Take the term at the centroid of each cluster (the exemplar) as a key term.
• The terms in the same cluster focus on a single topic and are related to the key term, so the key term can represent that topic.

Slide 41: Learning Methods
• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
• Automatically adjust the weights of the features to produce a classifier.

Slide 42: Experiments & Evaluation (section divider)

Slide 43: Experiments
• Corpus: NTU lecture corpus
  • Mandarin Chinese with embedded English words
  • Example: "solution viterbi algorithm", glossed as "Our solution is viterbi algorithm"
  • Single speaker
  • 45.2 hours

Slide 44: Experiments
• ASR accuracy:
  Language     | Mandarin | English | Overall
  Char Acc (%) | 78.15    | 53.44   | 76.26
• Acoustic model: bilingual AM with model adaptation; out-of-domain corpora as background, adapted with some data from the target speaker.
• Language model: adaptive trigram interpolation with an in-domain corpus.

Slide 45: Experiments
• Reference key terms
  • Annotations from 61 students who had taken the course.
  • If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms 0.
  • Rank the terms by the sum of the scores given by all annotators.
  • Choose the top N terms from the list, where N is the average of the N_k.
  • N = 154 key terms: 59 key phrases and 95 keywords.

Slide 46: Experiments
• Evaluation
  • Unsupervised learning: set the number of key terms to N.
  • Supervised learning: 3-fold cross-validation.

Slide 47: Experiments
• Feature effectiveness (neural network, keywords from ASR transcriptions)
  (Bar chart, F-measure: Pr 20.78, Lx 42.86, Sm 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55; Pr: prosodic, Lx: lexical, Sm: semantic.)
• Each feature set alone gives an F1 between 20% and 42%.
• Prosodic and lexical features are additive; all three feature sets are useful.

Slide 48: Experiments
• Overall performance, F-measure:
  Baseline (conventional TFIDF scores without branching entropy, with stop-word removal and PoS filtering): 23.38
  TFIDF with branching entropy: 51.95 | K-means Exemplar: 55.84 | AdaBoost: 62.39 | Neural Network: 67.31
• Branching entropy performs well; K-means Exemplar outperforms TFIDF; supervised approaches are better than unsupervised ones.

Slide 49: Experiments
• Overall performance on manual vs. ASR transcriptions (F-measure):
  Method                                  | Manual | ASR
  Baseline (TFIDF w/o branching entropy)  | 23.38  | 20.78
  TFIDF                                   | 51.95  | 43.51
  K-means Exemplar                        | 55.84  | 52.60
  AdaBoost                                | 62.39  | 57.68
  Neural Network                          | 67.31  | 62.70
• The performance on ASR transcriptions is slightly worse than on manual ones but reasonable; supervised learning with a neural network gives the best results.

Slide 50: Conclusion (section divider)

Slide 51: Conclusion
• We propose a new approach to extract key terms.
• The performance can be improved by
  • identifying phrases with branching entropy, and
  • using prosodic, lexical, and semantic features together.
• The results are encouraging.

Slide 52: Thanks for your attention! Q & A
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture
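As a closing illustration of the K-means Exemplar method on Slide 40, here is a minimal sketch: each term becomes a vector in LTS space, K-means clusters the terms, and the member nearest each cluster centroid is returned as a key term. The use of scikit-learn and the toy vectors are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_exemplars(terms, lts_vectors, n_key_terms):
    """Return one exemplar term per cluster as the extracted key terms."""
    X = np.asarray(lts_vectors)
    km = KMeans(n_clusters=n_key_terms, n_init=10, random_state=0).fit(X)
    key_terms = []
    for c in range(n_key_terms):
        members = np.where(km.labels_ == c)[0]
        # Exemplar = cluster member nearest to the centroid.
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        key_terms.append(terms[members[dists.argmin()]])
    return key_terms

# Toy LTS vectors over three latent topics for five candidate terms.
terms = ["hidden markov model", "viterbi", "acoustic model", "language model", "bigram"]
vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9], [0.1, 0.1, 0.8]]
print(kmeans_exemplars(terms, vectors, n_key_terms=3))
```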