
A Survey of ICASSP 2013 Language Model

Department of Computer Science & Information Engineering, National Taiwan Normal University

Presenter: 郝柏翰, 2013/06/19

Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition

Ebru Arısoy et al., IBM T.J. Watson Research Center, NY


Introduction

In this work, we propose an approximate method for converting a feedforward NNLM into a back-off n-gram language model that can be used directly in existing LVCSR decoders.

We convert NNLMs of increasing order to pruned back-off language models, using lower-order models to constrain the n-grams allowed in higher-order models.
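As a rough illustration of this order-by-order conversion, here is a minimal Python sketch. The function nnlm_prob and the fixed probability threshold are placeholders introduced for illustration; the actual system queries a trained feedforward NNLM and uses a proper pruning criterion.

VOCAB = ["a", "b", "c"]

def nnlm_prob(word, history):
    # Dummy stand-in for querying a trained feedforward NNLM;
    # returns a uniform probability just to make the sketch runnable.
    return 1.0 / len(VOCAB)

def convert(max_order, threshold=0.2):
    """Build an n-gram table order by order: a higher-order n-gram is
    considered only if its history survived as an n-gram one order lower."""
    kept = {()}          # the empty history is always allowed
    model = {}
    for order in range(1, max_order + 1):
        for hist in [h for h in kept if len(h) == order - 1]:
            for w in VOCAB:
                p = nnlm_prob(w, hist)
                if p >= threshold:   # prune low-probability n-grams
                    model[hist + (w,)] = p
                    kept.add(hist + (w,))
    return model

print(len(convert(3)))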


Method

A back-off n-gram language model takes the form
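In standard notation, the back-off form is:

$P_{\text{BO}}(w_i \mid w_{i-n+1}^{i-1}) = \begin{cases} P(w_i \mid w_{i-n+1}^{i-1}) & \text{if } w_{i-n+1}^{i} \text{ is explicitly stored} \\ \alpha(w_{i-n+1}^{i-1})\, P_{\text{BO}}(w_i \mid w_{i-n+2}^{i-1}) & \text{otherwise,} \end{cases}$

where $\alpha(\cdot)$ is the back-off weight that ensures normalization.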

The NNLM probabilities are combined with a background language model, where $P_{\text{NNLM}}$ and $P_{\text{BG}}$ denote the NNLM and background language model probabilities.
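A plausible form for this combination, assuming the linear interpolation that is standard in the NNLM literature (the paper's exact combination may differ):

$P(w \mid h) = \lambda\, P_{\text{NNLM}}(w \mid h) + (1 - \lambda)\, P_{\text{BG}}(w \mid h),$

where $\lambda$ is an interpolation weight.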


Method

To represent NNLM probabilities exactly over the output vocabulary requires $O(|V|^n)$ parameters in general, where $V$ is the complete vocabulary.
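To make the scale concrete (illustrative numbers, not from the paper): with $|V| = 10^4$ words and a trigram model ($n = 3$), exact representation needs on the order of $(10^4)^3 = 10^{12}$ parameters.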

While we can represent the overall NNLM as a back-off model exactly, it is prohibitively large, as noted above. Pruning can be used to reduce the set of n-grams for which we explicitly store probabilities.
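One widely used criterion is Stolcke-style entropy pruning (named here for reference; the paper may use a different criterion), which drops an n-gram $(h, w)$ when backing off barely changes the model:

$P(h)\, P(w \mid h)\, \big[\log P(w \mid h) - \log P'_{\text{BO}}(w \mid h)\big] < \theta,$

where $P'_{\text{BO}}$ is the model after the n-gram is removed and $\theta$ is a pruning threshold.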


Experiments

(Figure: n-gram probability distributions before and after the conversion; the converted model's distribution is smoother.)

Use of Latent Words Language Models in ASR: A Sampling-Based Implementation

Ryo Masumura et al., NTT Media Intelligence Laboratories, Japan


Introduction

This paper applies the latent words language model (LWLM) to automatic speech recognition (ASR). LWLMs are trained taking related words into account, i.e., grouping words that are similar in meaning and syntactic role.

In addition, this paper describes an approximation method of the LWLM for ASR, in which word sequences are randomly sampled from the LWLM and a standard word n-gram language model is then trained on the sampled text (sketched below).
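A minimal, self-contained Python sketch of the sample-then-train idea. The two toy distributions stand in for a trained LWLM's $P(\text{latent} \mid \text{context})$ and $P(\text{word} \mid \text{latent})$; all values are hypothetical placeholders, not from the paper.

import random
from collections import defaultdict

# Toy stand-ins for the LWLM's two distributions (hypothetical values):
# transition over latent words, and emission of observed words.
TRANS = {"<s>": {"ANIMAL": 0.5, "FOOD": 0.5},
         "ANIMAL": {"ANIMAL": 0.3, "FOOD": 0.7},
         "FOOD": {"ANIMAL": 0.6, "FOOD": 0.4}}
EMIT = {"ANIMAL": {"cat": 0.5, "dog": 0.5},
        "FOOD": {"rice": 0.7, "bread": 0.3}}

def sample_sentence(length=10):
    """Sample one word sequence: draw each latent word from its context,
    then emit an observed word from that latent word."""
    latent, words = "<s>", []
    for _ in range(length):
        latent = random.choices(list(TRANS[latent]),
                                weights=list(TRANS[latent].values()))[0]
        words.append(random.choices(list(EMIT[latent]),
                                    weights=list(EMIT[latent].values()))[0])
    return words

# Generate a sampled corpus, then train a standard bigram LM on it.
counts = defaultdict(lambda: defaultdict(int))
for _ in range(10000):
    prev = "<s>"
    for w in sample_sentence():
        counts[prev][w] += 1
        prev = w

# Maximum-likelihood bigram probabilities estimated from the samples.
bigram = {h: {w: n / sum(ws.values()) for w, n in ws.items()}
          for h, ws in counts.items()}
print(bigram["cat"])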


Method

Hierarchical Pitman-Yor Language Model: If we directly use the LWLM in one-pass decoding, we have to calculate the probability distribution over the current word given its context (see the marginal below).

Latent Words Language Model: LWLMs are generative models with a latent variable for every observed word in a text.
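The expensive quantity is the marginal word probability, which sums over every possible latent word (notation assumed here, following the usual LWLM formulation):

$P(w_i \mid w_{i-n+1}^{i-1}) = \sum_{l_i} P(w_i \mid l_i)\, P(l_i \mid w_{i-n+1}^{i-1}),$

and $P(l_i \mid \cdot)$ itself requires marginalizing over the latent words of the context, which is why direct one-pass decoding is impractical.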


Method

The latent variable, called a latent word, is generated from its context, and the observed word is generated from the latent word.
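In symbols (notation assumed, as above): $l_i \sim P(l_i \mid l_{i-n+1}^{i-1})$ and $w_i \sim P(w_i \mid l_i)$.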


Experiments

This result shows that we can construct an LWLM comparable to the HPYLM if we generate sufficient text data. Moreover, the highest performance was achieved with LWLM+HPYLM. This result shows that the LWLM possesses properties different from those of the HPYLM, and further improvement is achieved when they are combined.

Incorporating Semantic Information to Selection of Web Texts for Language Model of Spoken Dialogue System

Koichiro Yoshino et al., Kyoto University, Japan


Introduction

A novel text selection approach for training a language model (LM) with Web texts is proposed for automatic speech recognition (ASR) of spoken dialogue systems.

Compared to the conventional approach based on the perplexity criterion, the proposed approach introduces a semantic-level relevance measure computed with the back-end knowledge base used in the dialogue system.

We focus on the predicate-argument (P-A) structures characteristic of the domain in order to select semantically relevant sentences for the domain.


Method

Selection Based on Perplexity: For a sentence $s$, its perplexity under a seed LM trained on the document set $D$ is defined below.
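The definition is the standard per-word perplexity (reconstructed here in the usual form):

$PP(s) = P_D(s)^{-1/|s|},$

where $P_D(s)$ is the probability the seed LM assigns to $s$ and $|s|$ is its length in words; sentences with low perplexity are kept.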

Selection Based on Semantic Relevance Measure

where $C(\cdot)$ stands for an occurrence count, $P(D)$ is a normalization factor determined by the size of $D$, and $\gamma$ is a smoothing factor estimated with a Dirichlet prior.
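The measure itself is missing from the slide; one plausible reading consistent with the description (my reconstruction, not necessarily the paper's exact formula) is a smoothed, size-normalized count for a P-A element $y$:

$R(y) = \frac{C(y) + \gamma}{P(D)}.$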


Method

For a P-A pair consisting of a predicate $p$ and an argument $a$, we define its relevance $R(p, a)$ as the geometric mean of $R(p)$ and $R(a)$.

For each sentence $s$, we compute the mean of $R(p, a)$ over the P-A pairs included in the sentence, defined as $R(s)$.
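In formula form (notation assumed as above):

$R(p, a) = \sqrt{R(p)\, R(a)}, \qquad R(s) = \frac{1}{|PA(s)|} \sum_{(p, a) \in PA(s)} R(p, a),$

where $PA(s)$ is the set of P-A pairs in sentence $s$.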


Experiments