專題研究 Week 3: Language Model and Decoding


Page 1: 專題研究 week3 Language Model and Decoding

專題研究 WEEK 3

LANGUAGE MODEL AND DECODING
Prof. Lin-Shan Lee

TA: Hung-Tsung Lu, Cheng-Kuan Wei

Page 2: 專題研究 week3 Language Model and Decoding

Speech Recognition System (語音辨識系統)

[Block diagram of the recognition system]
Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
Knowledge sources: Acoustic Models (built by Acoustic Model Training from Speech Corpora), Lexicon (from the Lexical Knowledge-base), and Language Model / Grammar (built by Language Model Construction from Text Corpora)

Use Kaldi as the toolkit.


Page 3: 專題研究 week3 Language Model and Decoding

Language Modeling: providing linguistic constraints that help the decoder select the correct words

Prob [the computer is listening] > Prob [they come tutor is list sunny]

Prob [電腦聽聲音] > Prob [店老天呻吟] (the Chinese counterpart: a sensible sentence vs. an acoustically similar nonsense string)

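To make the comparison concrete, here is a minimal sketch, not from the original slides, of how a bigram language model scores a word sequence; the probability values are made-up placeholders, and unseen bigrams receive a small floor probability.

import math

# Toy bigram probabilities P(w_i | w_{i-1}); the values are invented for illustration.
bigram_prob = {
    ("<s>", "the"): 0.2, ("the", "computer"): 0.1,
    ("computer", "is"): 0.3, ("is", "listening"): 0.05,
    ("listening", "</s>"): 0.4,
}

def sentence_logprob(words, probs, floor=1e-8):
    # Sum log P(w_i | w_{i-1}) over the padded sentence; unseen bigrams get a tiny floor.
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(probs.get(pair, floor)) for pair in zip(tokens, tokens[1:]))

print(sentence_logprob("the computer is listening".split(), bigram_prob))
print(sentence_logprob("they come tutor is list sunny".split(), bigram_prob))  # much lower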

Page 4: 專題研究 week3 Language Model and Decoding

Language Model Training

00.train_lm.sh
01.format.sh

Page 5: 專題研究 week3 Language Model and Decoding

Language Model: Training Text (1/2)

train_text=ASTMIC_transcription/train.text

cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text


Removes the first column (the utterance ID), leaving only the word transcription for LM training.
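For readers less familiar with cut, a rough Python equivalent of this step is sketched below; it assumes the first whitespace-separated field of each line is the utterance ID, and uses the same file paths as the command above.

# Rough Python equivalent of:
#   cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text
# i.e. drop the first space-separated field (the utterance ID) from every line.
train_text = "ASTMIC_transcription/train.text"
with open(train_text) as fin, open("/exp/lm/LM_train.text", "w") as fout:
    for line in fin:
        parts = line.rstrip("\n").split(" ", 1)
        fout.write((parts[1] if len(parts) > 1 else "") + "\n")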

Page 6: 專題研究 week3 Language Model and Decoding

Language Model: Training Text (2/2)

cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text

Page 7: 專題研究 week3 Language Model and Decoding

Language Model: ngram-count (1/3)

/share/srilm/bin/i686-m64/ngram-count
  -order 2                      (you can modify it from 1 to 3)
  -kndiscount                   (modified Kneser-Ney smoothing)
  -text /exp/lm/LM_train.text   (your training data file name on p.7)
  -vocab $lexicon               (lexicon, as shown on p.10)
  -unk                          (build an open-vocabulary language model)
  -lm $lm_output                (your language model name)

http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

Page 8: 專題研究 week3 Language Model and Decoding

Language Model: ngram-count (2/3)

Smoothing

Many events never occur in the training data.
e.g. Prob [Jason immediately stands up] = 0 because Prob [immediately | Jason] = 0.

Smoothing assigns some non-zero probability to every event, even those that never occur in the training data.

https://class.coursera.org/nlp/lecture Week 2 – Language Modeling
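As a deliberately simple illustration of the idea, the sketch below uses add-one (Laplace) smoothing on bigram counts; the actual script uses modified Kneser-Ney, which is more sophisticated but serves the same purpose of giving unseen events non-zero probability. The toy corpus is invented.

from collections import Counter

# Toy training text; real training reads /exp/lm/LM_train.text.
corpus = [["jason", "stands", "up"], ["mary", "stands", "up"]]

vocab = {w for sent in corpus for w in sent} | {"</s>"}
bigrams = Counter((u, v) for sent in corpus
                  for u, v in zip(["<s>"] + sent, sent + ["</s>"]))
history = Counter(u for sent in corpus for u in ["<s>"] + sent)

def p_add_one(w, prev):
    # Add-one smoothing: every bigram count is bumped by 1, so no event has zero probability.
    return (bigrams[(prev, w)] + 1) / (history[prev] + len(vocab))

print(p_add_one("up", "jason"))      # "jason up" never occurs in training, yet P > 0
print(p_add_one("stands", "jason"))  # a seen bigram gets a higher probability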

Page 9: 專題研究 week3 Language Model and Decoding

Language Model: ngram-count (3/3)

Lexicon
lexicon=material/lexicon.train.txt

Page 10: 專題研究 week3 Language Model and Decoding

01.format.sh

Try to replace it with YOUR language model!

Page 11: 專題研究 week3 Language Model and Decoding

Decoding

WFST Decoding
04a.01.mono.mkgraph.sh
04a.02.mono.fst.sh
07a.01.tri.mkgraph.sh
07a.02.tri.fst.sh

Viterbi Decoding
04b.mono.viterbi.sh
07b.tri.viterbi.sh

Page 12: 專題研究 week3 Language Model and Decoding

WFST: Introduction (1/3)

FSA (or FSM): finite state automaton / finite state machine

An FSA “accepts” a set of strings: view an FSA as a representation of a possibly infinite set of strings.
Start state(s) are drawn bold; final/accepting states have an extra circle.
This example represents the infinite set {ab, aab, aaab, ...}.
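A minimal sketch of such an acceptor; the state numbering (0 = start, 2 = final) is my own illustration, chosen so that the machine accepts exactly {ab, aab, aaab, ...}.

# Finite state acceptor for the language a+ b, i.e. {ab, aab, aaab, ...}.
# States: 0 = start, 1 = after one or more a's, 2 = final.
TRANSITIONS = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2}
FINAL_STATES = {2}

def accepts(string):
    state = 0
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL_STATES

print(accepts("ab"), accepts("aaab"), accepts("ba"))  # True True False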

Page 13: 專題研究 week3 Language Model and Decoding

WFST: Introduction (2/3)

An FSA with weighted edges

Like a normal FSA, but with costs on the arcs and on the final states.

Note: the cost comes after “/”; for a final state, “2/1” means state 2 has final cost 1.

This example assigns “ab” the cost 3 (= 1 + 1 + 1).

Page 14: 專題研究 week3 Language Model and Decoding

WFST: Introduction (3/3)

WFST

Like a weighted FSA, but with two tapes: input and output.

Ex. Input tape: “ac”; Output tape: “xz”; Cost = 0.5 + 2.5 + 3.5 = 6.5
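A small sketch reproducing this example. The slide only gives the total 0.5 + 2.5 + 3.5 = 6.5, so assigning 0.5 to the a:x arc, 2.5 to the c:z arc, and 3.5 as the final-state cost is my assumption about the topology.

# Tiny WFST: arcs map (source_state, input_label) -> (output_label, cost, next_state).
ARCS = {(0, "a"): ("x", 0.5, 1), (1, "c"): ("z", 2.5, 2)}
FINAL_COST = {2: 3.5}  # only state 2 is final, with final cost 3.5

def transduce(inp):
    state, out, cost = 0, [], 0.0
    for symbol in inp:
        if (state, symbol) not in ARCS:
            return None  # input not accepted
        label, c, state = ARCS[(state, symbol)]
        out.append(label)
        cost += c
    if state not in FINAL_COST:
        return None
    return "".join(out), cost + FINAL_COST[state]

print(transduce("ac"))  # ('xz', 6.5)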

Page 15: 專題研究 week3 Language Model and Decoding

WFST Composition

Notation: C = A ∘ B means C is A composed with B
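A rough sketch of composition for two tiny, epsilon-free transducers over the tropical (min, +) semiring: an arc exists in C = A ∘ B when A's output label matches B's input label, and the costs add. The machines A and B below are invented examples; real toolkits (e.g. OpenFst) also handle epsilons and nondeterminism.

# Each machine: (arcs, finals); arcs = {(state, in_label): (out_label, cost, next_state)}
def compose(a, b):
    arcs_a, finals_a = a
    arcs_b, finals_b = b
    arcs_c, finals_c = {}, {}
    for (sa, i), (m, ca, na) in arcs_a.items():
        for (sb, j), (o, cb, nb) in arcs_b.items():
            if m == j:  # A's output label must match B's input label
                arcs_c[((sa, sb), i)] = (o, ca + cb, (na, nb))
    for fa, ca in finals_a.items():
        for fb, cb in finals_b.items():
            finals_c[(fa, fb)] = ca + cb
    return arcs_c, finals_c

A = ({(0, "a"): ("x", 1.0, 1)}, {1: 0.0})  # maps "a" -> "x" with cost 1
B = ({(0, "x"): ("y", 2.0, 1)}, {1: 0.5})  # maps "x" -> "y" with cost 2 (+0.5 final)
print(compose(A, B))  # C maps "a" -> "y" with cost 3 (+0.5 final)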

Page 16: 專題研究 week3 Language Model and Decoding

WFST Component

HCLG = H ∘ C ∘ L ∘ G
H: HMM structure
C: Context-dependent relabeling
L: Lexicon
G: Language model acceptor

Page 17: 專題研究 week3 Language Model and Decoding

Framework for Speech Recognition


Page 18: 專題研究 week3 Language Model and Decoding

WFST Component

L (Lexicon)
H (HMM)
G (Language Model)

Where is C? (Context-Dependent)

Page 19: 專題研究 week3 Language Model and Decoding

Training WFST

04a.01.mono.mkgraph.sh 07a.01.tri.mkgraph.sh

Page 20: 專題研究 week3 Language Model and Decoding

Decoding WFST (1/3)

From HCLG we have… the relationship from states to words.

We need another WFST, U, representing the utterance to be recognized.

Compose U with HCLG, i.e. S = U ∘ HCLG. Searching for the best path(s) on S gives the recognition result.

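A toy sketch of the "search the best path" step: a lowest-cost path search over a small weighted graph standing in for S. The graph, words, and costs are invented; real decoders work on much larger graphs and use beam pruning.

import heapq

# Graph: {state: [(arc_cost, next_state, word)]}; lower total cost = better hypothesis.
GRAPH = {
    0: [(1.0, 1, "the"), (2.5, 2, "a")],
    1: [(0.5, 3, "computer")],
    2: [(0.7, 3, "computer")],
    3: [],
}
START, FINAL = 0, 3

def best_path(graph, start, final):
    heap = [(0.0, start, [])]          # (accumulated cost, state, word sequence)
    seen = set()
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == final:
            return cost, words
        if state in seen:
            continue
        seen.add(state)
        for arc_cost, nxt, word in graph[state]:
            heapq.heappush(heap, (cost + arc_cost, nxt, words + [word]))
    return None

print(best_path(GRAPH, START, FINAL))  # (1.5, ['the', 'computer'])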

Page 21: 專題研究 week3 Language Model and Decoding

Decoding WFST (2/3)

04a.02.mono.fst.sh 07a.02.tri.fst.sh

Page 22: 專題研究 week3 Language Model and Decoding

Decoding WFST (3/3)

During decoding, we need to specify the relative weights of the acoustic model and the language model.

Split the corpus into Train, Dev, and Test sets.
The training set is used to train the acoustic model.
Test all of the acoustic-model weights on the Dev set, and use the best one.
The test set is used to measure the final performance (Word Error Rate, WER).

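WER is the word-level edit distance (substitutions + deletions + insertions) between the reference and the hypothesis, divided by the reference length. A minimal sketch with invented example strings:

def wer(ref, hyp):
    # Word Error Rate = (substitutions + deletions + insertions) / len(ref)
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the computer is listening", "the computers listening"))  # 0.5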

Page 23: 專題研究 week3 Language Model and Decoding

Viterbi Decoding

Viterbi Algorithm: given the acoustic model and the observations, find the best state sequence.

Best state sequence → phone sequence (AM) → word sequence (Lexicon) → best word sequence (LM).
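A compact sketch of the Viterbi algorithm on a toy two-state HMM; the states, transition, and emission probabilities are made-up placeholders, but the dynamic-programming recursion and the back-tracing are the standard algorithm.

import math

states = ["s1", "s2"]
log_init = {"s1": math.log(0.6), "s2": math.log(0.4)}
log_trans = {("s1", "s1"): math.log(0.7), ("s1", "s2"): math.log(0.3),
             ("s2", "s1"): math.log(0.4), ("s2", "s2"): math.log(0.6)}
log_emit = {("s1", "o1"): math.log(0.5), ("s1", "o2"): math.log(0.5),
            ("s2", "o1"): math.log(0.1), ("s2", "o2"): math.log(0.9)}

def viterbi(observations):
    # delta[s] = best log-prob of any path ending in state s; back[t][s] = best predecessor.
    delta = {s: log_init[s] + log_emit[(s, observations[0])] for s in states}
    back = []
    for obs in observations[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[(p, s)])
            delta[s] = prev[best] + log_trans[(best, s)] + log_emit[(s, obs)]
            ptr[s] = best
        back.append(ptr)
    # Trace back the best path from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

print(viterbi(["o1", "o2", "o2"]))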

Page 24: 專題研究 week3 Language Model and Decoding

Viterbi Decoding

04b.mono.viterbi.sh 07b.tri.viterbi.sh

Page 25: 專題研究 week3 Language Model and Decoding

Language model training, WFST decoding, Viterbi decoding
00.train_lm.sh
01.format.sh
04a.01.mono.mkgraph.sh
04a.02.mono.fst.sh
07a.01.tri.mkgraph.sh
07a.02.tri.fst.sh
04b.mono.viterbi.sh
07b.tri.viterbi.sh

Homework

Page 26: 專題研究 week3 Language Model and Decoding

ToDo

Step 1. Finish the code in 00.train_lm.sh and get your LM.
Step 2. Use your LM in 01.format.sh.
Step 3.1. Run 04a.01.mono.mkgraph.sh and 04a.02.mono.fst.sh (WFST decoding for mono-phone).
Step 3.2. Run 07a.01.tri.mkgraph.sh and 07a.02.tri.fst.sh (WFST decoding for tri-phone).
Step 4.1. Run 04b.mono.viterbi.sh (Viterbi decoding for mono-phone).
Step 4.2. Run 07b.tri.viterbi.sh (Viterbi decoding for tri-phone).

Page 27: 專題研究 week3 Language Model and Decoding

ToDo (Opt.)

Train LM: use YOUR training text or even YOUR lexicon.
Train LM (ngram-count): try different arguments.

http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

Watch the online course on Coursera (Week 2 – Language Modeling): https://class.coursera.org/nlp/lecture

Read 數位語音處理概論 (Introduction to Digital Speech Processing): 4.0 (Viterbi), 6.0 (Language Model), 9.0 (WFST).

Try different AM/LM combinations and report the recognition results.

Page 28: 專題研究 week3 Language Model and Decoding

Questions?