PowerPoint presentation: recorded-audio playback panel, plays the recorded audio, note-editing panel...

Posted on 25-Jan-2020


TRANSCRIPT

2015/8/22 Speech research meeting (音声研究会) @ Iwate Prefectural University


[Table with rows U01, U02, U03, … (cell values not recoverable)]

Note: numbers in parentheses are the numbers of out-of-vocabulary (OOV) query terms.

Search phase: query term → converting to phoneme sequence → DTW-based term search engine → result

Indexing phase: target speech data → ASR #1, ASR #2, …, ASR #10 (1-best output from each) → converting to PTN → PTN-formed index

Speech utterance "Nepal" (/n e p a a r u/)

Outputs of 10 ASRs (all outputs are converted into phoneme sequences):

ASR ID    Output
ASR #1    n e @ h o a @ r e
ASR #2    n e @ p @ a a r u
ASR #3    n e @ p @ a a r u
ASR #4    n e q p @ a a r e
ASR #5    n o @ b @ a @ @ N
ASR #6    n o @ t @ a u m e
ASR #7    n e N p @ a @ @ i
ASR #8    n e u p @ a a r e
ASR #9    n e @ p @ a a r e
ASR #10   n e N p @ a @ @ @

[Figure: PTN-formed index built from the 10 ASR outputs (legend: arc, node, terminal node), with the search term /n e p a a r u/]

[Figure: CN(PTN)-based index with the search term. Callouts: NULL transitions (no insertion errors); total cost (distance): 0.2]
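The search figure shows the query crossing NULL arcs without being charged an insertion error. One way to get that behaviour is an edit-distance-style dynamic program (a common discrete-symbol form of DTW) in which passing a position that has a NULL arc costs nothing. The sketch below assumes unit costs for substitution, deletion, and passing a non-NULL position, so it will not reproduce the weighted total of 0.2 on the slide; `build_slots` and `ptn_search` are hypothetical names.

```python
# Hypothetical sketch: DTW-style (edit-distance) term search over a PTN-like index.
# Unit costs are assumptions; they do not reproduce the slide's weighted cost of 0.2.

NULL = "@"  # symbol used for empty/NULL arcs in the aligned ASR outputs

ASR_OUTPUTS = [
    "n e @ h o a @ r e", "n e @ p @ a a r u", "n e @ p @ a a r u",
    "n e q p @ a a r e", "n o @ b @ a @ @ N", "n o @ t @ a u m e",
    "n e N p @ a @ @ i", "n e u p @ a a r e", "n e @ p @ a a r e",
    "n e N p @ a @ @ @",
]


def build_slots(outputs: list[str]) -> list[set[str]]:
    """One set of candidate phonemes per position; outputs are assumed pre-aligned."""
    rows = [o.split() for o in outputs]
    return [set(column) for column in zip(*rows)]


def ptn_search(query: list[str], slots: list[set[str]]) -> int:
    """Minimum cost of matching `query` against any contiguous run of slots.

    match = 0, substitution = 1, dropping a query phoneme = 1,
    passing a slot without consuming a phoneme = 0 via its NULL arc, else 1.
    """
    m, n = len(query), len(slots)
    # d[i][j]: best cost for query[:i] ending just before slot j; row 0 is a free start.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
        for j in range(1, n + 1):
            slot = slots[j - 1]
            sub = 0 if query[i - 1] in slot else 1
            skip = 0 if NULL in slot else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # consume a query phoneme on an arc of slot j
                d[i][j - 1] + skip,     # pass slot j (free if it has a NULL arc)
                d[i - 1][j] + 1,        # drop a query phoneme
            )
    return min(d[m])  # free end: the match may stop at any slot


if __name__ == "__main__":
    slots = build_slots(ASR_OUTPUTS)
    print(ptn_search("n e p a a r u".split(), slots))  # prints 0
```

With these costs /n e p a a r u/ matches the "Nepal" network at cost 0, because positions 3 and 5 can be crossed on their NULL arcs.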

Phoneme-based transcriptions by ASRs, with BIO tags for the triphone "n-e-p":

ASR #1    s o n o @ n i @ q p o N n o @ d e @ w a
ASR #2    s o n o @ n i @ q p o N n o @ d e @ w a
ASR #3    s o n o @ n i @ @ b a N n o @ u e @ w a
ASR #4    s o n o N n i f u p a N n u N b e e w a
ASR #5    s o n o @ n i @ q p o N n o @ d e @ w a
ASR #6    s o n o @ m i @ @ b a N n o @ u e @ w a
ASR #7    s o n o @ n i @ q p o N n o @ u e @ w a
ASR #8    s o n o @ n i @ @ b a N n o @ u e @ w a
ASR #9    s o n o @ n i @ @ b a N n o @ d e @ w a
ASR #10   s o n o @ n i @ q p o N n o @ d e @ w a
BIO tags  O O O O O B I I I I O O O O O O O O O O

Features for CRF training: current token, unigrams, in-ASR bigrams, in-ASR trigrams, cross-ASR bigrams.

B: beginning tag of the triphone; I: inside tag of the triphone; O: outside tag of the triphone
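The slide names five CRF feature types but not their exact templates. The sketch below shows one plausible reading over the aligned transcriptions: the current token, neighbouring tokens in the same row (unigrams), bigrams and a trigram within the same ASR row, and cross-ASR bigrams pairing the current token with the token at the same position in every other row. The templates, the +/-2 window, and the names `bio_tags` and `token_features` are all assumptions. `bio_tags(20, 5, 10)` reproduces the BIO row given for "n-e-p".

```python
# Hypothetical sketch: BIO labels and per-token CRF feature strings.
# Feature templates are illustrative guesses at the slide's five feature types.

TRANSCRIPTS = [
    "s o n o @ n i @ q p o N n o @ d e @ w a",  # ASR #1
    "s o n o @ n i @ q p o N n o @ d e @ w a",  # ASR #2
    "s o n o @ n i @ @ b a N n o @ u e @ w a",  # ASR #3
    "s o n o N n i f u p a N n u N b e e w a",  # ASR #4
    "s o n o @ n i @ q p o N n o @ d e @ w a",  # ASR #5
    "s o n o @ m i @ @ b a N n o @ u e @ w a",  # ASR #6
    "s o n o @ n i @ q p o N n o @ u e @ w a",  # ASR #7
    "s o n o @ n i @ @ b a N n o @ u e @ w a",  # ASR #8
    "s o n o @ n i @ @ b a N n o @ d e @ w a",  # ASR #9
    "s o n o @ n i @ q p o N n o @ d e @ w a",  # ASR #10
]


def bio_tags(length: int, start: int, end: int) -> list[str]:
    """Tag tokens [start, end) as the triphone span: B first, I for the rest, O elsewhere."""
    tags = ["O"] * length
    tags[start] = "B"
    for t in range(start + 1, end):
        tags[t] = "I"
    return tags


def _tok(row: list[str], i: int) -> str:
    """Token at index i, padded with boundary symbols outside the row."""
    if i < 0:
        return "<s>"
    if i >= len(row):
        return "</s>"
    return row[i]


def token_features(rows: list[list[str]], k: int, t: int) -> list[str]:
    """Feature strings for token t of ASR row k (rows must be pre-aligned)."""
    row = rows[k]
    cur = row[t]
    feats = [f"cur={cur}"]
    # unigrams: neighbouring tokens in the same ASR row
    feats += [f"uni[{d}]={_tok(row, t + d)}" for d in (-2, -1, 1, 2)]
    # in-ASR bigrams and trigram
    feats.append(f"bi[-1,0]={_tok(row, t - 1)}|{cur}")
    feats.append(f"bi[0,1]={cur}|{_tok(row, t + 1)}")
    feats.append(f"tri={_tok(row, t - 1)}|{cur}|{_tok(row, t + 1)}")
    # cross-ASR bigrams: current token paired with the same position in each other row
    feats += [f"x[{j}]={cur}|{other[t]}" for j, other in enumerate(rows) if j != k]
    return feats


if __name__ == "__main__":
    rows = [line.split() for line in TRANSCRIPTS]
    print(" ".join(bio_tags(len(rows[0]), 5, 10)))
    print(token_features(rows, 0, 5))
```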

Utterance Z / Utterance A

Probability of s-a-N: s 0.9 × a 0.9 × N 0.9 ≈ 0.73

Outputs of 10 ASR systems, with B/I/O label probabilities at each position:

Position  1    2    3    4    5    6    7
ASR #1    f    u    j    i    s    a    N
ASR #2    f    u    z    u    sh   a    n
ASR #3    s    i    z    u    y    a    N
ASR #4    f    u    @    e    y    a    m
ASR #5    k    o    J    u    g    o    N
ASR #6    f    u    J    i    s    a    N
ASR #7    k    e    z    e    ch   a    n
ASR #8    @    i    t    i    s    a    q
ASR #9    f    u    r    u    s    a    N
ASR #10   s    u    j    i    h    a    q
B label   0.1  0.0  0.8  0.2  0.1  0.0  0.1
I label   0.1  0.4  0.2  0.7  0.9  0.9  0.1
O label   0.8  0.6  0.0  0.1  0.0  0.9  0.9


B/I/O label probabilities at each position (utterance A):

Position  1    2    3    4    5    6    7
B label   0.1  0.0  0.8  0.2  0.1  0.2  0.1
I label   0.1  0.4  0.2  0.7  0.9  0.1  0.1
O label   0.8  0.6  0.0  0.1  0.0  0.7  0.8
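The per-triphone numbers on the slides fit a simple rule: multiply the B probability at the first position by the I probabilities at the next two. For "j-i-s" in the utterance A table that is 0.8 × 0.7 × 0.9 = 0.504, shown as 0.50. A minimal sketch under that reading, additionally assuming (hypothetically) that the best-scoring start position is kept when the triphone could begin at several places; `best_triphone_span` is an illustrative name.

```python
# Hypothetical sketch: score one triphone from a CRF's per-position B/I/O marginals.
# Assumes the triphone occupies three consecutive positions labelled B, I, I.

# B and I rows of the utterance A table (CRF model for triphone "j-i-s").
B_PROB = [0.1, 0.0, 0.8, 0.2, 0.1, 0.2, 0.1]
I_PROB = [0.1, 0.4, 0.2, 0.7, 0.9, 0.1, 0.1]


def best_triphone_span(b: list[float], i: list[float]) -> tuple[int, float]:
    """Return (0-based start, probability) maximizing b[t] * i[t+1] * i[t+2]."""
    best_t, best_p = -1, 0.0
    for t in range(len(b) - 2):
        p = b[t] * i[t + 1] * i[t + 2]
        if p > best_p:
            best_t, best_p = t, p
    return best_t, best_p


if __name__ == "__main__":
    start, prob = best_triphone_span(B_PROB, I_PROB)
    print(start, round(prob, 2))  # prints "2 0.5": the span starts at the third position
```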


Detection probability of query /f u j i s a N/ at utterance A by the CRF models

Triphone "j-i-s" detection result by CRF

Query term /f u j i s a N/, decomposed into triphones: f-u-j, u-j-i, j-i-s, …

Probability of f-u-j: f 0.8 × u 0.7 × j 0.9 ≈ 0.50
Probability of u-j-i: u 0.8 × j 0.7 × i 0.9 ≈ 0.50
Probability of j-i-s: j 0.8 × i 0.7 × s 0.9 ≈ 0.50

Query detection probability at utterance A: 0.65
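The decomposition step slides a three-phoneme window over the query, so the seven-phoneme query /f u j i s a N/ yields five triphones (the figure lists the first three). A sketch of that step plus the per-triphone product, which reproduces the figure's 0.50 for f-u-j and 0.73 for s-a-N. How the triphone scores are combined into the query-level 0.65 is not recoverable from the slides, so the sketch stops before that step; both function names are hypothetical.

```python
import math


def triphones(query: str) -> list[str]:
    """Split a space-separated phoneme string into overlapping triphones."""
    ph = query.split()
    return ["-".join(ph[k:k + 3]) for k in range(len(ph) - 2)]


def triphone_probability(phoneme_probs: list[float]) -> float:
    """Product of the three per-phoneme probabilities (a reading of the slide's arithmetic)."""
    return math.prod(phoneme_probs)


if __name__ == "__main__":
    print(triphones("f u j i s a N"))
    print(round(triphone_probability([0.8, 0.7, 0.9]), 2))  # f-u-j
    print(round(triphone_probability([0.9, 0.9, 0.9]), 2))  # s-a-N
```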

[Figure: recall-precision curves, 10 ASR systems. x-axis: Recall [%]; y-axis: Precision [%]; both 0-100. Legend: (1) DTW (baseline), (2) CRF, (3) DTW+CRF]
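Each curve in the figure traces precision against recall as the detection threshold is lowered. A generic sketch of computing one such curve from scored detections; `recall_precision_curve` is a hypothetical helper, not the evaluation tool used for the figure, and multiplying its values by 100 gives the percent scale on the axes.

```python
def recall_precision_curve(
    scores: list[float], is_hit: list[bool], n_relevant: int
) -> list[tuple[float, float]]:
    """(recall, precision) after each detection, sweeping the threshold from high to low.

    scores: detection scores; is_hit: whether each detection is a true occurrence;
    n_relevant: total number of true occurrences of the query in the target speech.
    """
    ranked = sorted(zip(scores, is_hit), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += int(hit)
        points.append((tp / n_relevant, tp / rank))
    return points
```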


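Recorded-audio playback panel: plays the recorded audio

Note-editing panel

Screen-capture panel (acquires and displays captured images): pressing the button places the captured image in the note-editing panel

Candidate-phrase panel (speech recognition results)

Words selected by touch appear in the editing field

Text can also be entered by keyboard or handwriting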
