Post on 29-Dec-2015


Endpoint Detection (端點偵測)

Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan Univ., Taiwan

Intro to Endpoint Detection

Endpoint detection (EPD, 端點偵測)
- Goal: determine the start and end of voice activity
- Also known as voice activity detection (VAD)

Importance
- Acts as a preprocessing step for many recognition tasks
- Should require as little computing power as possible

Two activation modes for speech-based applications
- Push to talk once: offline EPD
  - Example: voice command
- Push for continuous listening: online EPD
  - Example: dictation machine

Quiz candidate!

Types of Features for EPD

Time-domain
- Volume only
- Volume and ZCR (zero crossing rate)
- Volume and HOD (high-order difference)
- …

Frequency-domain
- Variance of spectrum
- Entropy of spectrum
- MFCC
- …

Typical Frameworks for EPD

Thresholding
- Simple thresholding
  - Compute a feature (e.g., volume) from each frame
  - Select a threshold vth to identify positive frames
- Combined thresholding
  - Use two features (e.g., volume and ZCR) to make the decision

Static classification
- Take features
- Perform binary classification
  - Negative: silence or noise
  - Positive: sound activity

Sequence alignment
- Use hidden Markov models (HMM) for sequence alignment

Performance Evaluation for EPD

Two types of errors (typical for all binary classification)
- False negative (aka false rejection): a positive frame is classified as negative
- False positive (aka false acceptance): a negative frame is classified as positive

Performance evaluation
- Start & end position accuracy
- Frame-based accuracy

Quiz candidate!
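The two error types and the frame-based accuracy can be sketched as follows. This is a minimal illustration, not code from the slides: the function name `frame_errors` and the 0/1 label encoding (1 = sound activity, 0 = silence or noise) are assumptions made for the example.

```python
def frame_errors(truth, predicted):
    """Count frame-level errors between two equal-length 0/1 label lists.

    Assumed encoding: 1 = positive (sound activity), 0 = negative.
    Returns (false_negatives, false_positives, frame_accuracy).
    """
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # False negative: a positive frame labelled as negative.
    false_neg = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    # False positive: a negative frame labelled as positive.
    false_pos = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    accuracy = 1.0 - (false_neg + false_pos) / len(truth)
    return false_neg, false_pos, accuracy


truth     = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
print(frame_errors(truth, predicted))  # (1, 1, 0.75)
```

Here the detector starts one frame too early (one false positive) and ends one frame too early (one false negative), so 6 of 8 frames are correct.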

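EPD by Volume Thresholding

The simplest method for EPD
- Volume is computed as the sum of absolute sample values in each frame.

Four intuitive ways to select vth, each scaling a volume statistic by a coefficient:
- vth = vmax * (coefficient)
- vth = vmedian * (coefficient)
- vth = vmin * (coefficient)
- vth = v1 * (coefficient)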
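As a rough sketch of volume thresholding, the code below computes the absolute-sum volume per frame and marks frames above a threshold. The helper names, the frame size and hop, and the coefficient 0.1 are all illustrative assumptions; the slides do not give these values.

```python
def frame_volumes(signal, frame_size, hop):
    """Volume of each frame, taken as the sum of absolute sample values."""
    volumes = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        volumes.append(sum(abs(x) for x in frame))
    return volumes


def threshold_frames(volumes, vth):
    """Mark a frame as positive when its volume exceeds the threshold."""
    return [v > vth for v in volumes]


# Toy signal: quiet, loud, quiet (three non-overlapping frames of 4 samples).
signal = [0.0, 0.01, -0.01, 0.0,
          0.5, -0.6, 0.7, -0.4,
          0.01, 0.0, -0.01, 0.0]
volumes = frame_volumes(signal, frame_size=4, hop=4)
vth = max(volumes) * 0.1  # vmax-based threshold; 0.1 is an assumed coefficient
active = threshold_frames(volumes, vth)
print(active)  # [False, True, False]
```

Swapping `max(volumes)` for the median, the minimum, or `volumes[0]` gives the other three variants listed above.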

How Do They Fail?

Unfortunately…
- All the thresholds fail one way or another.
- Under what situations do they fail?

Failure case for each threshold:
- vth = vmax * (coefficient): plosive sounds
- vth = vmedian * (coefficient): silence too long
- vth = vmin * (coefficient): total-zero frame
- vth = v1 * (coefficient): unstable frame

We need a better strategy…

A Better Strategy for Threshold Finding

A presumably better way to select vth
- vlower = 3rd percentile of volumes
- vupper = 97th percentile of volumes
- vth = (vupper - vlower) * (coefficient) + vlower

Why do we need to use percentiles?
- To deal with plosive sounds
- To deal with total-zero frames

Does it fail?
- Yes, still, in certain situations…
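The percentile-based rule can be sketched as below. The nearest-rank percentile definition, the function names, and the coefficient 0.1 are assumptions for illustration; other percentile conventions would give slightly different values.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (one common definition)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def volume_threshold(volumes, ratio=0.1):
    """vth = (vupper - vlower) * ratio + vlower, with 3rd/97th percentiles."""
    v_lower = percentile(volumes, 3)
    v_upper = percentile(volumes, 97)
    return (v_upper - v_lower) * ratio + v_lower


# 100 frames: one total-zero frame, 49 silence frames, 49 speech frames,
# and one very loud plosive frame.
volumes = [0.0] + [1.0] * 49 + [20.0] * 49 + [500.0]
vth = volume_threshold(volumes)
print(round(vth, 6))          # 2.9
print(min(volumes) * 5)       # 0.0 -> a vmin-based threshold accepts every frame
print(max(volumes) * 0.1)     # 50.0 -> a vmax-based threshold rejects the speech
```

The percentiles ignore the single zero frame and the single plosive frame, so the threshold lands between the silence level (1.0) and the speech level (20.0).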

Example: EPD by Volume

epdByVol01.m

[Figure: two panels over time. Top: "Waveform and EP (method=vol)", amplitude. Bottom: "Volume".]

How to Enhance EPD by Volume?

Major problems of EPD by volume
- The threshold is hard to determine
  - Remedy: corpus-based fine-tuning
- Unvoiced parts are likely to be ignored
  - We need a feature that enhances the unvoiced parts
  - This can be achieved by ZCR or HOD

ZCR for Unvoiced Sound Detection

ZCR: zero crossing rate
- Number of zero crossings in a frame
- zvoiced ≤ zsilence ≤ zunvoiced

Example: epdShowZcr01.m

[Figure: two panels over time (sec). Top: waveform of SingaporeIsAFinePlace.wav, amplitude. Bottom: "ZCR", count per frame.]

Quiz: If frame = [-1 2 -2 3 5 2 -2 1], what is its ZCR?

Quiz candidate!
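A zero-crossing count for one frame can be sketched as follows. The function name is hypothetical, and skipping exact zeros is an assumed convention; other definitions treat zeros differently.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of a frame.

    Assumption: samples that are exactly zero are skipped rather than
    counted as a crossing.
    """
    signs = [1 if x > 0 else -1 for x in frame if x != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Sign pattern + - + + - - has three sign changes.
print(zero_crossings([1, -1, 1, 1, -2, -3]))  # 3
```

The same function can be applied to the quiz frame to check a hand-computed answer.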

EPD by Volume and ZCR

1. Determine initial endpoints by the upper volume threshold τu
2. Expand the initial endpoints based on the lower volume threshold τl
3. Further expand the endpoints based on the ZCR threshold τzc
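The three steps above can be sketched as below. This is a simplified illustration, not the code in epdByVolZcr01.m: it assumes a single speech segment per recording, takes per-frame volume and ZCR lists as input, and uses made-up threshold values and a hypothetical function name.

```python
def epd_vol_zcr(volumes, zcrs, vol_upper, vol_lower, zcr_th):
    """Return (start, end) frame indices of one detected segment, or None.

    Assumption: the whole utterance forms one segment, spanning from the
    first to the last frame whose volume exceeds the upper threshold.
    """
    above = [i for i, v in enumerate(volumes) if v > vol_upper]
    if not above:
        return None
    # Step 1: initial endpoints from the upper volume threshold.
    start, end = above[0], above[-1]
    # Step 2: expand while neighbouring frames exceed the lower volume threshold.
    while start > 0 and volumes[start - 1] > vol_lower:
        start -= 1
    while end < len(volumes) - 1 and volumes[end + 1] > vol_lower:
        end += 1
    # Step 3: expand further while neighbouring frames exceed the ZCR threshold.
    while start > 0 and zcrs[start - 1] > zcr_th:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > zcr_th:
        end += 1
    return start, end


volumes = [1, 1, 2, 6, 30, 40, 35, 8, 2, 1, 1]
zcrs    = [5, 6, 40, 10, 8, 7, 9, 12, 45, 50, 6]
print(epd_vol_zcr(volumes, zcrs, vol_upper=20, vol_lower=5, zcr_th=30))  # (2, 9)
```

In this toy input, the upper threshold gives frames 4 to 6, the lower threshold widens that to 3 to 7, and the high-ZCR (unvoiced-like) frames at 2, 8, and 9 are then attached.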

Example: EPD by Volume and ZCR

epdByVolZcr01.m

[Figure: four panels over time. "Waveform and EP (method=volZcr)", amplitude; "Volume"; "Zero crossing rate" (ZCR); "Waveform after EPD", amplitude.]

EPD by Volume and HOD

Another feature to enhance unvoiced sounds: high-order difference (HOD)
- Order-1 HOD = sum(abs(diff(s)))
- Order-2 HOD = sum(abs(diff(diff(s))))
- Order-3 HOD = sum(abs(diff(diff(diff(s)))))
- …

Quiz: If frame = [-1 2 -2 3 -3 2 -2 1], what is its order-1 HOD?
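The HOD definitions above translate directly into a small sketch. The helper names `diff` and `high_order_diff` are chosen here for illustration; `diff` mirrors the first-difference operation used in the formulas.

```python
def diff(s):
    """First difference: s[1]-s[0], s[2]-s[1], ..."""
    return [b - a for a, b in zip(s, s[1:])]


def high_order_diff(frame, order):
    """Order-n HOD: sum of absolute values after applying diff n times."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)


frame = [0, 1, -1, 2]
print(high_order_diff(frame, 1))  # diffs are [1, -2, 3]  -> 6
print(high_order_diff(frame, 2))  # second diffs [-3, 5]  -> 8
```

Rapidly alternating samples, typical of unvoiced sounds, produce large differences, so HOD grows even when the volume is small.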

Example: Plots of Volume and HOD

highOrderDiff01.m

[Figure: two panels over time (sec). Top: "Waveform", amplitude. Bottom: curves for Volume, Order-1 diff, Order-2 diff, Order-3 diff, and Order-4 diff.]

Example: EPD by Vol. and HOD

epdByVolHod01.m

[Figure: three panels over time. "Waveform and EP (method=volHod)", amplitude; "Volume & HOD", with curves Volume and HOD; "VH".]

Hard Example: EPD by Vol. and HOD

A hard example: epdByVolHod02.m

[Figure: three panels over time. "Waveform and EP (method=volHod)", amplitude; "Volume & HOD", with curves Volume and HOD; "VH".]

EPD by Spectrum

epdShowSpec01.m
epdShowSpec02.m

[Figures: for SingaporeIsAFinePlace.wav and for noisy4epd.wav, a waveform panel (amplitude vs. time in seconds) above a spectrogram panel (frequency in Hz vs. time).]

How to Aggregate Spectrum?

How can we aggregate a spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is diversified?
- Entropy function
- Geometric mean over arithmetic mean

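Entropy Function

Entropy function

$$ \mathrm{entropy}(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \ln p_i, \qquad p_i \ge 0 \ \forall i \ \text{ and } \ \sum_{i=1}^{n} p_i = 1 $$

Property

$$ \mathrm{entropy}(p_1, p_2, \dots, p_n) \ \text{achieves its maximum when} \ p_1 = p_2 = \dots = p_n = 1/n $$

Proof…

Quiz candidate!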
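The maximum-at-uniform property can be checked numerically with a short sketch. The function name and the example distributions are illustrative; terms with p_i = 0 are taken to contribute 0, the usual convention.

```python
import math


def entropy(p):
    """Entropy -sum(p_i * ln(p_i)), with 0 * ln(0) treated as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])  # equals ln(4)
skewed = entropy([0.7, 0.1, 0.1, 0.1])
print(uniform, skewed)
print(uniform > skewed)  # True: the uniform distribution has the larger entropy
```

This is only a spot check on two distributions, not a proof of the property.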

Plots of Entropy Function

entropyPlot.m

[Figure: plots of the entropy function for N=2 and N=3.]

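Spectral Entropy

PDF:

$$ p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \qquad i = 1, \dots, N $$

Normalization:

$$ s(f_i) = 0 \quad \text{if } f_i < 250\ \text{Hz or } f_i > 6000\ \text{Hz} $$

In addition, p_i is set to 0 when it falls outside two bounds (subscripted 1 and 2); the bound symbols are not legible in this transcript.

Spectral entropy:

$$ H = -\sum_{k=1}^{N} p_k \log p_k $$

Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998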
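A minimal sketch of the spectral entropy of one frame follows, assuming the spectrum is already available as a list of non-negative values s(f_i) with matching bin frequencies. The function name and the toy spectra are made up for illustration, and the extra bounds on p_i are left out because their values are not given here.

```python
import math


def spectral_entropy(freqs, spectrum, f_low=250.0, f_high=6000.0):
    """Entropy of the band-limited, normalized spectrum of one frame."""
    # Zero out bins below 250 Hz or above 6000 Hz.
    s = [a if f_low <= f <= f_high else 0.0 for f, a in zip(freqs, spectrum)]
    total = sum(s)
    if total == 0:
        return 0.0  # no in-band energy; entropy is taken as 0 by convention
    # Normalize to a probability distribution p_i.
    p = [a / total for a in s]
    return -sum(x * math.log(x) for x in p if x > 0)


freqs = [0, 500, 1000, 2000, 4000, 8000]
flat_h = spectral_entropy(freqs, [1, 1, 1, 1, 1, 1])     # energy spread evenly
peaky_h = spectral_entropy(freqs, [1, 100, 1, 1, 1, 1])  # energy concentrated in one bin
print(flat_h, peaky_h)
print(peaky_h < flat_h)  # True
```

A flat in-band spectrum gives the highest entropy, while a spectrum dominated by a few bins gives a lower value.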

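Geometric/Arithmetic Means

Arithmetic & geometric means

$$ \mathrm{am}(p) = \frac{1}{n} \sum_{i=1}^{n} p_i, \qquad \mathrm{gm}(p) = \left( \prod_{i=1}^{n} p_i \right)^{1/n}, \qquad p = (p_1, p_2, \dots, p_n), \ p_i \ge 0 \ \forall i $$

Property

$$ \mathrm{am}(p) \ge \mathrm{gm}(p), \qquad \frac{\mathrm{gm}(p)}{\mathrm{am}(p)} \ \text{achieves its maximum when} \ p_1 = p_2 = \dots = p_n $$

Proof…

Quiz candidate!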
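The ratio of geometric mean to arithmetic mean can be sketched as below. The function names are illustrative; the geometric mean is computed through logarithms to avoid overflow on long vectors, and inputs are assumed non-negative.

```python
import math


def arithmetic_mean(p):
    return sum(p) / len(p)


def geometric_mean(p):
    """n-th root of the product, computed via logs; 0 if any element is 0."""
    if any(x == 0 for x in p):
        return 0.0
    return math.exp(sum(math.log(x) for x in p) / len(p))


def gm_over_am(p):
    """Ratio gm(p) / am(p); at most 1 (up to rounding) for non-negative inputs."""
    am = arithmetic_mean(p)
    return geometric_mean(p) / am if am > 0 else 0.0


print(gm_over_am([2, 2, 2, 2]))  # close to 1: all elements equal
print(gm_over_am([8, 1, 1, 1]))  # well below 1: energy concentrated in one element
```

Applied to a spectrum, the ratio is near 1 when energy is spread evenly and drops toward 0 as energy concentrates in a few bins.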
