
Endpoint Detection (端點偵測)

Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan


Intro to Endpoint Detection

Endpoint detection (EPD, 端點偵測)
- Goal: Determine the start and end of voice activity
- Also known as voice activity detection (VAD)

Importance
- Acts as a preprocessing step for many recognition tasks
- Requires as little computing power as possible

Two activation modes for speech-based applications
- Push to talk once: offline EPD
  - Example: voice command
- Push for continuous listening: online EPD
  - Example: dictation machine

Quiz candidate!


Types of Features for EPD

Time-domain
- Volume only
- Volume and ZCR (zero crossing rate)
- Volume and HOD (high-order difference)
- …

Frequency-domain
- Variance of spectrum
- Entropy of spectrum
- MFCC
- …


Typical Frameworks for EPD

Thresholding
- Simple thresholding
  - Compute a feature (e.g., volume) from each frame
  - Select a threshold vth to identify positive frames
- Combined thresholding
  - Use two features (e.g., volume and ZCR) to make a decision

Static classification
- Take features
- Perform binary classification
  - Negative: silence or noise
  - Positive: sound activity

Sequence alignment
- Use hidden Markov models (HMM) for sequence alignment


Performance Evaluation for EPD

Two types of errors (typical for all binary classification)
- False negative (aka false rejection): a positive frame is classified as negative
- False positive (aka false acceptance): a negative frame is classified as positive

Performance evaluation
- Start & end position accuracy
- Frame-based accuracy

Quiz candidate!
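
The two error types and the two evaluation views can be sketched as follows. This is only an illustration: the function names and the 0/1 frame-label encoding (1 = sound activity) are assumptions, not part of the slides.

```python
def frame_errors(truth, predicted):
    """Count frame-level errors between two equal-length 0/1 label lists."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    # False negative: a positive frame labelled negative (false rejection).
    false_neg = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    # False positive: a negative frame labelled positive (false acceptance).
    false_pos = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    accuracy = 1.0 - (false_neg + false_pos) / len(truth)
    return false_neg, false_pos, accuracy


def endpoints(labels):
    """Return (first, last) index of the positive frames, or None if there are none."""
    positives = [i for i, value in enumerate(labels) if value == 1]
    return (positives[0], positives[-1]) if positives else None


truth = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
print(frame_errors(truth, predicted))        # frame-based view
print(endpoints(truth), endpoints(predicted))  # start & end position view
```

Comparing the two `endpoints` results gives the start and end position errors in frames; `frame_errors` gives the frame-based accuracy.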


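EPD by Volume Thresholding

The simplest method for EPD
- Volume is the sum of the absolute sample values in each frame.

Four intuitive ways to select vth, each a reference volume times a tunable ratio:
- vth = vmax * ratio
- vth = vmedian * ratio
- vth = vmin * ratio
- vth = v1 * ratio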
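
A minimal sketch of the volume feature and the four candidate thresholds. The function names, the framing parameters, and every ratio value are illustrative assumptions; v1 is taken here to mean the volume of the first frame.

```python
from statistics import median


def frame_volumes(signal, frame_size, hop):
    """Volume of each frame: the sum of absolute sample values."""
    return [
        sum(abs(x) for x in signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


def candidate_thresholds(volumes, ratios):
    """Scale four reference volumes by their (assumed) ratios."""
    return {
        "max": max(volumes) * ratios["max"],
        "median": median(volumes) * ratios["median"],
        "min": min(volumes) * ratios["min"],
        "first": volumes[0] * ratios["first"],  # assumes v1 is the first frame
    }


# Hypothetical ratios, chosen only for illustration.
RATIOS = {"max": 0.1, "median": 2.0, "min": 10.0, "first": 4.0}

signal = [0, 0, 1, -1, 2, -2, 0, 0]
volumes = frame_volumes(signal, frame_size=2, hop=2)
print(volumes)
print(candidate_thresholds(volumes, RATIOS))
```

In this toy signal the first and last frames are all zeros, so the thresholds based on vmin and v1 collapse to zero regardless of the ratio.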


How Do They Fail?

Unfortunately… All the thresholds fail one way or another. Under what situations do they fail?
- vth = vmax * ratio: plosive sounds
- vth = vmedian * ratio: silence too long
- vth = vmin * ratio: total-zero frame
- vth = v1 * ratio: unstable frame

We need a better strategy…


A Better Strategy for Threshold Finding

A presumably better way to select vth
- vlower = 3rd percentile of volumes
- vupper = 97th percentile of volumes
- vth = (vupper - vlower) * ratio + vlower

Why do we need to use percentiles?
- To deal with plosive sounds
- To deal with total-zero frames

Does it fail? Yes, still, in certain situations…
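
The percentile-based rule can be sketched as below. The nearest-rank percentile definition, the function names, and the default ratio of 0.1 are assumptions made for illustration; other percentile definitions give slightly different values.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def volume_threshold(volumes, ratio=0.1, low_pct=3, high_pct=97):
    """vth = (vupper - vlower) * ratio + vlower, with percentile-based bounds."""
    v_lower = percentile(volumes, low_pct)
    v_upper = percentile(volumes, high_pct)
    return (v_upper - v_lower) * ratio + v_lower


# One total-zero frame and one plosive-like spike among otherwise ordinary volumes.
volumes = [0] + [5] * 48 + [50] * 50 + [1000]
print(volume_threshold(volumes))
```

Because the 3rd and 97th percentiles skip the single zero frame and the single spike, neither outlier moves the threshold in this example.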


Example: EPD by Volume

epdByVol01.m

[Figure: two panels over time. Top: "Waveform and EP (method=vol)" (amplitude). Bottom: "Volume".]


How to Enhance EPD by Volume?

Major problems of EPD by volume
- Threshold is hard to determine
  - Corpus-based fine-tuning
- Unvoiced parts are likely to be ignored
  - We need a feature to enhance the unvoiced parts
  - This can be achieved by ZCR or HOD


ZCR for Unvoiced Sound Detection

ZCR: zero crossing rate
- Number of zero crossings in a frame
- zvoiced ≤ zsilence ≤ zunvoiced

Example: epdShowZcr01.m

[Figure: two panels. Top: waveform of SingaporeIsAFinePlace.wav (amplitude vs. time in sec). Bottom: "ZCR" (count vs. time).]

Quiz: If frame = [-1 2 -2 3 5 2 -2 1], what is its ZCR?

Quiz candidate!
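
A small sketch of counting zero crossings in one frame. The function name is an assumption, and so is the handling of exact zeros: in this version a sample equal to zero is not counted as a crossing.

```python
def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples of a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


# A slowly varying frame crosses zero rarely; a rapidly alternating one crosses often.
print(zero_crossing_count([1, 2, 3, 2, 1, -1, -2, -1]))
print(zero_crossing_count([1, -1, 1, -1, 1, -1, 1, -1]))
```

Calling the same function on the quiz frame above is one way to check an answer.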


EPD by Volume and ZCR

1. Determine initial endpoints by the upper volume threshold τu

2. Expand the initial endpoints based on the lower volume threshold τl

3. Further expand the endpoints based on the ZCR threshold τzc
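
The three steps can be sketched on per-frame volume and ZCR lists. This is a simplified illustration, not the code behind the slides: it finds a single segment, the function and parameter names are hypothetical, and the strict "greater than" comparisons are an assumption.

```python
def epd_vol_zcr(volumes, zcrs, vol_upper, vol_lower, zcr_threshold):
    """Return (start, end) frame indices of one detected segment, or None."""
    # Step 1: initial endpoints from the upper volume threshold.
    above = [i for i, v in enumerate(volumes) if v > vol_upper]
    if not above:
        return None
    start, end = above[0], above[-1]

    # Step 2: expand outward while neighbouring frames exceed the lower volume threshold.
    while start > 0 and volumes[start - 1] > vol_lower:
        start -= 1
    while end < len(volumes) - 1 and volumes[end + 1] > vol_lower:
        end += 1

    # Step 3: expand further while neighbouring frames exceed the ZCR threshold,
    # which picks up low-volume unvoiced sounds at the edges.
    while start > 0 and zcrs[start - 1] > zcr_threshold:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > zcr_threshold:
        end += 1
    return start, end


volumes = [1, 1, 3, 8, 9, 7, 3, 1, 1, 1]
zcrs = [2, 30, 5, 4, 3, 4, 6, 40, 35, 2]
print(epd_vol_zcr(volumes, zcrs, vol_upper=6, vol_lower=2, zcr_threshold=20))
```

In the toy data, frames 1, 7, and 8 have low volume but high ZCR, so only the third step brings them into the detected segment.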


Example: EPD by Volume and ZCR

epdByVolZcr01.m

[Figure: four panels over time. "Waveform and EP (method=volZcr)" (amplitude), "Volume", "Zero crossing rate" (ZCR), and "Waveform after EPD" (amplitude).]


EPD by Volume and HOD

Another feature to enhance unvoiced sounds: high-order difference (HOD)
- Order-1 HOD = sum(abs(diff(s)))
- Order-2 HOD = sum(abs(diff(diff(s))))
- Order-3 HOD = sum(abs(diff(diff(diff(s)))))
- …

Quiz: If frame = [-1 2 -2 3 -3 2 -2 1], what is its order-1 HOD?
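
A short sketch of the order-n HOD of one frame, mirroring the sum(abs(diff(...))) expressions above. The helper names are assumptions; `diff` here is a plain first difference of adjacent samples.

```python
def diff(seq):
    """First difference: each sample minus the one before it."""
    return [b - a for a, b in zip(seq, seq[1:])]


def high_order_difference(frame, order=1):
    """Sum of absolute values after differencing the frame `order` times."""
    values = list(frame)
    for _ in range(order):
        values = diff(values)
    return sum(abs(v) for v in values)


frame = [0, 1, 0, -1, 0]
print(high_order_difference(frame, order=0))  # order 0 reduces to the volume
print(high_order_difference(frame, order=1))
print(high_order_difference(frame, order=2))
```

Each differencing step emphasizes fast sample-to-sample changes, which is why higher orders respond more strongly to noise-like unvoiced sounds than the volume does.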


Example: Plots of Volume and HOD

highOrderDiff01.m

[Figure: two panels. Top: "Waveform" (amplitude vs. time in sec). Bottom: curves for Volume, Order-1 diff, Order-2 diff, Order-3 diff, and Order-4 diff.]


Example: EPD by Vol. and HOD

epdByVolHod01.m

[Figure: three panels over time. "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves for Volume and HOD), and "VH".]


Hard Example: EPD by Vol. and HOD

A hard example: epdByVolHod02.m

[Figure: three panels over time. "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves for Volume and HOD), and "VH".]


EPD by Spectrum

epdShowSpec01.m
epdShowSpec02.m

[Figure: waveform (amplitude vs. time in sec) and spectrogram (frequency in Hz vs. time) of SingaporeIsAFinePlace.wav.]

[Figure: waveform (amplitude vs. time in sec) and spectrogram (frequency in Hz vs. time) of noisy4epd.wav.]


How to Aggregate Spectrum?

How can we aggregate the spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is diversified?
- Entropy function
- Geometric mean over arithmetic mean


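Entropy Function

Entropy function

$$\mathrm{entropy}(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i, \quad \mathbf{p} = (p_1, p_2, \ldots, p_n), \quad p_i \ge 0 \ \forall i, \quad \sum_{i=1}^{n} p_i = 1$$

Property

$$\mathrm{entropy}(\mathbf{p}) \text{ achieves its maximum when } p_1 = p_2 = \cdots = p_n = 1/n$$

Proof…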

Quiz candidate!
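
A numerical illustration of the entropy function, written as -sum(p_i ln p_i) over a probability vector, and of the property that the uniform distribution maximizes it. The function name and the convention 0 ln 0 = 0 are assumptions of this sketch; it checks a few cases and is not a proof.

```python
import math


def entropy(p):
    """Entropy -sum(p_i * ln p_i), using the convention 0 * ln 0 = 0."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    return -sum(x * math.log(x) for x in p if x > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: equals ln 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: smaller
print(entropy([1.0, 0.0, 0.0, 0.0]))      # fully concentrated: zero
```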


Plots of Entropy Function

entropyPlot.m

[Figure: plots of the entropy function for N=2 and N=3.]


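Spectral Entropy

PDF:

$$p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \quad i = 1, \ldots, N$$

Normalization:

$$s(f_i) = 0 \quad \text{if } f_i < 250\ \mathrm{Hz} \text{ or } f_i > 6000\ \mathrm{Hz}$$

$$p_i = 0 \quad \text{if } p_i < \delta_1 \text{ or } p_i > \delta_2$$

Spectral entropy:

$$H = -\sum_{k=1}^{N} p_k \log p_k$$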

Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments”, International Conference on Spoken Language Processing, Sydney, 1998
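
A rough sketch of spectral entropy for a single frame. Several choices here are assumptions rather than the method of the cited paper: s(f) is taken to be the power spectrum from a naive DFT, the band is 250 Hz to 6000 Hz, the natural logarithm is used, and the additional thresholding of p_i is omitted because its threshold values are not given here. All function names are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(frame, sample_rate):
    """Naive DFT power spectrum for bins 0..n/2; returns (freqs, powers)."""
    n = len(frame)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def spectral_entropy(frame, sample_rate, f_low=250.0, f_high=6000.0):
    """Entropy of the band-limited spectrum normalized into a PDF."""
    freqs, powers = power_spectrum(frame, sample_rate)
    # Zero out spectral components outside the band of interest.
    banded = [s if f_low <= f <= f_high else 0.0 for f, s in zip(freqs, powers)]
    total = sum(banded)
    if total == 0:
        return 0.0
    probs = [s / total for s in banded]
    return -sum(p * math.log(p) for p in probs if p > 0)


rate = 16000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(64)]
print(spectral_entropy(tone, rate), spectral_entropy(noise, rate))
```

The pure tone concentrates its energy in one frequency bin and gets an entropy near zero, while the noise frame spreads energy across the band and gets a much larger value.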


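Geometric/Arithmetic Means

Arithmetic & geometric means

$$\mathrm{am}(\mathbf{p}) = \frac{1}{n}\sum_{i=1}^{n} p_i, \quad \mathrm{gm}(\mathbf{p}) = \left(\prod_{i=1}^{n} p_i\right)^{1/n}, \quad \mathbf{p} = (p_1, p_2, \ldots, p_n), \quad p_i \ge 0 \ \forall i$$

Property

$$\mathrm{gm}(\mathbf{p}) \le \mathrm{am}(\mathbf{p}), \quad \frac{\mathrm{gm}(\mathbf{p})}{\mathrm{am}(\mathbf{p})} \text{ achieves its maximum when } p_1 = p_2 = \cdots = p_n$$

Proof…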

Quiz candidate!
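
A numerical illustration of the ratio of geometric mean to arithmetic mean, which reaches its largest value, 1, when all entries are equal. The function names are assumptions; computing the geometric mean through logarithms is a design choice to avoid overflow in long products. This checks a few cases and is not a proof.

```python
import math


def arithmetic_mean(p):
    return sum(p) / len(p)


def geometric_mean(p):
    """n-th root of the product, computed in the log domain."""
    if any(x == 0 for x in p):
        return 0.0
    return math.exp(sum(math.log(x) for x in p) / len(p))


def gm_over_am(p):
    """Ratio gm(p) / am(p); defined as 0 here when all entries are zero."""
    am = arithmetic_mean(p)
    return geometric_mean(p) / am if am > 0 else 0.0


print(gm_over_am([2, 2, 2, 2]))  # flat distribution: ratio close to 1
print(gm_over_am([1, 1, 1, 9]))  # peaked distribution: ratio well below 1
```

Applied to spectral energies, a value near 1 indicates a flat spectrum and a small value indicates energy concentrated in a few components.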