Post on 16-Dec-2015

Pitch Tracking ( 音高追蹤 )

Jyh-Shing Roger Jang ( 張智星 )

MIR Lab ( 多媒體資訊檢索實驗室 )

CS, NTHU ( 清華大學 資訊工程系 )

jang@mirlab.org, http://mirlab.org/jang

Pitch ( 音高 )

Definition of pitch

Fundamental frequency (FF, in Hz): Reciprocal of the fundamental period in a quasi-periodic waveform

Pitch (in semitones): Obtained from the fundamental frequency through a log-based transformation (to be detailed later)

Characteristics of pitch

Noise and unvoiced sounds do not have pitch.

Pitch Tracking ( 音高追蹤 )

Pitch tracking: To compute the pitch vector of a given waveform ( 對整段音訊求取音高 )

Applications

Query by singing/humming ( 哼唱選歌 )

Tone recognition for Mandarin ( 華語的音調辨識 )

Intonation scoring for English ( 英語的音調評分 )

Prosody analysis for speech synthesis ( 語音合成中的韻律分析 )

Pitch scaling and duration modification ( 音高調節與長度改變 )

Pitch Tracking Algorithms

Two categories of pitch tracking algorithms

Time domain ( 時域 ): ACF (autocorrelation function), AMDF (average magnitude difference function), SIFT (simple inverse filtering tracking)

Frequency domain ( 頻域 ): Harmonic product spectrum method, cepstrum method

Typical Steps for Pitch Tracking

1. Chop signals into frames (aka frame blocking)

2. Compute pitch functions (ACF, AMDF, etc.)

3. Determine pitch for a frame: max/min picking of the pitch function

4. Remove unreliable pitch: via volume/clarity thresholding

5. Smooth the whole pitch vector: via median filter, etc.
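The last step, smoothing the pitch vector with a median filter, can be sketched as follows. This is a minimal illustration, not the deck's implementation: the function name `median_smooth`, the `order` parameter, and the edge handling (repeating the first and last values) are all assumptions.

```python
from statistics import median

def median_smooth(pitch, order=5):
    """Median-filter a pitch vector with an odd window length `order`.

    Assumption: edges are padded by repeating the first/last value so the
    output has the same length as the input.
    """
    half = order // 2
    padded = [pitch[0]] * half + list(pitch) + [pitch[-1]] * half
    return [median(padded[i:i + order]) for i in range(len(pitch))]

# A pitch vector (in semitones) with two isolated outliers (72 and 48).
pv = [60, 60, 72, 60, 61, 61, 48, 61]
smoothed = median_smooth(pv, order=3)
print(smoothed)
```

A median filter suppresses isolated spikes such as octave errors without blurring genuine note transitions as much as a moving average would.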

Frame Blocking

Frame size = 256 points

Overlap = 84 points

Frame rate = fs/(frameSize-overlap) = 11025/(256-84) ≈ 64 pitch values/sec
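As a rough sketch of frame blocking with these settings (the helper name `frame_blocking` and the silent placeholder signal are illustrative assumptions), each frame starts `frameSize - overlap` samples after the previous one:

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a signal into overlapping frames; an incomplete trailing frame is dropped."""
    hop = frame_size - overlap  # distance between successive frame starts
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]

fs = 11025
signal = [0.0] * fs                 # one second of silence as a placeholder signal
frames = frame_blocking(signal)
frame_rate = fs / (256 - 84)        # frames (and hence pitch values) per second
print(len(frames), round(frame_rate, 2))
```

One second of audio yields 63 complete frames here, slightly below the nominal rate of about 64.1 per second, because the final partial frame is discarded.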

[Figure: waveform with the labels "Frame" and "Overlap", plus a "Zoom in" view of the marked region.]

ACF: Auto-correlation Function

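[Figure: a frame s(i) and its shifted copy s(i+τ) for τ = 30. acf(30) is the inner product of the overlapping part. The pitch period is marked on the waveform.]

$$\mathrm{acf}(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$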
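A direct sketch of this definition, where acf(τ) is the inner product of the frame with a copy of itself shifted by τ samples, restricted to the overlapping part. Lags are 0-based here, and the function name `acf` is chosen for illustration.

```python
def acf(frame):
    """Autocorrelation: acf[tau] = sum of frame[i] * frame[i + tau] over the overlap."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

values = acf([1, 2, 3])
print(values)
```

The overlap shrinks as τ grows, so the values taper toward zero at large lags even for a perfectly periodic signal.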

ACF Example 1

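sunday.wav

Sample rate = 16 kHz

Frame size = 512 (starting from point 9000)

Fundamental frequency: The max of ACF occurs at index 132. Index 1 corresponds to zero lag, so the pitch period is 132 - 1 = 131 samples and FF = 16000/(132-1) ≈ 122.137 Hz (a result of 123.077 Hz would correspond to a period of 130 samples, i.e. index 131).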

ACF Example 2

If the range of human FF is [40, 1000] Hz, the pitch point must be selected within the corresponding range of lags:

Min FF = 40 Hz: acf(fs/40:end) is not considered.

Max FF = 1000 Hz: acf(1:fs/1000) is not considered.
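The restriction can be sketched as a search for the ACF maximum over the allowed lags only. This is an illustrative sketch under stated assumptions: lags are 0-based (so FF = fs/lag, with no "-1" as in 1-based indexing), the function names are hypothetical, and the test signal is a synthetic 200 Hz sine rather than a recording.

```python
import math

def acf(frame):
    """Autocorrelation over the overlapping part, with 0-based lags."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

def acf_pitch(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Return the FF (Hz) at the ACF peak among lags allowed by [min_ff, max_ff]."""
    values = acf(frame)
    min_lag = math.ceil(fs / max_ff)                  # shorter lags would exceed max_ff
    max_lag = min(int(fs / min_ff), len(frame) - 1)   # longer lags would fall below min_ff
    best_lag = max(range(min_lag, max_lag + 1), key=lambda tau: values[tau])
    return fs / best_lag

fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
ff = acf_pitch(frame, fs)
print(ff)
```

Excluding the short lags matters because the ACF always has its global maximum at zero lag, which would otherwise be picked as the "pitch".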

Pitch Tracking via ACF

Specs

Sample rate = 11025 Hz

Frame size = 353 points ≈ 32 ms

Overlap = 0

Frame rate ≈ 31.2 frames/sec (11025/353)

Playback: soo.wav, sooPitch.wav

Variations of ACF to Avoid Tapering

Normalized version:

$$\mathrm{acf}(\tau) = \frac{\sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)}{n-\tau}$$

Half-frame shifting:

$$\mathrm{acf}(\tau) = \sum_{i=0}^{n/2} s(i)\, s(i+\tau)$$
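Both variations can be sketched as below. The function names are illustrative, and the half-frame version makes one assumption that differs from a literal reading of the slide: it sums over the first n/2 samples (i = 0 to n/2 - 1) so that i + τ stays inside the frame for every τ up to n/2.

```python
def acf_normalized(frame):
    """ACF divided by the overlap length (n - tau) to cancel the tapering."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau)) / (n - tau)
            for tau in range(n)]

def acf_half_shift(frame):
    """ACF using only the first half of the frame, shifted by at most half a frame.

    Every lag sums the same number of terms, so there is no tapering.
    """
    half = len(frame) // 2
    return [sum(frame[i] * frame[i + tau] for i in range(half))
            for tau in range(half + 1)]

flat = [1.0] * 8
print(acf_normalized(flat))   # constant across lags
print(acf_half_shift(flat))   # constant across lags
```

On a constant frame the plain ACF would fall linearly with the lag, while both variants stay flat, which is exactly the tapering they are meant to remove.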

Variations of ACF to Normalize Range

To normalize ACF to the range [-1, 1]:

$$\mathrm{nsdf}(\tau) = \frac{2 \sum_i s(i)\, s(i+\tau)}{\sum_i s^2(i) + \sum_i s^2(i+\tau)}$$

This is based on the inequality:

$$-(x^2 + y^2) \le 2xy \le x^2 + y^2$$
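A small sketch of the range-normalized function, computed as twice the inner product divided by the summed energies of the frame and its shifted copy. The summation range (the overlapping part only) and the zero-denominator fallback are assumptions made for this example.

```python
def nsdf(frame):
    """Normalized ACF in [-1, 1]: 2 * sum(s(i) s(i+tau)) / sum(s(i)^2 + s(i+tau)^2)."""
    n = len(frame)
    out = []
    for tau in range(n):
        num = 2.0 * sum(frame[i] * frame[i + tau] for i in range(n - tau))
        den = sum(frame[i] ** 2 + frame[i + tau] ** 2 for i in range(n - tau))
        out.append(num / den if den > 0 else 0.0)  # all-zero overlap: report 0
    return out

alternating = [1.0, -1.0] * 3
values = nsdf(alternating)
print(values)
```

Because the bound holds term by term, every value lies in [-1, 1] regardless of the frame's amplitude, which makes a fixed clarity threshold meaningful across recordings.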

AMDF: Average Magnitude Difference Function

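[Figure: a frame s(i) and its shifted copy s(i+τ) for τ = 30. amdf(30) is the sum of absolute differences over the overlapping part. The pitch period is marked on the waveform.]

$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|$$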
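A direct sketch of the AMDF, taken as the sum of absolute differences between the frame and its shifted copy over the overlapping part. Lags are 0-based and the function name is illustrative.

```python
def amdf(frame):
    """AMDF: amdf[tau] = sum of |frame[i] - frame[i + tau]| over the overlap."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(n)]

periodic = [0, 1, 0, -1] * 5     # exact period of 4 samples
values = amdf(periodic)
print(values[:9])
```

Where the ACF peaks at the pitch period, the AMDF dips: for an exactly periodic frame it reaches zero at the period and its multiples, so the pitch point is found by min picking instead of max picking.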

AMDF Example

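sunday.wav

Sample rate = 16 kHz

Frame size = 512 (starting from point 9000)

Fundamental frequency: The min of AMDF occurs at index 132. Index 1 corresponds to zero lag, so the pitch period is 132 - 1 = 131 samples and FF = 16000/(132-1) ≈ 122.137 Hz (a result of 123.077 Hz would correspond to a period of 130 samples, i.e. index 131).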

Variations of AMDF to Avoid Tapering

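Normalized version:

$$\mathrm{amdf}(\tau) = \frac{\sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|}{n-\tau}$$

Half-frame shifting:

$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n/2} \left| s(i) - s(i+\tau) \right|$$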

Combining ACF and AMDF

[Figure: panels labeled "Frame", "ACF", "AMDF", and "ACF/AMDF".]

Example of Pitch Tracking

[Figure: top panel, waveform of soo.wav (amplitude); bottom panel, pitch (semitone), titled "PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22". Legend: pitch1, computed pitch.]

UPDUDP (1/4)

UPDUDP: Unbroken Pitch Determination Using DP

Goal: To take pitch smoothness into consideration

$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \left| p_i - p_{i+1} \right|^m$$

$\mathbf{p} = [p_1, \dots, p_n]$: a given path in the AMDF matrix

$n$: Number of frames

$\theta$: Transition penalty

$m$: Exponent of the transition difference

UPDUDP (2/4)

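Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j)

Recurrent formula:

$$D(i, j) = \mathrm{amdf}_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta\, |k - j|^2 \right\}, \quad i \in [2, n],\ j \in [8, 160]$$

Initial conditions:

$$D(1, j) = \mathrm{amdf}_1(j), \quad j \in [8, 160]$$

Optimum cost:

$$\min_{j \in [8, 160]} D(n, j)$$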
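The dynamic-programming recurrence can be sketched as follows: the cost of reaching lag j in frame i is that frame's AMDF value plus the cheapest predecessor, where moving from lag k to lag j costs the transition penalty times |k - j| raised to the exponent. This is a minimal illustration rather than the author's implementation. The function name `updudp`, the toy AMDF matrix, and the use of column indices 0..width-1 in place of lags 8..160 are assumptions; back-pointers are added to recover the optimal path.

```python
def updudp(amdf_matrix, theta, m=2):
    """Find the minimum-cost path through an AMDF matrix.

    amdf_matrix[i][j] is the AMDF of frame i at candidate lag index j.
    Returns (optimum cost, list of chosen lag indices, one per frame).
    """
    n, width = len(amdf_matrix), len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]          # initial condition: D(1, j) = amdf_1(j)
    back = []                           # back[i-1][j]: best predecessor of (i, j)
    for i in range(1, n):
        row, ptr = [], []
        for j in range(width):
            k_best = min(range(width),
                         key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][k_best]
                       + theta * abs(k_best - j) ** m)
            ptr.append(k_best)
        D.append(row)
        back.append(ptr)
    j = min(range(width), key=lambda c: D[-1][c])   # optimum cost at the last frame
    cost, path = D[-1][j], [j]
    for ptr in reversed(back):                      # trace the path backwards
        j = ptr[j]
        path.append(j)
    path.reverse()
    return cost, path

toy = [[5, 0, 5],
       [5, 5, 0],
       [5, 0, 5]]
print(updudp(toy, theta=0))     # no smoothness penalty: per-frame minima
print(updudp(toy, theta=100))   # heavy penalty: the path stays on one lag
```

With a zero penalty the path simply follows each frame's AMDF minimum; with a large penalty it accepts a worse AMDF value in the middle frame to keep the pitch unbroken.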

UPDUDP (3/4)

A typical example

UPDUDP (4/4)

Insensitivity in θ

[Figure: top panel, waveform with syllable labels (xi, lu, chan, sheng, chang); bottom panel, pitch (semitones) vs. time (seconds) for θ = 0, 2000, 4000, ..., 20000.]

Harmonic Product Spectrum

hps.m

Frequency to Semitone Conversion

Semitone: A musical scale based on A440

$$\text{semitone} = 69 + 12 \log_2\!\left(\frac{\text{freq}}{440}\right)$$

Reasonable pitch range: E2 - C6 (82 Hz - 1047 Hz)
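The conversion, semitone = 69 + 12·log2(freq/440), and its inverse can be sketched as below. The function names are illustrative.

```python
import math

def freq_to_semitone(freq):
    """Convert a frequency in Hz to a semitone number, with A440 mapped to 69."""
    return 69 + 12 * math.log2(freq / 440.0)

def semitone_to_freq(semitone):
    """Inverse conversion: semitone number back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

print(freq_to_semitone(440.0))          # A440
print(round(freq_to_semitone(82.0)))    # lower end of the range (E2)
print(round(freq_to_semitone(1047.0)))  # upper end of the range (C6)
```

On this scale the E2 to C6 range spans semitones 40 to 84, and each octave adds exactly 12.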

Unreliable Pitch Removal

Pitch removal via volume thresholding

[Figure: three panels against time (sec): waveform of 小毛驢.wav ("Little Donkey"), volume, and pitch.]

Unreliable Pitch Removal

Pitch removal via volume/clarity thresholding

[Figure: four panels against time (sec): waveform of 小毛驢.wav ("Little Donkey"), volume, clarity, and pitch.]
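Thresholding can be sketched as a per-frame test: a pitch value is kept only when both the volume and the clarity of its frame reach their thresholds. This is an illustration under assumptions: the function name, the threshold values, and the sample data are made up, and removed pitch is marked with 0.

```python
def remove_unreliable_pitch(pitch, volume, clarity, volume_th, clarity_th):
    """Zero out pitch values whose frame has low volume or low clarity."""
    return [p if v >= volume_th and c >= clarity_th else 0
            for p, v, c in zip(pitch, volume, clarity)]

pitch = [60, 62, 64, 65]              # semitones, one value per frame
volume = [9000, 100, 8000, 7000]      # frame 2 is nearly silent
clarity = [0.9, 0.95, 0.3, 0.8]       # frame 3 is noise-like
cleaned = remove_unreliable_pitch(pitch, volume, clarity,
                                  volume_th=1000, clarity_th=0.5)
print(cleaned)
```

Volume alone catches silence, while clarity also catches loud but unvoiced or noisy frames, which is why the two thresholds are combined.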

Rest Handling

With rests Without rests

Rest Handling

[Figure: three panels plotted against frame index, titled "Original PV", "useRest=1", and "useRest=0".]

Original pitch vectors with rests.

Rests are removed. Good for DTW (dynamic time warping).

Rests are replaced by the previous nonzero pitch. Good for LS (linear scaling).
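The two ways of handling rests can be sketched as below, assuming rests are stored as 0 in the pitch vector. The function names are illustrative and are not tied to either `useRest` setting. Leading rests, which have no previous nonzero pitch, are left as 0 in this sketch.

```python
def remove_rests(pitch):
    """Drop rest frames entirely; the vector gets shorter."""
    return [p for p in pitch if p != 0]

def fill_rests(pitch):
    """Replace each rest with the previous nonzero pitch; the length is preserved."""
    filled, last = [], 0
    for p in pitch:
        if p != 0:
            last = p
        filled.append(last)
    return filled

pv = [0, 60, 60, 0, 0, 62, 0]
print(remove_rests(pv))
print(fill_rests(pv))
```

Removing rests changes the timing of the vector, while filling them preserves frame positions.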

Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower): audio

Comparison of Pitch Vectors

Yellow line: Target pitch vector

Demo of Pitch Tracking

Real-time display of ACF for pitch tracking: toolbox/sap/goPtByAcf.mdl

Real-time pitch tracking for real-time mic input: toolbox/sap/goPtByAcf2.mdl

Pitch scaling: pitchShiftDemo/project1.exe, pitchShift-multirate/multirate.m

Intonation assessment: ap170/matlab/goDemo.m
