Post on 16-Dec-2015

Pitch Tracking ( 音高追蹤 )

Jyh-Shing Roger Jang ( 張智星 )

MIR Lab ( 多媒體資訊檢索實驗室 )

CS, NTHU ( 清華大學 資訊工程系 )

jang@mirlab.org, http://mirlab.org/jang

Pitch ( 音高 )

Definition of pitch:

Fundamental frequency (FF, in Hz): Reciprocal of the fundamental period in a quasi-periodic waveform

Pitch (in semitones): Obtained from the fundamental frequency through a log-based transformation (to be detailed later)

Characteristics of pitch:

Noise and unvoiced sounds do not have pitch.

Pitch Tracking ( 音高追蹤 )

Pitch tracking: To compute the pitch vector of a given waveform ( 對整段音訊求取音高 )

Applications:

Query by singing/humming ( 哼唱選歌 )

Tone recognition for Mandarin ( 華語的音調辨識 )

Intonation scoring for English ( 英語的音調評分 )

Prosody analysis for speech synthesis ( 語音合成中的韻律分析 )

Pitch scaling and duration modification ( 音高調節與長度改變 )

Pitch Tracking Algorithms

Two categories of pitch tracking algorithms:

Time domain ( 時域 ): ACF (autocorrelation function), AMDF (average magnitude difference function), SIFT (simplified inverse filter tracking)

Frequency domain ( 頻域 ): harmonic product spectrum method, cepstrum method

Typical Steps for Pitch Tracking

Chop signals into frames (aka frame blocking)

Compute pitch functions (ACF, AMDF, etc.)

Determine pitch for a frame: max/min picking of the pitch function

Remove unreliable pitch: via volume/clarity thresholding

Smooth the whole pitch vector: via median filter, etc.
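The final smoothing step can be illustrated with a sliding median. This is only a minimal Python sketch: the function name `median_smooth`, the odd window size, and the choice to shrink the window at the two ends are assumptions for illustration, since the slides do not specify them.

```python
from statistics import median

def median_smooth(pitch, window=5):
    """Return a copy of `pitch` where each value is replaced by the median
    of the surrounding `window` frames (the window shrinks at the edges)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(pitch)):
        lo = max(0, i - half)
        hi = min(len(pitch), i + half + 1)
        smoothed.append(median(pitch[lo:hi]))
    return smoothed

# A pitch vector (in semitones) with two isolated outliers, 72 and 49.
pv = [60, 60, 72, 60, 61, 61, 49, 61]
print(median_smooth(pv, window=3))
```

A median is used here rather than a moving average because an isolated octave error is replaced by a neighboring value instead of being blended into it.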

Frame Blocking

Frame size = 256 points

Overlap = 84 points

Frame rate = fs/(frameSize-overlap) = 11025/(256-84) ≈ 64 pitch points/sec

[Figure: a waveform plotted against sample index, with a zoomed-in view marking one frame and the overlap between adjacent frames]
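Frame blocking with the numbers on this slide (frame size 256, overlap 84, fs = 11025) can be sketched as follows. The function name `frame_blocking` and the decision to drop trailing samples that do not fill a whole frame are assumptions, not something the slides state.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split `signal` into overlapping frames; trailing samples that do not
    fill a whole frame are dropped."""
    if not 0 <= overlap < frame_size:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_size")
    hop = frame_size - overlap  # samples between consecutive frame starts
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]

fs = 11025
signal = [0.0] * fs  # one second of silence as a stand-in signal
frames = frame_blocking(signal)
# Number of frames in one second vs. the nominal frame rate fs/(frameSize-overlap).
print(len(frames), fs / (256 - 84))
```

The frame count for one second comes out slightly below the nominal rate because the last partial frame is discarded.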

ACF: Auto-correlation Function

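[Figure: a frame s(i) and the shifted frame s(i+τ) with τ = 30; acf(30) is the inner product of the overlapping part. The pitch period is marked on the resulting ACF curve.]

$$\mathrm{acf}(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$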
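The ACF described on this slide, the inner product of a frame with its shifted copy over the overlapping part, can be written directly in Python. This is a plain O(n²) sketch for illustration; the synthetic sine frame and the lag search range 10 to 127 are assumptions chosen for the demo.

```python
import math

def acf(frame):
    """acf[tau] = sum_{i=0}^{n-1-tau} frame[i] * frame[i+tau], tau = 0..n-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

# Synthetic test frame (an assumption): a sine wave with a period of 30 samples.
period = 30
frame = [math.sin(2 * math.pi * i / period) for i in range(256)]
values = acf(frame)
# Skip the first few lags: acf[0] (the frame energy) is always the global maximum.
best_lag = max(range(10, 128), key=lambda tau: values[tau])
print(best_lag)
```

Lags here are 0-based, so `values[tau]` corresponds to a shift of exactly `tau` samples.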

ACF Example 1

sunday.wav

Sample rate = 16 kHz

Frame size = 512 (starting from point 9000)

Fundamental frequency: the max of ACF occurs at index 131 (1-based, so the lag is 131-1 = 130 samples), FF = 16000/(131-1) = 123.077 Hz

ACF Example 2

If the range of human FF is [40, 1000] Hz, then the selection of the pitch point is restricted as follows:

Min FF = 40 Hz: acf(fs/40:end) is not considered.

Max FF = 1000 Hz: acf(1:fs/1000) is not considered.
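The FF range restriction translates into a lag window for peak picking. The sketch below uses 0-based lags (the slide's index expressions are 1-based), and the function name `pick_pitch_lag` and the toy ACF values are made up for illustration.

```python
import math

def pick_pitch_lag(acf_values, fs, min_ff=40.0, max_ff=1000.0):
    """Return (lag, ff) for the largest ACF value whose lag corresponds to a
    fundamental frequency inside [min_ff, max_ff]."""
    min_lag = math.ceil(fs / max_ff)                             # shortest allowed period
    max_lag = min(math.floor(fs / min_ff), len(acf_values) - 1)  # longest allowed period
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested FF range")
    lag = max(range(min_lag, max_lag + 1), key=lambda tau: acf_values[tau])
    return lag, fs / lag

fs = 16000
acf_values = [0.0] * 512
acf_values[0] = 10.0    # lag 0 is always the largest ACF value
acf_values[5] = 9.0     # would mean 3200 Hz: above max_ff, ignored
acf_values[130] = 8.0   # 16000/130 Hz: inside the allowed range
acf_values[450] = 8.5   # would mean about 35.6 Hz: below min_ff, ignored
lag, ff = pick_pitch_lag(acf_values, fs)
print(lag, round(ff, 3))
```

Without the window, the peak at lag 0 (or a spurious peak at a very short or very long lag) would win the max picking.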

Pitch Tracking via ACF

Specs:

Sample rate = 11025 Hz

Frame size = 353 points ≈ 32 ms

Overlap = 0

Frame rate = 31.25 frames/sec

Playback: soo.wav, sooPitch.wav

Variations of ACF to Avoid Tapering

Normalized version:

$$\mathrm{acf}(\tau) = \frac{\sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)}{n-\tau}$$

Half-frame shifting:

$$\mathrm{acf}(\tau) = \sum_{i=0}^{n/2} s(i)\, s(i+\tau)$$

Variations of ACF to Normalize Range

To normalize ACF to the range [-1, 1]:

$$\mathrm{nsdf}(\tau) = \frac{2 \sum s(i)\, s(i+\tau)}{\sum s^2(i) + \sum s^2(i+\tau)}$$

This is based on the inequality:

$$-(x^2 + y^2) \le 2xy \le x^2 + y^2$$
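A small Python sketch of this range-normalized ACF (nsdf) follows. The summation limits are not legible on the slide, so the code assumes every sum runs over the overlapping part i = 0 .. n-1-τ; the sine test frame is also an assumption.

```python
import math

def nsdf(frame):
    """nsdf[tau] = 2*sum(s[i]*s[i+tau]) / (sum(s[i]^2) + sum(s[i+tau]^2)),
    with every sum taken over the overlapping part i = 0 .. n-1-tau."""
    n = len(frame)
    out = []
    for tau in range(n):
        num = 2.0 * sum(frame[i] * frame[i + tau] for i in range(n - tau))
        den = sum(frame[i] ** 2 + frame[i + tau] ** 2 for i in range(n - tau))
        out.append(num / den if den > 0 else 0.0)  # guard against an all-zero overlap
    return out

# Synthetic frame (an assumption): a sine wave with a period of 30 samples.
frame = [math.sin(2 * math.pi * i / 30) for i in range(256)]
values = nsdf(frame)
print(min(values) >= -1.0 - 1e-12, max(values) <= 1.0 + 1e-12)
```

Because 2xy never exceeds x² + y² in magnitude, each ratio stays within [-1, 1] regardless of how many samples overlap, so peaks at large lags are not attenuated by tapering.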

AMDF: Average Magnitude Difference Function

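[Figure: a frame s(i) and the shifted frame s(i+τ) with τ = 30; amdf(30) is the sum of absolute differences over the overlapping part. The pitch period is marked on the resulting AMDF curve.]

$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|$$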
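AMDF can be sketched the same way as ACF, except that the pitch period shows up as a minimum rather than a maximum. The integer sawtooth frame and the lag search range 10 to 127 below are assumptions chosen so that the result is exact.

```python
def amdf(frame):
    """amdf[tau] = sum_{i=0}^{n-1-tau} |frame[i] - frame[i+tau]|."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(n)]

# Synthetic frame (an assumption): an integer sawtooth with a period of 30 samples.
frame = [i % 30 for i in range(256)]
values = amdf(frame)
# amdf[0] is always 0, so the search starts at a later lag; min() returns the
# first of several equal minima, i.e. the shortest matching period.
best_lag = min(range(10, 128), key=lambda tau: values[tau])
print(best_lag)
```

AMDF needs only subtractions and absolute values, which is its main appeal over ACF on hardware where multiplication is expensive.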

AMDF Example

sunday.wav

Sample rate = 16 kHz

Frame size = 512 (starting from point 9000)

Fundamental frequency: the min of AMDF occurs at index 131 (1-based, so the lag is 131-1 = 130 samples), FF = 16000/(131-1) = 123.077 Hz

Variations of AMDF to Avoid Tapering

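Normalized version:

$$\mathrm{amdf}(\tau) = \frac{\sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|}{n-\tau}$$

Half-frame shifting:

$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n/2} \left| s(i) - s(i+\tau) \right|$$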

Combining ACF and AMDF

[Figure: four panels showing a frame, its ACF, its AMDF, and the ratio ACF/AMDF]

Example of Pitch Tracking

[Figure: top panel, waveform of soo.wav (amplitude vs. time); bottom panel, pitch1: computed pitch (semitones vs. time). PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22]

UPDUDP (1/4)

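UPDUDP: Unbroken Pitch Determination Using DP

Goal: To take pitch smoothness into consideration

$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \left| p_i - p_{i+1} \right|^m$$

$\mathbf{p} = [p_1, \dots, p_i, \dots, p_n]$: a given path in the AMDF matrix

$n$: Number of frames

$\theta$: Transition penalty

$m$: Exponent of the transition difference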

UPDUDP (2/4)

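Optimum-value function D(i, j): the minimum cost of a path starting from frame 1 and ending at position (i, j)

Recurrent formula:

$$D(i, j) = \mathrm{amdf}_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta \left| k - j \right|^2 \right\}, \quad i \in [2, n],\ j \in [8, 160]$$

Initial conditions:

$$D(1, j) = \mathrm{amdf}_1(j), \quad j \in [8, 160]$$

Optimum cost:

$$\min_{j \in [8, 160]} D(n, j)$$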
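The dynamic programming recurrence for D(i, j) can be sketched in Python as below. This is an illustrative implementation under assumptions, not the author's code: the function name `updudp`, the symbol `theta` for the transition penalty, the squared transition difference, and the tiny 3-frame AMDF matrix over lags 8 to 10 (instead of the full range 8 to 160) are all chosen for the demo.

```python
def updudp(amdf_matrix, theta, lags):
    """Return (min_cost, path), where path[i] is the lag chosen for frame i.

    amdf_matrix[i][c] is the AMDF value of frame i at lag lags[c].
    """
    n, m = len(amdf_matrix), len(lags)
    D = [list(amdf_matrix[0])]   # initial condition: D(1, j) = amdf_1(j)
    back = [[0] * m]             # back-pointers for recovering the path
    for i in range(1, n):
        row, ptr = [], []
        for c in range(m):
            # min over k of D(i-1, k) + theta * |k - j|^2
            best = min(range(m),
                       key=lambda k: D[i - 1][k] + theta * (lags[k] - lags[c]) ** 2)
            row.append(amdf_matrix[i][c] + D[i - 1][best]
                       + theta * (lags[best] - lags[c]) ** 2)
            ptr.append(best)
        D.append(row)
        back.append(ptr)
    # Optimum cost: min over j of D(n, j), then trace the path backwards.
    c = min(range(m), key=lambda j: D[n - 1][j])
    min_cost = D[n - 1][c]
    path = [c]
    for i in range(n - 1, 0, -1):
        c = back[i][c]
        path.append(c)
    path.reverse()
    return min_cost, [lags[c] for c in path]

lags = [8, 9, 10]
amdf_matrix = [[1, 5, 5],   # frame 1: minimum at lag 8
               [5, 5, 0],   # frame 2: minimum at lag 10 (an outlier)
               [1, 5, 5]]   # frame 3: minimum at lag 8
print(updudp(amdf_matrix, 0, lags))    # no penalty: follows the per-frame minima
print(updudp(amdf_matrix, 10, lags))   # with penalty: the path stays smooth
```

With a zero penalty the path jumps to the outlier in the second frame; with a positive penalty the jump costs more than it saves, so the path stays on one lag.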

UPDUDP (3/4)

A typical example

UPDUDP (4/4)

Insensitivity in θ

[Figure: top panel, waveform with syllable labels (xi, lu, chan, sheng, chang); bottom panel, pitch (semitones) vs. time (seconds) for θ = 0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000, with the same syllable labels]

Harmonic Product Spectrum

hps.m

Frequency to Semitone Conversion

Semitone: A music scale based on A440

$$\mathrm{semitone} = 69 + 12 \log_2\!\left(\frac{\mathrm{freq}}{440}\right)$$

Reasonable pitch range: E2 - C6, 82 Hz - 1047 Hz ( - )
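The frequency-to-semitone conversion, semitone = 69 + 12·log2(freq/440), is a one-liner in Python. The function names and the inverse mapping are additions for illustration; the note frequencies for E2 and C6 are the standard equal-temperament values rounded to two decimals.

```python
import math

def freq_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (A440 maps to 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69 + 12 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

for name, freq in [("E2", 82.41), ("A4", 440.0), ("C6", 1046.5)]:
    print(name, round(freq_to_semitone(freq), 2))
```

On this scale one octave (a doubling of frequency) always spans 12 semitones, which is why pitch vectors are usually compared in semitones rather than in Hz.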

Unreliable Pitch Removal

Pitch removal via volume thresholding

[Figure: three panels vs. time (sec): waveform of 小毛驢.wav, volume, and pitch]

Unreliable Pitch Removal

Pitch removal via volume/clarity thresholding

[Figure: four panels vs. time (sec): waveform of 小毛驢.wav, volume, clarity, and pitch]
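Volume/clarity thresholding amounts to zeroing the pitch of frames that fall below either threshold. In this sketch the per-frame volume and clarity values are taken as given inputs (the slides do not define how they are computed), a removed frame is marked with pitch 0, and the function name, thresholds, and sample numbers are all made up for illustration.

```python
def remove_unreliable_pitch(pitch, volume, clarity, volume_th, clarity_th):
    """Set pitch to 0 (a rest) for frames whose volume or clarity is below
    the given thresholds; other frames are kept unchanged."""
    if not (len(pitch) == len(volume) == len(clarity)):
        raise ValueError("pitch, volume and clarity must have the same length")
    return [p if v >= volume_th and c >= clarity_th else 0
            for p, v, c in zip(pitch, volume, clarity)]

pitch = [60, 61, 62, 63]
volume = [100, 5, 100, 100]      # the second frame is too quiet
clarity = [0.9, 0.9, 0.2, 0.8]   # the third frame is not periodic enough
cleaned = remove_unreliable_pitch(pitch, volume, clarity,
                                  volume_th=10, clarity_th=0.5)
print(cleaned)
```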

Rest Handling

[Figure: two pitch vectors, one with rests and one without rests]

Rest Handling

[Figure: three panels of pitch vs. frame index, titled "Original PV", "useRest=1", and "useRest=0"]

Original pitch vectors with rests.

Rests are removed. Good for DTW (dynamic time warping).

Rests are replaced by previous nonzero pitch. Good for LS (linear scaling).
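The two ways of handling rests, removing them or replacing them with the previous nonzero pitch, can be sketched as follows. The function names are illustrative, a rest is assumed to be stored as pitch 0, and leaving leading rests at 0 is a choice made here because no previous pitch exists for them.

```python
def remove_rests(pitch):
    """Drop every rest (zero pitch); the vector gets shorter."""
    return [p for p in pitch if p != 0]

def fill_rests(pitch):
    """Replace each rest with the previous nonzero pitch; leading rests,
    which have no previous pitch, are left as 0."""
    filled, last = [], 0
    for p in pitch:
        if p != 0:
            last = p
        filled.append(last)
    return filled

pv = [0, 60, 60, 0, 0, 62, 0]
print(remove_rests(pv))
print(fill_rests(pv))
```

Removing rests changes the length of the vector, while filling keeps the original timing.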

Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) [audio]

Comparison of Pitch Vectors

Yellow line: Target pitch vector

Demo of Pitch Tracking

Real-time display of ACF for pitch tracking: toolbox/sap/goPtByAcf.mdl

Real-time pitch tracking for real-time mic input: toolbox/sap/goPtByAcf2.mdl

Pitch scaling: pitchShiftDemo/project1.exe, pitchShift-multirate/multirate.m

Intonation assessment: ap170/matlab/goDemo.m