Pitch Tracking
Jyh-Shing Roger Jang (張智星)
MIR Lab (Multimedia Information Retrieval Lab)
CS, NTHU (Department of Computer Science, National Tsing Hua University)
[email protected], http://mirlab.org/jang
Pitch
Definition of pitch
  Fundamental frequency (FF, in Hz): the reciprocal of the fundamental period of a quasi-periodic waveform
  Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (detailed later)
Characteristics of pitch
  Noise and unvoiced sounds do not have pitch.
Pitch Tracking
Pitch tracking: computing the pitch vector of a given waveform, that is, finding the pitch over the whole recording
Applications
  Query by singing/humming
  Tone recognition for Mandarin
  Intonation scoring for English
  Prosody analysis for speech synthesis
  Pitch scaling and duration modification
Pitch Tracking Algorithms
Two categories of pitch tracking algorithms
Time domain
  ACF (Autocorrelation function)
  AMDF (Average magnitude difference function)
  SIFT (Simplified inverse filter tracking)
Frequency domain
  Harmonic product spectrum method
  Cepstrum method
Typical Steps for Pitch Tracking
Chop the signal into frames (aka frame blocking)
Compute pitch functions (ACF, AMDF, etc.)
Determine the pitch of each frame
  Max/min picking of the pitch function
Remove unreliable pitch
  Via volume/clarity thresholding
Smooth the whole pitch vector
  Via median filter, etc.
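The last step, smoothing by a median filter, can be sketched as follows. This is only an illustration: the function name `median_smooth`, the default window width of 3, and the choice to shrink the window at the two ends of the vector are assumptions, not taken from the slides.

```python
from statistics import median

def median_smooth(pitch, width=3):
    """Median-filter a pitch vector (hypothetical helper).

    width is an odd window length; near the ends of the vector the
    window is simply truncated (an assumed edge policy).
    """
    half = width // 2
    smoothed = []
    for i in range(len(pitch)):
        lo = max(0, i - half)
        hi = min(len(pitch), i + half + 1)
        smoothed.append(median(pitch[lo:hi]))
    return smoothed

# A single-frame octave jump (72) inside a steady note (60) is removed.
print(median_smooth([60, 60, 72, 60, 60]))
```

A median filter is a natural fit here because isolated outliers, such as a one-frame octave error, are discarded rather than averaged into their neighbors.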
Frame Blocking
Frame size = 256 points
Overlap = 84 points
Frame rate = fs/(frameSize-overlap) = 11025/(256-84) ≈ 64 pitch/sec
[Figure: waveform plot with a zoomed-in view marking one frame and the overlap between consecutive frames]
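The frame-rate arithmetic and the blocking itself can be checked with a short sketch. The names `frame_rate` and `block_frames` are hypothetical, and dropping the trailing samples that do not fill a whole frame is an assumed policy.

```python
def frame_rate(fs, frame_size, overlap):
    """Frames (pitch values) per second for a given frame size and overlap."""
    return fs / (frame_size - overlap)

def block_frames(signal, frame_size, overlap):
    """Split a signal into overlapping frames.

    Consecutive frames start (frame_size - overlap) samples apart;
    samples left over at the end are discarded (an assumption).
    """
    hop = frame_size - overlap
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]

print(frame_rate(11025, 256, 84))          # about 64.1 frames per second
print(block_frames(list(range(10)), 4, 2))  # 4 frames, each sharing 2 samples with the next
```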
ACF: Auto-correlation Function
Frame: s(i)
Shifted frame: s(i+τ), shown for τ=30
acf(30) = inner product of the overlapping part
Pitch period: the shift τ at which the ACF peaks
$$acf(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$
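A direct, unoptimized translation of the ACF definition is sketched below. The function name `acf` and the use of 0-based lags (so that `acf(frame)[tau]` is the value at shift τ) are choices made for this illustration.

```python
def acf(frame):
    """Autocorrelation of one frame.

    Entry tau is the inner product of the frame with itself shifted by
    tau samples, taken over the overlapping part only.
    """
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

print(acf([1, 2, 3]))  # [14, 8, 3]
```

This costs O(n²) per frame; it is meant to mirror the formula, not to be fast.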
ACF Example 1
sunday.wav
  Sample rate = 16 kHz
  Frame size = 512 (starting from point 9000)
Fundamental frequency
  Max of ACF occurs at index 131
  FF = 16000/(131-1) = 123.077 Hz
ACF Example 2
If the range of human FF is [40, 1000] Hz, the pitch point must be selected within the corresponding range of the ACF:
  Min FF = 40 Hz: acf(fs/40:end) is not considered.
  Max FF = 1000 Hz: acf(1:fs/1000) is not considered.
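The range restriction can be sketched as a peak search over allowed lags only. This is a minimal illustration: `pitch_from_acf` is a hypothetical name, lags are 0-based (so FF = fs/lag, whereas the slides use 1-based indices and fs/(index-1)), and the exact rounding of the lag bounds is an assumption.

```python
import math

def acf(frame):
    """Autocorrelation over the overlapping part, 0-based lags."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

def pitch_from_acf(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Pick the ACF peak among lags that map into [min_ff, max_ff] Hz."""
    values = acf(frame)
    lo = math.ceil(fs / max_ff)                 # shorter lags mean FF above max_ff
    hi = min(int(fs / min_ff), len(frame) - 1)  # longer lags mean FF below min_ff
    if lo > hi:
        return 0.0                              # frame too short to search
    best_lag = max(range(lo, hi + 1), key=lambda tau: values[tau])
    return fs / best_lag

# Synthetic 200 Hz sine sampled at 8 kHz (period of 40 samples).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(256)]
print(pitch_from_acf(frame, fs))
```

Excluding the short lags also removes the trivial maximum at lag 0, which would otherwise always win.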
Pitch Tracking via ACF
Specs
  Sample rate = 11025 Hz
  Frame size = 353 points = 32 ms
  Overlap = 0
  Frame rate = 31.25 f/s
Playback
  soo.wav
  sooPitch.wav
Variations of ACF to Avoid Tapering
Normalized version:
$$acf(\tau) = \frac{\sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)}{n-\tau}$$
Half-frame shifting:
$$acf(\tau) = \sum_{i=0}^{n/2} s(i)\, s(i+\tau)$$
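Tapering arises because the plain ACF sums over fewer terms as the shift grows, so its values shrink toward zero at large lags. The two variations can be sketched as below; the function names are hypothetical, and in the half-frame version the sum is taken over the first n/2 samples with shifts up to n/2 so that every index stays inside the frame (an assumed reading of the summation limits).

```python
def acf_normalized(frame):
    """ACF divided by the number of overlapping samples (n - tau)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau)) / (n - tau)
            for tau in range(n)]

def acf_half_frame(frame):
    """ACF where only the first half of the frame is shifted.

    Every lag uses the same number of terms, so there is no tapering,
    at the price of limiting the shift to half a frame.
    """
    n = len(frame)
    half = n // 2
    return [sum(frame[i] * frame[i + tau] for i in range(half))
            for tau in range(half + 1)]

constant = [1] * 8
print(acf_normalized(constant))  # flat: every lag gives 1.0
print(acf_half_frame(constant))  # flat: every lag gives 4
```

On a constant frame the plain ACF would fall linearly from 8 to 1, while both variations stay flat, which is exactly the effect they are designed to have.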
Variations of ACF to Normalize Range
To normalize ACF to the range [-1, 1]:
$$nsdf(\tau) = \frac{2 \sum s(i)\, s(i+\tau)}{\sum s^2(i) + \sum s^2(i+\tau)}$$
This is based on the inequality:
$$-(x^2 + y^2) \le 2xy \le x^2 + y^2$$
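A sketch of this normalization follows. The name `nsdf` comes from the formula; taking all three sums over the overlapping samples (i = 0, ..., n-1-τ) and returning 0 when the denominator vanishes are assumptions made for this illustration.

```python
def nsdf(frame):
    """Range-normalized ACF; every value lies in [-1, 1]."""
    n = len(frame)
    out = []
    for tau in range(n):
        overlap = range(n - tau)
        num = 2.0 * sum(frame[i] * frame[i + tau] for i in overlap)
        den = sum(frame[i] ** 2 + frame[i + tau] ** 2 for i in overlap)
        out.append(num / den if den > 0 else 0.0)  # silent frame: define as 0
    return out

print(nsdf([1, -1, 1, -1]))  # [1.0, -1.0, 1.0, -1.0]
```

Applying the inequality term by term with x = s(i) and y = s(i+τ) bounds the numerator by the denominator in absolute value, which is why the ratio cannot leave [-1, 1].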
AMDF: Average Magnitude Difference Function
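Frame: s(i)
Shifted frame: s(i+τ), shown for τ=30
amdf(30) = sum of absolute differences over the overlapping part
Pitch period: the shift τ at which the AMDF reaches its minimum
$$amdf(\tau) = \sum_{i=0}^{n-1-\tau} |s(i) - s(i+\tau)|$$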
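The AMDF definition translates just as directly. As with the ACF sketch, the name `amdf` and the 0-based lags are choices made for this illustration.

```python
def amdf(frame):
    """Sum of absolute differences between the frame and its shifted copy,
    taken over the overlapping part, for every 0-based lag."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(n)]

print(amdf([1, 2, 4]))  # [0, 3, 3]

# For a signal with a period of 4 samples the AMDF drops to 0 at lag 4.
periodic = [0, 1, 0, -1] * 4
print(amdf(periodic)[4])  # 0
```

Where the ACF is searched for a maximum, the AMDF is searched for a minimum: a shift by one full period makes the frame line up with itself, so the differences vanish.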
AMDF Example
sunday.wav
  Sample rate = 16 kHz
  Frame size = 512 (starting from point 9000)
Fundamental frequency
  Min of AMDF occurs at index 131
  FF = 16000/(131-1) = 123.077 Hz
Variations of AMDF to Avoid Tapering
Normalized version:
$$amdf(\tau) = \frac{\sum_{i=0}^{n-1-\tau} |s(i) - s(i+\tau)|}{n-\tau}$$
Half-frame shifting:
$$amdf(\tau) = \sum_{i=0}^{n/2} |s(i) - s(i+\tau)|$$
Combining ACF and AMDF
[Figure panels: Frame; ACF; AMDF; ACF/AMDF]
Example of Pitch Tracking
[Figure: two panels over time. Top: waveform of soo.wav (amplitude). Bottom: pitch (semitone), titled "PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22", with legend "pitch1: computed pitch".]
UPDUDP (1/4)
UPDUDP: Unbroken Pitch Determination Using DP (dynamic programming)
Goal: To take pitch smoothness into consideration
$$cost(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} amdf_i(p_i) + \theta \sum_{i=1}^{n-1} |p_i - p_{i+1}|^m$$
  $\mathbf{p} = [p_1, \ldots, p_n]$: a given path in the AMDF matrix
  $n$: Number of frames
  $\theta$: Transition penalty
  $m$: Exponent of the transition difference
UPDUDP (2/4)
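Optimum-value function D(i, j): the minimum cost of a path from frame 1 to position (i, j)
Recurrent formula:
$$D(i, j) = amdf_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta\, |k - j|^2 \right\}, \quad i \in [2, n],\ j \in [8, 160]$$
Initial conditions:
$$D(1, j) = amdf_1(j), \quad j \in [8, 160]$$
Optimum cost:
$$\min_{j \in [8, 160]} D(n, j)$$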
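The recurrence can be sketched as a standard dynamic-programming pass with backtracking. This is an illustration under stated assumptions, not the toolbox implementation: the function name `updudp`, the list-of-lists input, and the 0-based candidate indices (which would correspond to lags 8 to 160 in the slides) are all hypothetical.

```python
def updudp(amdf_matrix, theta, m=2):
    """Find the path through the AMDF matrix with minimum total cost.

    amdf_matrix[i][j] is the AMDF of frame i at candidate j (0-based).
    Returns (path, cost), where path[i] is the candidate chosen for frame i.
    """
    n = len(amdf_matrix)
    num_cand = len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]      # initial condition: D(1, j) = amdf_1(j)
    back = [[0] * num_cand]         # back-pointers (unused for the first frame)
    for i in range(1, n):
        row, ptr = [], []
        for j in range(num_cand):
            # Best predecessor k, including the transition penalty.
            best_k = min(range(num_cand),
                         key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][best_k]
                       + theta * abs(best_k - j) ** m)
            ptr.append(best_k)
        D.append(row)
        back.append(ptr)
    # Optimum cost: minimum over the last frame, then trace the path back.
    j = min(range(num_cand), key=lambda c: D[n - 1][c])
    cost = D[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, cost

toy = [[5, 0, 5], [5, 0, 5], [0, 5, 5]]
print(updudp(toy, theta=0))    # ([1, 1, 0], 0): frame-wise minima, path jumps
print(updudp(toy, theta=10))   # ([1, 1, 1], 5): the penalty keeps the path smooth
```

With θ = 0 the result reduces to picking the AMDF minimum of each frame independently; a positive θ trades a slightly worse local fit for a smoother pitch contour.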
UPDUDP (3/4)
A typical example
UPDUDP (4/4)
Insensitivity in θ
[Figure: two panels over time (seconds). Top: waveform with syllable labels xi, lu, chan, sheng, chang. Bottom: pitch (semitones) with the same labels, plotted for θ = 0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000.]
Harmonic Product Spectrum
hps.m
Frequency to Semitone Conversion
Semitone: A music scale based on A440
$$semitone = 69 + 12 \log_2\left(\frac{freq}{440}\right)$$
Reasonable pitch range: E2 - C6, 82 Hz - 1047 Hz ( - )
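The conversion and its inverse can be sketched in a few lines. The function names are hypothetical; the formula is the one above, with A440 mapped to semitone 69.

```python
import math

def freq_to_semitone(freq):
    """Convert a frequency in Hz to a semitone number (A440 maps to 69)."""
    return 69 + 12 * math.log2(freq / 440.0)

def semitone_to_freq(semitone):
    """Inverse conversion: semitone number back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

print(freq_to_semitone(440))           # 69.0
print(round(freq_to_semitone(82)))     # 40
print(round(freq_to_semitone(1047)))   # 84
```

One octave (a doubling of frequency) always spans 12 semitones, so equal musical intervals become equal numeric differences on this scale.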
Unreliable Pitch Removal
Pitch removal via volume thresholding
[Figure: three panels over time (sec) for 小毛驢.wav: waveform, volume, and pitch]
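Volume thresholding can be sketched as below. Several details are assumptions for the sake of illustration: volume is taken as the sum of absolute values of the zero-mean frame, an unreliable pitch is marked by setting it to 0, and the threshold is supplied by the caller rather than derived from the recording.

```python
def frame_volume(frame):
    """Volume of one frame: sum of absolute values after removing the mean
    (an assumed definition; other volume measures would work the same way)."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)

def remove_low_volume_pitch(pitch, volume, threshold):
    """Set the pitch to 0 in every frame whose volume is below the threshold."""
    return [p if v >= threshold else 0 for p, v in zip(pitch, volume)]

pitch = [60, 62, 64]
volume = [5000, 100, 8000]
print(remove_low_volume_pitch(pitch, volume, threshold=1000))  # [60, 0, 64]
```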
Unreliable Pitch Removal
Pitch removal via volume/clarity thresholding
[Figure: four panels over time (sec) for 小毛驢.wav: waveform, volume, clarity, and pitch]
Rest Handling
With rests Without rests
Rest Handling
[Figure: three pitch-vector plots against frame index, titled "Original PV", "useRest=1", and "useRest=0"]
Original pitch vectors with rests.
Rests are removed. Good for DTW (dynamic time warping).
Rests are replaced by the previous nonzero pitch. Good for LS (linear scaling).
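The two ways of handling rests can be sketched as follows, assuming a rest is stored as a pitch value of 0. The function names are hypothetical, and leaving leading rests at 0 (since they have no previous pitch) is an assumed policy.

```python
def remove_rests(pitch):
    """Drop every rest (zero pitch); the vector gets shorter."""
    return [p for p in pitch if p != 0]

def fill_rests(pitch):
    """Replace each rest by the previous nonzero pitch; length is preserved.
    Rests before the first note stay 0 because there is nothing to copy."""
    filled, previous = [], 0
    for p in pitch:
        if p != 0:
            previous = p
        filled.append(previous)
    return filled

pv = [0, 60, 0, 0, 62, 0]
print(remove_rests(pv))  # [60, 62]
print(fill_rests(pv))    # [0, 60, 60, 60, 62, 62]
```

Filling keeps the timing of the notes intact, while removal discards timing information along with the rests, which matters less for a method that warps the time axis anyway.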
Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) [audio]
Comparison of Pitch Vectors
Yellow line: Target pitch vector
Demo of Pitch Tracking
Real-time display of ACF for pitch tracking
  toolbox/sap/goPtByAcf.mdl
Real-time pitch tracking for real-time mic input
  toolbox/sap/goPtByAcf2.mdl
Pitch scaling
  pitchShiftDemo/project1.exe
  pitchShift-multirate/multirate.m
Intonation assessment
  ap170/matlab/goDemo.m