Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan
Audio Features
Four commonly used audio features:
  Volume
  Pitch
  Zero crossing rate
  Timbre
Our goal:
  These features can be perceived subjectively, but we need to compute them quantitatively for further processing and recognition.
Audio Features in Time Domain
Audio features as they appear in the time domain:
  Intensity: amplitude of the waveform
  Pitch: reciprocal of the fundamental period (FP)
  Timbre: waveform within a fundamental period
Audio Features in Frequency Domain
  Volume: magnitude of the spectrum
  Pitch: distance between harmonics
  Timbre: smoothed spectrum
[Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]
Demo: Real-time Spectrogram
Try “dspstfft_audio” under MATLAB:
[Figure panels: Spectrogram, Spectrum]
Steps for Audio Feature Extraction
  1. Frame blocking: frame duration of 20 ms or so
  2. Feature extraction: volume, zero-crossing rate, pitch, MFCC, etc.
  3. Endpoint detection: usually based on volume and zero-crossing rate
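The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the default frame and hop sizes, and the rule "a frame is active when its volume exceeds 10% of the maximum volume" are hypothetical choices, not the method prescribed by the slides. Only volume is used here; a fuller detector would also look at the zero-crossing rate.

```python
def frame_volumes(signal, frame_size, hop_size):
    """Steps 1 and 2: block the signal into frames and compute each frame's volume."""
    volumes = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size]
        volumes.append(sum(abs(s) for s in frame))  # volume = sum of |s_i|
    return volumes


def detect_endpoints(signal, frame_size=256, hop_size=172, ratio=0.1):
    """Step 3: return (first, last) indices of frames louder than ratio * max volume.

    The threshold rule is an assumption made for this sketch.
    Returns None when no frame is above the threshold.
    """
    volumes = frame_volumes(signal, frame_size, hop_size)
    if not volumes:
        return None
    threshold = ratio * max(volumes)
    active = [i for i, v in enumerate(volumes) if v > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Toy signal: silence, then a loud alternating burst, then silence again.
signal = [0.0] * 1000 + [0.5, -0.5] * 500 + [0.0] * 1000
endpoints = detect_endpoints(signal)
```

The returned frame indices can be converted back to sample positions by multiplying by the hop size.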
Frame Blocking
  Sample rate = 11025 Hz
  Frame size = 256 samples
  Overlap = 84 samples
  Hop size = 256 - 84 = 172 samples
  Frame rate = 11025/(256 - 84) ≈ 64 frames/sec
[Figure: waveform plot (samples 0 to 2500) with a zoomed-in view (samples 0 to 300) marking one frame and the overlap between adjacent frames]
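A minimal Python sketch of frame blocking with the numbers from this slide (sample rate 11025 Hz, frame size 256, overlap 84). The function name `frame_blocking` and the decision to drop a trailing partial frame are assumptions of this sketch.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    hop_size = frame_size - overlap  # 256 - 84 = 172 samples
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop_size)]


sample_rate = 11025
signal = list(range(sample_rate))        # one second of placeholder samples
frames = frame_blocking(signal)
frame_rate = sample_rate / (256 - 84)    # about 64 frames per second
```

With the sample index used as the sample value, it is easy to confirm that the last 84 samples of one frame are the first 84 samples of the next.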
Intensity (I)
Intensity
  Visual cue: amplitude of vibration
  Computation:
    Volume: $vol = \sum_{i=1}^{n} |s_i|$
    Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
Characteristics
  Influenced by microphone types and microphone setups
  Perceived volume is influenced by frequency and timbre
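The two intensity measures on this slide, volume as the sum of absolute sample values and log energy as 10*log10 of the sum of squares, can be written directly in Python. The function names are hypothetical, and returning negative infinity for an all-zero frame is a choice made for this sketch.

```python
import math


def volume(frame):
    """Volume: sum of absolute sample values in the frame."""
    return sum(abs(s) for s in frame)


def log_energy(frame):
    """Log energy in decibels: 10 * log10(sum of squared samples)."""
    energy = sum(s * s for s in frame)
    # log10(0) is undefined, so an all-zero frame maps to -inf in this sketch.
    return 10 * math.log10(energy) if energy > 0 else float("-inf")
```

For example, `volume([0.5, -0.5, 0.25])` adds 0.5, 0.5, and 0.25.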
Intensity (II)
To avoid DC drifting
  DC drifting: the vibration is not centered around zero
  Computation:
    Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
    Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$
Theoretical background (How to prove?)
  For $s = [s_1, s_2, \ldots, s_n]$:
    $\arg\min_x \sum_{i=1}^{n} |s_i - x| = \mathrm{median}(s)$
    $\arg\min_x \sum_{i=1}^{n} (s_i - x)^2 = \mathrm{mean}(s)$
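The sketch below computes the zero-justified volume and log energy, then checks the two minimization claims numerically on a toy frame. It is a sanity check, not a proof. The function names, the toy frame, and the brute-force grid search are all assumptions of this illustration.

```python
import math
from statistics import mean, median


def volume_zj(frame):
    """Volume after subtracting the median (removes DC drift)."""
    m = median(frame)
    return sum(abs(s - m) for s in frame)


def log_energy_zj(frame):
    """Log energy in decibels after subtracting the mean."""
    mu = mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


def argmin_on_grid(cost, lo, hi, steps=2000):
    """Brute-force search for the x in [lo, hi] that minimizes cost(x)."""
    candidates = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(candidates, key=cost)


frame = [0.30, 0.10, 0.50, 0.20, 0.90]  # toy frame with a DC offset

# The minimizer of the sum of absolute deviations should sit at the median,
# and the minimizer of the sum of squared deviations at the mean.
best_l1 = argmin_on_grid(lambda x: sum(abs(s - x) for s in frame), 0.0, 1.0)
best_l2 = argmin_on_grid(lambda x: sum((s - x) ** 2 for s in frame), 0.0, 1.0)
```

Adding a constant offset to every sample leaves both zero-justified measures unchanged, which is the point of the subtraction.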
Intensity (III)
Examples: Please refer to the online tutorial.
Pitch
Definition
  Pitch is also known as the fundamental frequency, which equals the number of fundamental periods within a second. The unit used here is Hertz (Hz).
  More commonly, pitch is expressed in semitones, which can be converted from the pitch in Hertz:
    $semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$
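The conversion from Hertz to semitones, 69 + 12*log2(Hz/440), is a one-liner in Python. The inverse function is added here only for illustration, and both function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (440 Hz maps to 69)."""
    return 69 + 12 * math.log2(freq_hz / 440)


def semitone_to_hz(semitone):
    """Inverse conversion: semitone number back to frequency in Hz."""
    return 440 * 2 ** ((semitone - 69) / 12)
```

Doubling the frequency raises the pitch by 12 semitones, so 880 Hz maps to 81.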
Pitch Computation (I)
Pitch of tuning forks
  Fundamental frequency: $ff = 439.56$ Hz (sample rate 16000 Hz, period averaged over 5 fundamental periods)
  Pitch: $69 + 12 \log_2 (ff/440) = 68.98$ semitones
Pitch Computation (II)
Pitch of speech
  Fundamental frequency: $ff = 16000/((477 - 75)/3) = 119.40$ Hz
  Pitch: $69 + 12 \log_2 (ff/440) = 46.42$ semitones
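The manual pitch computation can be reproduced in Python. This sketch reads the speech example as 3 fundamental periods spanning samples 75 to 477 at a sample rate of 16000 Hz, which is an interpretation of the numbers on the slide. The function name and its parameters are hypothetical.

```python
import math


def fundamental_frequency(sample_rate, start_index, end_index, num_periods):
    """Estimate the fundamental frequency from two waveform marks.

    The marks are assumed to enclose exactly num_periods fundamental periods.
    """
    period_in_samples = (end_index - start_index) / num_periods
    return sample_rate / period_in_samples


# Speech example: 3 periods between samples 75 and 477 (assumed reading).
ff = fundamental_frequency(16000, 75, 477, 3)
pitch = 69 + 12 * math.log2(ff / 440)  # pitch in semitones
```

Measuring across several periods instead of one reduces the effect of a one-sample error in placing the marks.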
Statistics of Mandarin Chinese
  5401 characters, each associated with at least one base syllable and a tone
  411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
Tone is characterized by the pitch curves:
  Tone 1: high-high
  Tone 2: low-high
  Tone 3: high-low-high
  Tone 4: high-low
Some examples of tones:
  1242: 清華大學 (Tsing Hua University)
  1234: 三民主義, 優柔寡斷, 搭達打大, 依宜以易, 夫福府負
  ?????: 美麗大教堂, 滷蛋有夠鹹 (Taiwanese)
Sinusoidal Signals
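How to generate a stream of sinusoidal signals:
fs=16000;                              % sample rate in Hz
duration=3;                            % duration in seconds
f=440;                                 % frequency of the sinusoid in Hz
t=(1:fs*duration)/fs;                  % time vector in seconds
y=0.8*sin(2*pi*f*t);                   % sinusoid with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1, 1]);   % zoom in on 0.6 to 0.65 seconds
sound(y, fs);                          % play the signal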
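For readers without MATLAB, the same samples can be generated in plain Python. This is a sketch of the signal generation only. Plotting and playback are left out because they need packages outside the standard library.

```python
import math

fs = 16000        # sample rate in Hz
duration = 3      # duration in seconds
f = 440           # frequency of the sinusoid in Hz

# Time stamps start at 1/fs, matching t=(1:fs*duration)/fs in the MATLAB code.
t = [n / fs for n in range(1, fs * duration + 1)]
y = [0.8 * math.sin(2 * math.pi * f * tn) for tn in t]
```

At this sample rate, 400 samples cover exactly 11 periods of the 440 Hz tone, so the waveform repeats every 400 samples.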
Zero Crossing Rate
Zero crossing rate (ZCR)
  The number of zero crossings in a frame.
Characteristics:
  Noise and unvoiced sounds have high ZCR.
  ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
  To distinguish noise/silence from unvoiced sound, we usually add a bias before computing ZCR.
ZCR Computations
Two types of ZCR definition
  If a sample with zero value is counted as a zero crossing, then the ZCR is higher; otherwise it is lower.
  The choice affects the ZCR, especially when the sample rate is low.
Other considerations
  Zero-justification is required.
  ZCR with a shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
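Both ZCR definitions and the optional shift can be sketched in one Python function. The function name, the flag `count_zero_samples`, and the use of mean subtraction for zero-justification are assumptions of this sketch. The shift amount is left to the caller, since the slide poses its choice as an open question.

```python
def zero_crossing_rate(frame, count_zero_samples=True, shift=0.0):
    """Count zero crossings in a frame.

    count_zero_samples=True  counts a pair when the product is <= 0 (zero samples count).
    count_zero_samples=False counts a pair only when the product is < 0.
    shift is a bias added after zero-justification.
    """
    if len(frame) < 2:
        return 0
    mean_value = sum(frame) / len(frame)          # zero-justification (assumed: mean)
    samples = [s - mean_value + shift for s in frame]
    crossings = 0
    for prev, curr in zip(samples, samples[1:]):
        product = prev * curr
        if product < 0 or (count_zero_samples and product == 0):
            crossings += 1
    return crossings


quiet_noise = [0.01, -0.01, 0.01, -0.01]
zcr_plain = zero_crossing_rate(quiet_noise)                 # noise crosses zero often
zcr_shifted = zero_crossing_rate(quiet_noise, shift=0.05)   # the bias suppresses it
```

Low-amplitude noise that hovers around zero produces many crossings without a shift and none once the bias exceeds its amplitude, while a louder unvoiced sound would still cross the shifted level.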
ZCR
Examples: Please refer to the online tutorial.