Basic Features of Audio Signals (音訊的基本特徵)

Jyh-Shing Roger Jang (張智星)

MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan

http://www.cs.nthu.edu.tw/~jang

Audio Features

Four commonly used audio features: volume, pitch, zero-crossing rate, and timbre

Our goal: These features are perceived subjectively, but we need to compute them quantitatively for further processing and recognition.

Audio Features in Time Domain

Audio features presented in the time domain

[Figure: waveform annotated with intensity, fundamental period (FP), and timbre (the waveform within an FP)]

Audio Features in Frequency Domain

Volume: Magnitude of the spectrum

Pitch: Distance between harmonics

Timbre: Smoothed spectrum

[Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]

Demo: Real-time Spectrogram

Try “dspstfft_audio” under MATLAB:

[Figure: two panels, Spectrogram and Spectrum]

Steps for Audio Feature Extraction

Frame blocking: Frame duration of 20 ms or so

Feature extraction: Volume, zero-crossing rate, pitch, MFCC, etc.

Endpoint detection: Usually based on volume and zero-crossing rate
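The three steps can be sketched end to end on a synthetic signal. This is a minimal sketch, not the method used in the lecture: it uses volume only (no zero-crossing rate), non-overlapping 20 ms frames, and a hypothetical threshold of 10% of the peak volume. The test signal, the variable names, and the threshold ratio are all assumptions.

```matlab
% Sketch: volume-based endpoint detection (threshold ratio is an assumption).
fs = 16000;                          % sample rate (Hz), assumed
t = (0:fs-1)/fs;                     % one second of time stamps
y = zeros(1, fs);                    % silence ...
idx = (t >= 0.3) & (t < 0.7);        % ... with a tone between 0.3 s and 0.7 s
y(idx) = 0.5*sin(2*pi*220*t(idx));

frameSize = 320;                     % 20 ms at 16 kHz, no overlap
frameCount = floor(length(y)/frameSize);
volume = zeros(1, frameCount);
for k = 1:frameCount
    frame = y((k-1)*frameSize + (1:frameSize));
    volume(k) = sum(abs(frame - median(frame)));   % per-frame volume
end

volTh = 0.1*max(volume);             % hypothetical threshold: 10% of peak volume
active = find(volume > volTh);       % frames classified as sound
startTime = (active(1) - 1)*frameSize/fs;   % start of first active frame (sec)
endTime = active(end)*frameSize/fs;         % end of last active frame (sec)
fprintf('Sound detected from %.2f s to %.2f s\n', startTime, endTime);
```

A fixed ratio of the peak volume is only one possible choice; real recordings usually need a threshold tuned to the noise floor.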

Frame Blocking

Sample rate = 11025 Hz

Frame size = 256 samples

Overlap = 84 samples (hop size = 256 - 84 = 172 samples)

Frame rate = 11025/(256 - 84) ≈ 64 frames/sec

[Figure: waveform over samples 0 to 2500, with a zoomed-in view of samples 0 to 300 showing one frame and its overlap with the next frame]
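Frame blocking with the parameters above might look like the following sketch. It uses only base MATLAB functions; the 440 Hz test tone and the variable names are assumptions, and frames are stored one per column.

```matlab
% Sketch: split a signal into overlapping frames (one frame per column).
fs = 11025;                      % sample rate (Hz)
frameSize = 256;                 % samples per frame
overlap = 84;                    % samples shared by neighboring frames
hopSize = frameSize - overlap;   % 172 samples between frame starts
frameRate = fs/hopSize;          % about 64 frames per second

y = sin(2*pi*440*(0:fs-1)'/fs);  % placeholder signal: 1 second of a 440 Hz tone
frameCount = floor((length(y) - overlap)/hopSize);
frames = zeros(frameSize, frameCount);
for k = 1:frameCount
    startIndex = (k-1)*hopSize + 1;
    frames(:, k) = y(startIndex : startIndex + frameSize - 1);
end
fprintf('%d frames, %.1f frames/sec\n', frameCount, frameRate);
```

Samples left over after the last full frame are dropped here; zero-padding the tail is an equally reasonable choice.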

Intensity (I)

Intensity

Visual cue: Amplitude of vibration

Computation:

Volume: $vol = \sum_{i=1}^{n} |s_i|$

Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$

Characteristics

Influenced by microphone types and microphone setups

Perceived volume is influenced by frequency and timbre
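As a minimal sketch, the two formulas above can be evaluated for a single frame as follows. The 20 ms test frame and its sample rate are assumptions chosen for illustration.

```matlab
% Sketch: volume and log energy of one frame.
fs = 16000;                              % sample rate (Hz), assumed
frame = 0.5*sin(2*pi*440*(0:319)'/fs);   % placeholder 20 ms frame
vol = sum(abs(frame));                   % volume: sum of absolute sample values
energy = 10*log10(sum(frame.^2));        % log energy in decibels
fprintf('volume = %.2f, log energy = %.2f dB\n', vol, energy);
```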

Intensity (II)

To avoid DC drifting

DC drifting: The vibration is not centered around zero

Computation:

Volume: $vol = \sum_{i=1}^{n} |s_i - \operatorname{median}(s)|$

Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \operatorname{mean}(s))^2$

Theoretical background (How to prove?)

For $s = [s_1, s_2, \ldots, s_n]$:

$\operatorname{median}(s) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$

$\operatorname{mean}(s) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
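One possible line of argument for the "How to prove?" question is sketched below. It is an outline under the usual assumptions (real-valued samples, $x$ ranging over the reals), not necessarily the proof intended in the lecture.

```latex
% Mean: minimize the sum of squared deviations
f(x) = \sum_{i=1}^{n} (s_i - x)^2, \qquad
f'(x) = -2 \sum_{i=1}^{n} (s_i - x) = 0
\;\Rightarrow\; x = \frac{1}{n} \sum_{i=1}^{n} s_i = \operatorname{mean}(s)
% f''(x) = 2n > 0, so this stationary point is the minimum.

% Median: minimize the sum of absolute deviations
g(x) = \sum_{i=1}^{n} |s_i - x|, \qquad
g'(x) = \#\{i : s_i < x\} - \#\{i : s_i > x\} \quad (x \neq s_i)
% g is convex and piecewise linear; its slope is negative while fewer than
% half of the samples lie below x and positive once more than half do,
% so g attains its minimum at x = median(s).
```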

Intensity (III)

Examples: Please refer to the online tutorial.
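The effect of DC drifting can be illustrated with a small sketch that compares the raw and the corrected measures on the same frame. The test frame and the offset of 0.3 are assumptions chosen for illustration.

```matlab
% Sketch: effect of DC drift on volume and log energy (offset is an assumption).
fs = 16000;                              % sample rate (Hz), assumed
clean = 0.5*sin(2*pi*440*(0:319)'/fs);   % placeholder frame centered near zero
drifted = clean + 0.3;                   % same frame with a constant DC offset

volRaw = sum(abs(drifted));                       % inflated by the offset
volFixed = sum(abs(drifted - median(drifted)));   % median removed first
energyRaw = 10*log10(sum(drifted.^2));
energyFixed = 10*log10(sum((drifted - mean(drifted)).^2));   % mean removed first
fprintf('volume: %.2f raw, %.2f corrected\n', volRaw, volFixed);
fprintf('energy: %.2f dB raw, %.2f dB corrected\n', energyRaw, energyFixed);
```

Because the median and the mean minimize the respective sums, the corrected values can never exceed the raw ones.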

Pitch

Definition: Pitch is also known as the fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).

More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:

$\text{semitone} = 69 + 12 \log_2 (\text{Hz} / 440)$
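The conversion can be sketched as a pair of anonymous functions. The helper names `hz2semitone` and `semitone2hz` are hypothetical, chosen here for illustration; the inverse mapping follows from solving the formula for the frequency.

```matlab
% Sketch: convert between frequency in Hz and pitch in semitones.
hz2semitone = @(freq) 69 + 12*log2(freq/440);            % hypothetical helper name
semitone2hz = @(semitone) 440*2.^((semitone - 69)/12);   % inverse mapping

freq = [220 440 880];          % three A notes, one octave apart
pitch = hz2semitone(freq);     % each octave adds 12 semitones
fprintf('%g Hz -> %g semitones\n', [freq; pitch]);
```

With this scale, 440 Hz maps to semitone 69 and doubling the frequency raises the pitch by 12 semitones.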

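Pitch Computation (I)

Pitch of tuning forks

$ff = fs / FP = 16000 / 36.4 = 439.56$ Hz, with the fundamental period FP (in samples) averaged over 5 periods

$pitch = 69 + 12 \log_2 (ff / 440) = 68.98$ semitone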

Pitch Computation (II)

Pitch of speech

$ff = \dfrac{16000}{(477 - 75)/3} = 119.403$ Hz

$pitch = 69 + 12 \log_2 (ff / 440) = 46.42$ semitone
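The speech example can be reproduced with a few lines. This sketch reads the figures in the example as three fundamental periods spanning sample indices 75 to 477 at a 16000 Hz sample rate; that reading and the variable names are assumptions.

```matlab
% Sketch: pitch from fundamental periods measured on the waveform.
fs = 16000;            % sample rate (Hz)
startIndex = 75;       % sample index where the first counted period begins
endIndex = 477;        % sample index where the last counted period ends
periodCount = 3;       % number of fundamental periods in between

fp = (endIndex - startIndex)/periodCount;   % fundamental period in samples
ff = fs/fp;                                 % fundamental frequency in Hz
pitch = 69 + 12*log2(ff/440);               % pitch in semitones
fprintf('ff = %.3f Hz, pitch = %.2f semitones\n', ff, pitch);
```

Averaging over several periods reduces the error caused by locating period boundaries only to the nearest sample.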

Statistics of Mandarin Chinese

5401 characters, each associated with at least one base syllable and a tone

411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables

Tone is characterized by the pitch curves:

Tone 1: high-high

Tone 2: low-high

Tone 3: high-low-high

Tone 4: high-low

Some examples of tones:

1242: 清華大學 (Tsing Hua University)

1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大, 依宜以易, 夫福府負

?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (the braised egg is really salty; Taiwanese)

Sinusoidal Signals

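How to generate a stream of sinusoidal signals:

fs=16000;               % sample rate (Hz)
duration=3;             % duration (sec)
f=440;                  % frequency (Hz)
t=(1:fs*duration)/fs;   % time stamp of each sample (sec)
y=0.8*sin(2*pi*f*t);    % sinusoid with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1, 1]);   % show the 50 ms between 0.6 s and 0.65 s
sound(y, fs);           % play the signal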

Zero Crossing Rate

Zero crossing rate (ZCR): The number of zero crossings in a frame.

Characteristics:

Noise and unvoiced sounds have high ZCR.

ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.

To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.

ZCR Computations

Two types of ZCR definition

If a sample with zero value is counted as a zero crossing, then the value of ZCR is higher. Otherwise it is lower.

The choice of definition affects the ZCR, especially when the sample rate is low.

Other considerations

Zero-justification (removing the DC offset so that the waveform is centered at zero) is required.

ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
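The two definitions and the shifted variant can be compared in a short sketch. The sample values, the shift amount of 0.05, and the variable names are assumptions chosen for illustration; the first frame already has zero mean, so zero-justification leaves it unchanged.

```matlab
% Sketch: two zero-crossing counts for one frame, plus a shifted variant.
frame = [2 1 0 -1 -3 0 0 3 -2]';        % made-up frame containing zero-valued samples
frame = frame - mean(frame);            % zero-justification (mean is already 0 here)

prod1 = frame(1:end-1).*frame(2:end);   % products of neighboring samples
zcr1 = sum(prod1 < 0);                  % zero-valued samples are not counted
zcr2 = sum(prod1 <= 0);                 % zero-valued samples are counted

countZcr = @(x) sum(x(1:end-1).*x(2:end) < 0);   % first definition as a helper
silence = 0.01*[1 -1 1 -1 1 -1 1 -1]';  % low-level noise around zero
shift = 0.05;                           % hypothetical bias, larger than the noise
zcrSilence = countZcr(silence);         % high: the noise keeps crossing zero
zcrShifted = countZcr(silence - shift); % low: the shifted noise stays below zero
fprintf('zcr1 = %d, zcr2 = %d\n', zcr1, zcr2);
fprintf('silence: %d crossings, %d after shift\n', zcrSilence, zcrShifted);
```

A shift slightly larger than the background noise level suppresses crossings in silence while louder unvoiced sounds still cross the shifted level.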

ZCR

Examples: Please refer to the online tutorial.
