Basic Features of Audio Signals (音訊的基本特徵)


Upload: hollis

Post on 23-Jan-2016


Page 1: Basic Features of Audio Signals ( 音訊的基本特徵 )

Basic Features of Audio Signals (音訊的基本特徵)

Jyh-Shing Roger Jang (張智星), http://www.cs.nthu.edu.tw/~jang

MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan

Page 2: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features

Four commonly used audio features:
- Volume
- Pitch
- Zero crossing rate
- Timbre

Our goal: these features can be perceived subjectively, but we need to compute them quantitatively for further processing and recognition.

Page 3: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features in Time Domain

Audio features as they appear in the time-domain waveform:
- Intensity
- Fundamental period
- Timbre: waveform within a fundamental period (FP)

Page 4: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features in Frequency Domain

- Volume: magnitude of the spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum

(Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2.)

Page 5: Basic Features of Audio Signals ( 音訊的基本特徵 )

Demo: Real-time Spectrogram

Try “dspstfft_audio” under MATLAB:

(Figure panels: Spectrogram, Spectrum.)

Page 6: Basic Features of Audio Signals ( 音訊的基本特徵 )

Steps for Audio Feature Extraction

1. Frame blocking: frame duration of 20 ms or so
2. Feature extraction: volume, zero-crossing rate, pitch, MFCC, etc.
3. Endpoint detection: usually based on volume and zero-crossing rate

Page 7: Basic Features of Audio Signals ( 音訊的基本特徵 )

Frame Blocking

Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples (hop size = 256 - 84 = 172 samples)
Frame rate = 11025/(256 - 84) ≈ 64 frames/sec

(Figure: waveform divided into overlapping frames, with a zoomed-in view marking one frame and its overlap.)
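The frame-blocking parameters on this slide can be tried out with a minimal Python sketch. The function name `frame_blocking` and the choice to drop trailing samples that do not fill a whole frame are assumptions made for illustration; they do not come from the slides.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Assumption of this sketch: only complete frames are returned, so
    trailing samples that do not fill a whole frame are dropped.
    """
    hop = frame_size - overlap  # 256 - 84 = 172 samples between frame starts
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    last_start = len(signal) - frame_size
    return [signal[start:start + frame_size]
            for start in range(0, last_start + 1, hop)]


fs = 11025                     # sample rate in Hz, as on the slide
signal = list(range(fs))       # one second of placeholder samples
frames = frame_blocking(signal)
frame_rate = fs / (256 - 84)   # roughly 64 frames per second
print(len(frames), round(frame_rate, 1))
```

Each frame shares its last 84 samples with the first 84 samples of the next frame, which is what the overlap setting means.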

Page 8: Basic Features of Audio Signals ( 音訊的基本特徵 )

Intensity (I)

Intensity
- Visual cue: amplitude of vibration
- Computation:
  - Volume: $\text{vol} = \sum_{i=1}^{n} |s_i|$
  - Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} s_i^2$

Characteristics
- Influenced by microphone types and microphone setups
- Perceived volume is influenced by frequency and timbre
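The two intensity measures on this slide, volume as the sum of absolute sample values and log energy as ten times the base-10 logarithm of the sum of squares, can be sketched in Python as follows. The function names and the `floor` parameter, which guards against taking the logarithm of zero for a silent frame, are assumptions of this sketch.

```python
import math


def volume(frame):
    """Volume as the sum of absolute sample values: sum(|s_i|)."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, floor=1e-12):
    """Log energy in decibels: 10 * log10(sum(s_i^2)).

    `floor` is an assumption of this sketch; it keeps log10 defined
    when the frame is all zeros.
    """
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5]
print(volume(frame))         # 2.0
print(log_energy_db(frame))  # 0.0
```

Doubling every sample doubles the volume but adds only about 6 dB to the log energy, since the logarithm compresses large amplitude changes.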

Page 9: Basic Features of Audio Signals ( 音訊的基本特徵 )

Intensity (II)

To avoid DC drifting
- DC drifting: the vibration is not centered around zero
- Computation:
  - Volume: $\text{vol} = \sum_{i=1}^{n} |s_i - \text{median}(s)|$
  - Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} (s_i - \text{mean}(s))^2$

Theoretical background (How to prove?)

For $s = [s_1, s_2, \ldots, s_n]$:

$$\arg\min_x \sum_{i=1}^{n} |s_i - x| = \text{median}(s)$$

$$\arg\min_x \sum_{i=1}^{n} (s_i - x)^2 = \text{mean}(s)$$
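The DC-compensated measures can be sketched in Python as below. The frame values, the function names, and the `floor` guard are hypothetical choices for illustration. The sketch also evaluates the two cost functions from the theoretical background over a grid of candidate values; this is a numeric check, not a proof.

```python
import math
import statistics


def volume_dc_removed(frame):
    """Volume after subtracting the median: sum(|s_i - median(s)|)."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def log_energy_db_dc_removed(frame, floor=1e-12):
    """Log energy (dB) after subtracting the mean.

    `floor` is an assumption of this sketch to keep log10 defined.
    """
    mu = statistics.mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))


def abs_cost(frame, x):
    """Sum of absolute deviations from x (minimized by the median)."""
    return sum(abs(s - x) for s in frame)


def sq_cost(frame, x):
    """Sum of squared deviations from x (minimized by the mean)."""
    return sum((s - x) ** 2 for s in frame)


frame = [0.1, -0.3, 0.2, 0.4, -0.1]       # hypothetical samples
shifted = [s + 0.25 for s in frame]       # same frame with a DC offset

# Both measures are unaffected by the DC offset (up to rounding).
print(volume_dc_removed(frame), volume_dc_removed(shifted))
print(log_energy_db_dc_removed(frame), log_energy_db_dc_removed(shifted))

# Numeric check: no candidate on the grid beats the median or the mean.
grid = [k / 100 for k in range(-100, 101)]
best_abs = min(abs_cost(frame, x) for x in grid)
best_sq = min(sq_cost(frame, x) for x in grid)
```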

Page 10: Basic Features of Audio Signals ( 音訊的基本特徵 )

Intensity (III)

Examples: please refer to the online tutorial.

Page 11: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch

Definition
- Pitch is known as the fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).
- More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:

$$\text{semitone} = 69 + 12 \log_2\left(\frac{\text{Hz}}{440}\right)$$
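The Hertz-to-semitone conversion can be written as a short Python function. The inverse function is not on the slide; it is added here only as the algebraic inverse of the same formula, and both function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to semitones: 69 + 12 * log2(Hz / 440)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Algebraic inverse of hz_to_semitone (not shown on the slide)."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


print(hz_to_semitone(440.0))  # 69.0
print(hz_to_semitone(880.0))  # 81.0
```

Doubling the frequency raises the pitch by 12 semitones, that is, one octave.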

Page 12: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Computation (I)

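Pitch of tuning forks (5 fundamental periods measured between two sample indices, at a sample rate of 16000 Hz):

$$\mathit{ff} = \frac{16000}{(189 - 7)/5} = 439.56\ \text{Hz}$$

$$\text{pitch} = 69 + 12 \log_2\left(\frac{\mathit{ff}}{440}\right) = 68.98\ \text{semitone}$$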

Page 13: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Computation (II)

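Pitch of speech (3 fundamental periods measured between two sample indices, at a sample rate of 16000 Hz):

$$\mathit{ff} = \frac{16000}{(477 - 75)/3} = 119.403\ \text{Hz}$$

$$\text{pitch} = 69 + 12 \log_2\left(\frac{\mathit{ff}}{440}\right) = 46.42\ \text{semitone}$$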
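The speech calculation can be reproduced with a small Python sketch. It assumes that the two sample indices (75 and 477) mark the start and end of 3 complete fundamental periods in a waveform sampled at 16000 Hz. The function and parameter names are hypothetical.

```python
import math


def fundamental_frequency(fs, start_index, end_index, num_periods):
    """Estimate the fundamental frequency in Hz from marked periods.

    The marked span (end_index - start_index samples) is assumed to
    cover exactly `num_periods` fundamental periods.
    """
    period_in_samples = (end_index - start_index) / num_periods
    return fs / period_in_samples


def hz_to_semitone(freq_hz):
    """Convert Hz to semitones: 69 + 12 * log2(Hz / 440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


ff = fundamental_frequency(16000, 75, 477, 3)
pitch = hz_to_semitone(ff)
print(f"{ff:.3f} Hz, {pitch:.2f} semitones")  # 119.403 Hz, 46.42 semitones
```

Averaging over several periods, rather than measuring a single one, reduces the error caused by marking the period boundaries at whole-sample positions.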

Page 14: Basic Features of Audio Signals ( 音訊的基本特徵 )

Statistics of Mandarin Chinese
- 5401 characters, each associated with at least one base syllable and a tone
- 411 base syllables; most syllables have 4 tones, giving 1501 tonal syllables

Tone is characterized by the pitch curve:
- Tone 1: high-high
- Tone 2: low-high
- Tone 3: high-low-high
- Tone 4: high-low

Some examples of tones:
- 1242: 清華大學 (Tsing Hua University)
- 1234: 三民主義、優柔寡斷、搭達打大、依宜以易、夫福府負
- ?????: 美麗大教堂、滷蛋有夠鹹 (Taiwanese)

Page 15: Basic Features of Audio Signals ( 音訊的基本特徵 )

Sinusoidal Signals

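How to generate a stream of sinusoidal signals:

fs=16000;                            % sample rate (Hz)
duration=3;                          % duration (seconds)
f=440;                               % frequency of the sinusoid (Hz)
t=(1:fs*duration)/fs;                % time vector (seconds)
y=0.8*sin(2*pi*f*t);                 % sinusoid with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1 1]);  % show the waveform between 0.6 s and 0.65 s
sound(y, fs);                        % play the signal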

Page 16: Basic Features of Audio Signals ( 音訊的基本特徵 )

Zero Crossing Rate

Zero crossing rate (ZCR)
- The number of zero crossings in a frame.

Characteristics:
- Noise and unvoiced sounds have high ZCR.
- ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
- To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.

Page 17: Basic Features of Audio Signals ( 音訊的基本特徵 )

ZCR Computations

Two types of ZCR definition
- If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
- The choice affects the ZCR, especially when the sample rate is low.

Other considerations
- Zero-justification is required.
- ZCR with a shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
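The two ZCR definitions and the shifted variant can be compared with a minimal Python sketch. It interprets zero-justification as subtracting the frame mean, which is an assumption. The function name, the parameters, the sample frames, and the shift value of 0.05 are hypothetical choices for illustration, and the sketch does not answer how the shift should be chosen in practice.

```python
def zero_crossing_rate(frame, shift=0.0, count_zeros=True):
    """Count zero crossings between consecutive samples of a frame.

    The frame mean is subtracted first (zero-justification, as assumed
    in this sketch), then an optional bias `shift` is added.
    With count_zeros=True a product of exactly zero counts as a
    crossing; with count_zeros=False only strict sign changes count.
    """
    if not frame:
        return 0
    mean = sum(frame) / len(frame)
    x = [s - mean + shift for s in frame]
    crossings = 0
    for a, b in zip(x, x[1:]):
        product = a * b
        if product < 0 or (count_zeros and product == 0):
            crossings += 1
    return crossings


frame = [0.5, -0.5, 0.0, 0.5, -0.5]
print(zero_crossing_rate(frame, count_zeros=True))   # 4
print(zero_crossing_rate(frame, count_zeros=False))  # 2

# With a bias larger than the noise amplitude, low-level noise no longer
# crosses zero, while a louder unvoiced-like signal still does.
noise = [0.01, -0.01, 0.01, -0.01]
unvoiced = [0.3, -0.3, 0.3, -0.3]
print(zero_crossing_rate(noise, shift=0.05, count_zeros=False))     # 0
print(zero_crossing_rate(unvoiced, shift=0.05, count_zeros=False))  # 3
```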

Page 18: Basic Features of Audio Signals ( 音訊的基本特徵 )

ZCR

Examples: please refer to the online tutorial.