Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟


Upload: gary-morrison

Post on 21-Jan-2016


Page 1: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Techniques and Applications for Audio

695410106 謝育任 695410121 劉威麟

Page 2: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Outline

Audio Watermark

Audio Classification:

Security Monitoring Using Microphone Arrays and Audio Classification

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

Page 3: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Introduction

What is a watermark? A technique for data hiding

Paper watermarks appeared nearly 700 years ago. The oldest known one dates from 1292

The idea of digital image watermarking arose independently in 1990. Around 1993, the term “water mark” was coined

Page 4: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Terminology

Steganography stands for techniques in general that allow secret communication

Watermarking, as opposed to steganography, has the additional notion of robustness against attacks

Fingerprinting and labeling are terms that denote special applications of watermarking, e.g., copyright protection

Bit-stream watermarking is sometimes used for data hiding or watermarking of compressed data

Page 5: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Requirement

A watermark shall convey as much information as possible

A watermark should in general be secret and should only be accessible by authorized parties

A watermark should stay in the host data regardless of whatever happens to the host data

A watermark should be imperceptible

Depends on the media to be watermarked

Blind vs. non-blind

May be required in real time

Low time complexity

Page 6: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Basic Watermarking Principle

There are three main issues in the design of a watermarking system

Design of the watermark signal W to be added to the host signal. Typically, the watermark signal depends on a key K and watermark information I. It may also depend on the host data X into which it is embedded

Design of the embedding method itself that incorporates the watermark signal W into the host data X yielding watermarked data Y

Page 7: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Design of the corresponding extraction method that recovers the watermark information from the signal mixture using the key, either with the help of the original or without the original

Page 8: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Embedding Technologies for Audio

Low-bit coding

Replaces the least significant bit of each sampling point with a coded binary string

The major disadvantage of this method is its poor immunity to manipulation

This method is useful only in closed, digital-to-digital environments

Phase coding

Substitutes the phase of an initial audio segment with a reference phase that represents the data

Procedure

Break the sound sequence s[i], (0 ≤ i ≤ I-1), into a series of N short segments, s_n[i], where (0 ≤ n ≤ N-1)

Page 9: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Apply a K-point discrete Fourier transform to the n-th segment, s_n[i], where K = I/N, and create a matrix of the phase, ψ_n(ω_k), and magnitude, A_n(ω_k), for (0 ≤ k ≤ K-1)

Page 10: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Store the phase difference between each adjacent segment for (0 ≤ n ≤ N-1)

A binary set of data is represented as ψ_data = π/2 or −π/2, representing 0 or 1

Page 11: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Re-create the phase matrices for n > 0 by using the phase difference

Use the modified phase matrix and the original magnitude matrix to reconstruct the sound signal by applying the inverse DFT
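The procedure can be sketched as follows, assuming NumPy. This is an illustration, not the authors' implementation: it uses the real-input FFT so the output stays real, writes one bit per frequency bin starting at bin 1 of the first segment, and the names `phase_encode` and `phase_decode` are hypothetical.

```python
import numpy as np


def phase_encode(signal: np.ndarray, bits: list[int], seg_len: int) -> np.ndarray:
    """Hide bits in the phase of the first segment, keeping relative phases."""
    n_seg = len(signal) // seg_len
    if n_seg < 1 or len(bits) > seg_len // 2 - 1:
        raise ValueError("signal too short or too many bits for this segment length")
    segs = signal[: n_seg * seg_len].reshape(n_seg, seg_len)
    spectra = np.fft.rfft(segs, axis=1)
    magnitude = np.abs(spectra)
    phase = np.angle(spectra)
    # Phase difference between each pair of adjacent segments.
    delta = np.diff(phase, axis=0)
    new_phase = phase.copy()
    # Bit 0 -> +pi/2, bit 1 -> -pi/2, written into bins 1..len(bits).
    new_phase[0, 1 : 1 + len(bits)] = np.where(np.asarray(bits) == 0, np.pi / 2, -np.pi / 2)
    # Rebuild the later segments from the stored phase differences.
    for n in range(1, n_seg):
        new_phase[n] = new_phase[n - 1] + delta[n - 1]
    out = np.fft.irfft(magnitude * np.exp(1j * new_phase), n=seg_len, axis=1)
    # Samples past the last full segment are passed through unchanged.
    return np.concatenate([out.reshape(-1), signal[n_seg * seg_len :]])


def phase_decode(signal: np.ndarray, n_bits: int, seg_len: int) -> list[int]:
    """Read the bits back from the sign of the first segment's phase."""
    phase0 = np.angle(np.fft.rfft(signal[:seg_len]))
    return [0 if p > 0 else 1 for p in phase0[1 : 1 + n_bits]]


rng = np.random.default_rng(0)
host = rng.normal(size=1024)
message = [1, 0, 0, 1, 1, 0, 1, 0]
marked = phase_encode(host, message, seg_len=256)
recovered = phase_decode(marked, len(message), seg_len=256)
```

Only the phases change; the magnitude spectrum of every segment is reused as-is, which is the property the procedure relies on.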

Page 12: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

• Spread spectrum coding

Page 13: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

In the decoding stage, the following is assumed:

The pseudorandom key is maximal

The key stream for the encoding is known by the receiver

Signal synchronization is done, and the start/stop points of the spread data are known

The following parameters are known by the receiver: chip rate, data rate, and carrier frequency

To keep the noise level low and inaudible, the spread code is attenuated to roughly 0.5 percent of the dynamic range of the host sound file
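A simplified direct-sequence sketch of these assumptions, using NumPy. It omits the carrier modulation, takes "dynamic range" to mean the peak-to-peak range of the host, and uses a seeded generator as the shared key; all names and parameter values here are illustrative assumptions.

```python
import numpy as np


def ss_embed(host, bits, key, chips_per_bit, strength=0.005):
    """Add a key-driven +/-1 chip sequence, sign-flipped per data bit."""
    n = len(bits) * chips_per_bit
    if n > len(host):
        raise ValueError("host is too short for this message")
    chips = np.random.default_rng(key).choice([-1.0, 1.0], size=n)
    symbols = np.repeat(2 * np.asarray(bits) - 1, chips_per_bit)  # 0 -> -1, 1 -> +1
    # Attenuate the spread code to 0.5 percent of the host's peak-to-peak range.
    amplitude = strength * (host.max() - host.min())
    out = host.astype(float).copy()
    out[:n] += amplitude * symbols * chips
    return out


def ss_decode(marked, n_bits, key, chips_per_bit):
    """Correlate with the same chip sequence; the sign gives each bit."""
    n = n_bits * chips_per_bit
    chips = np.random.default_rng(key).choice([-1.0, 1.0], size=n)
    corr = (marked[:n] * chips).reshape(n_bits, chips_per_bit).sum(axis=1)
    return [int(c > 0) for c in corr]


message = [1, 0, 1, 1, 0, 0, 1, 0]
chips_per_bit = 32768
host = np.random.default_rng(0).normal(0.0, 0.1, size=len(message) * chips_per_bit)
marked = ss_embed(host, message, key=1234, chips_per_bit=chips_per_bit)
recovered = ss_decode(marked, len(message), key=1234, chips_per_bit=chips_per_bit)
```

Because the code is so heavily attenuated, each bit has to be spread over many chips before the correlation rises above the host signal, which is the trade-off between data rate and inaudibility.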

Page 14: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

• Echo data hiding

– The data are hidden by varying three parameters of the echo: initial amplitude, decay rate, and offset

Page 15: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟
Page 16: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Example

Page 17: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Decoding: magnitude of the autocorrelation of the encoded signal’s cepstrum
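A rough sketch of echo hiding with NumPy, under simplifying assumptions: one bit per block, a single echo whose offset (`d0` or `d1` samples) carries the bit, a fixed initial amplitude `alpha`, and no decay. The decoder compares the real cepstrum at the two candidate offsets rather than computing the full cepstrum autocorrelation. All names and values are hypothetical.

```python
import numpy as np


def echo_embed(host, bits, block_len, d0, d1, alpha=0.5):
    """Add an echo to each block; the echo offset encodes the bit."""
    out = host.astype(float).copy()
    for i, bit in enumerate(bits):
        delay = d1 if bit else d0
        start, stop = i * block_len, (i + 1) * block_len
        # Delayed copy of the host that lands inside this block.
        src = host[max(start - delay, 0) : stop - delay]
        out[stop - len(src) : stop] += alpha * src
    return out


def echo_decode(marked, n_bits, block_len, d0, d1):
    """Pick the offset whose cepstral peak is larger in each block."""
    bits = []
    for i in range(n_bits):
        block = marked[i * block_len : (i + 1) * block_len]
        log_mag = np.log(np.abs(np.fft.fft(block)) + 1e-12)
        cepstrum = np.fft.ifft(log_mag).real
        bits.append(int(cepstrum[d1] > cepstrum[d0]))
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
host = np.random.default_rng(1).normal(size=len(message) * 2048)
marked = echo_embed(host, message, block_len=2048, d0=50, d1=75)
recovered = echo_decode(marked, len(message), block_len=2048, d0=50, d1=75)
```

An echo at offset d multiplies the spectrum by a periodic ripple, and the logarithm turns that ripple into an isolated cepstral peak at quefrency d, which is why the cepstrum is the natural domain for detection.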

Page 18: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Classification of attacks

“Simple attacks” (other possible names include “waveform attacks” and “noise attacks”) are conceptually simple attacks that attempt to impair the embedded watermark by manipulations of the whole watermarked data without an attempt to identify and isolate the watermark

“Detection-disabling attacks” (other possible names include “synchronization attacks”) are attacks that attempt to break the correlation and to make the recovery of the watermark impossible or infeasible for a watermark detector

Page 19: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Classification of attacks (continued)

“Ambiguity attacks” (other possible names include “deadlock attacks”, “inversion attacks”, “fake-watermark attacks”, and “fake-original attacks”) are attacks that attempt to confuse by producing fake original data or fake watermarked data

“Removal attacks” are attacks that attempt to analyze the watermarked data, estimate the watermark or the host data, separate the watermarked data into host data and watermark, and discard only the watermark

Page 20: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Watermark algorithm

LSB

Works in the time domain and embeds the watermark in the least significant bits

The message is embedded many times into the audio signal

Parameters: secret key, error correction code, embedding message, etc.

Microsoft

Works in the frequency domain and embeds the watermark in the frequency coefficients using a spread spectrum technique

Only one parameter: embedding message

Page 21: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

VAWW (Viper Audio Water Wavelet)

Works in the wavelet domain and embeds the watermark in selected coefficients

Parameters:

Secret key

Threshold, which selects the coefficients for embedding. The default value is 40

Scale factor, which sets the embedding strength. The default value is 0.2

Publimark

Open source tool

Parameters:

Embedded message

Public/private key

Page 22: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Reference

Hartung, F.; Kutter, M. Multimedia Watermarking Techniques. Proceedings of the IEEE, vol. 87, no. 7, July 1999

Bender, W.; Gruhl, D.; Morimoto, N.; Lu, A. Techniques for Data Hiding. IBM Systems Journal, vol. 35, nos. 3&4, 1996

Lang, A.; Dittmann, J. Transparency and Complexity Benchmarking of Audio Watermarking Algorithms Issues

Page 23: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Audio Classification

Audio Classification: Security, Multimedia Indexing and Retrieval, Other

Page 24: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Introduction

The proposed system:

Location

Type of sound

SNR (signal to noise ratio) :

Reflection coefficient : A reflection coefficient describes either the amplitude or the intensity of a reflected wave relative to an incident wave.

Page 25: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Proposed Security monitoring instrument

Page 26: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Center Clipping

c(n) is the center-clipped sample at time index n

s(n) is the audio sample at time index n

Page 27: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

PR algorithm

The PR algorithm divides the audio segment into frames, estimates the presence of human pitch in each frame, and calculates a PR parameter.

PR = NP / NF

NP : the number of frames that have human pitch

NF : the total number of frames

Page 28: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Pitch Value

Human Pitch = {Pitch : 70Hz < Pitch < 280Hz}

$$\text{Pitch} = \arg\left(\max\left\{R_{xx}(\tau) : R_{xx}(\tau) > 0.4\,\text{RMSE}\right\}\right)$$
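The center clipping, pitch test, and PR parameter can be combined in a short NumPy sketch. Several details are assumptions because the slide formulas did not survive: the clipping rule (subtract a threshold set to 30 percent of the frame peak, zero everything below it) is the common textbook form, and the acceptance test compares the autocorrelation peak with 0.4 times the zero-lag value as a stand-in for the slide's RMSE term. Function names are illustrative.

```python
import numpy as np

F_MIN, F_MAX = 70.0, 280.0  # human pitch range from the slide, in Hz


def center_clip(s, ratio=0.3):
    """Assumed clipping rule: remove the band |s| <= C_L around zero."""
    c_l = ratio * np.max(np.abs(s))
    return np.where(s > c_l, s - c_l, np.where(s < -c_l, s + c_l, 0.0))


def frame_pitch(frame, fs, threshold=0.4):
    """Return the pitch in Hz, or None if no human pitch is found."""
    c = center_clip(np.asarray(frame, dtype=float))
    r = np.correlate(c, c, mode="full")[len(c) - 1 :]  # lags 0 .. len(c)-1
    if r[0] <= 0:
        return None
    lag_min = int(np.floor(fs / F_MAX)) + 1               # pitch < 280 Hz
    lag_max = min(int(np.ceil(fs / F_MIN)) - 1, len(c) - 1)  # pitch > 70 Hz
    if lag_max < lag_min:
        return None
    lag = lag_min + int(np.argmax(r[lag_min : lag_max + 1]))
    if r[lag] < threshold * r[0]:
        return None
    return fs / lag


def pitch_ratio(signal, fs, frame_len):
    """PR = NP / NF over non-overlapping frames."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return 0.0
    n_pitch = sum(
        frame_pitch(signal[i * frame_len : (i + 1) * frame_len], fs) is not None
        for i in range(n_frames)
    )
    return n_pitch / n_frames


fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 150 * t)  # periodic signal inside the pitch range
noise = np.random.default_rng(0).normal(size=fs)
pr_tone = pitch_ratio(tone, fs, frame_len=400)
pr_noise = pitch_ratio(noise, fs, frame_len=400)
```

A segment with a high PR is likely speech, while a low PR points to non-speech audio.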

Page 29: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Proposed system

Page 30: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Non-speech Classification

A Time Delay Neural Network (TDNN) is used to classify a non-speech audio segment into an audio type (e.g., door opening, fan noise, etc.)

MFCC (Mel-Filtered Cepstral Coefficient) :

△ MFCC (Delta Mel-Filtered Cepstral Coefficient) :

Page 31: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Simulation Environment

Page 32: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Simulation Results

OR (overlap ratio) = 0.85

SD (segment duration) = 400 ms

Page 33: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Introduction

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

Bi-modal:

Bit-stream mode : a series of bits

Generic mode : temporal and spectral information is extracted from the PCM samples

Classification : Speech, Music, Silence, Fuzzy

Page 34: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Erroneous classification

Critical Errors : one pure class is misclassified into another pure class.

Semi-critical Errors : a fuzzy class type is misclassified as one of the pure class types.

Non-critical errors : a pure class is misclassified as a fuzzy class.
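The three error categories can be written as a small lookup. This sketch assumes that speech, music, and silence are the pure classes and that fuzzy is the only mixed class; the function name is illustrative.

```python
PURE = {"speech", "music", "silence"}  # assumed set of pure classes
FUZZY = "fuzzy"


def error_type(truth: str, predicted: str) -> str:
    """Name the kind of classification error, or 'correct' if there is none."""
    if truth == predicted:
        return "correct"
    if truth in PURE and predicted in PURE:
        return "critical"        # pure class mistaken for another pure class
    if truth == FUZZY and predicted in PURE:
        return "semi-critical"   # fuzzy content labeled as a pure class
    if truth in PURE and predicted == FUZZY:
        return "non-critical"    # pure content labeled as fuzzy
    raise ValueError(f"unknown class pair: {truth!r}, {predicted!r}")


examples = [
    error_type("speech", "music"),
    error_type("fuzzy", "speech"),
    error_type("music", "fuzzy"),
]
```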

Page 35: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Classification and Segmentation framework

Page 36: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Spectral Template

Pulse Code Modulation : PCM is a common method of storing and transmitting uncompressed digital audio. PCM is also a very common format for AIFF and WAV files.

1 : positive voltage pulse

0 : absence of pulse

Spectral template : it is formed from the input audio source, and it can be obtained from the MDCT coefficients of MP3 granules.

Power Spectrum : it is obtained from the PCM samples.

Page 37: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

About MP3

The Layer 3 encoding process starts by dividing the audio signal into frames, each of which corresponds to one or two granules.

Each granule has 576 PCM samples.

There are three windowing modes in the MPEG Layer 3 encoding scheme : Long, Short, Mixed

Page 38: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Bit-Stream Mode

MDCT (Modified discrete cosine transform) :

$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$$
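A direct NumPy evaluation of the MDCT, reading the formula as X_k = Σ_{n=0}^{2N-1} x_n cos[(π/N)(n + 1/2 + N/2)(k + 1/2)] for k = 0 … N-1. This is a sketch of the bare transform only; it applies no analysis window and is not an MP3 encoder stage. The function name `mdct` is illustrative.

```python
import numpy as np


def mdct(x):
    """Map 2N input samples to N MDCT coefficients (no windowing)."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be a positive even number (2N)")
    n_half = len(x) // 2                      # N in the formula
    n = np.arange(2 * n_half)                 # sample index, one column each
    k = np.arange(n_half)[:, None]            # coefficient index, one row each
    basis = np.cos(np.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
    return basis @ x


coeffs = mdct(np.arange(36, dtype=float))  # 36 samples in, 18 coefficients out
```

The transform halves the number of values: 2N samples produce N coefficients, and overlapping consecutive blocks by N samples is what makes the scheme invertible.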

Page 39: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Bit-Stream Mode

MDCT (w, f)

w represents the window number

f represents the line frequency index

Page 40: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Frame Features

Total Frame Energy (TFE) Calculation : to detect silent frames

$$TFE_j = \sum_{w}^{NoW} \sum_{f}^{NoF} \left(SPEQ_j(w,f)\right)^2$$

Band Energy Ratio (BER) Calculation : the energy ratio between two spectral regions that are separated by a single cut-off frequency

$$BER_j(f_c) = \frac{\sum_{w}^{NoW} \sum_{f=0}^{f_{f_c}} \left(SPEQ_j(w,f)\right)^2}{\sum_{w}^{NoW} \sum_{f=f_{f_c}}^{NoF} \left(SPEQ_j(w,f)\right)^2}$$
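Both frame features reduce to sums of squared spectral coefficients. The sketch below assumes the frame is given as a NumPy array `speq` of shape (NoW, NoF), that `cutoff_line` is the frequency-line index of the cut-off frequency, and that the cut-off line is counted in both bands; the silence threshold is a made-up value for illustration.

```python
import numpy as np


def total_frame_energy(speq: np.ndarray) -> float:
    """TFE: sum of squared coefficients over all windows and lines."""
    return float(np.sum(speq ** 2))


def band_energy_ratio(speq: np.ndarray, cutoff_line: int) -> float:
    """BER: energy at or below the cut-off line over energy at or above it."""
    low = float(np.sum(speq[:, : cutoff_line + 1] ** 2))
    high = float(np.sum(speq[:, cutoff_line:] ** 2))
    return low / high if high > 0 else float("inf")


def is_silent(speq: np.ndarray, threshold: float = 1e-6) -> bool:
    """Hypothetical silence test: TFE below a fixed threshold."""
    return total_frame_energy(speq) < threshold


frame = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.0, 2.0, 3.0, 4.0]])
tfe = total_frame_energy(frame)
ber = band_energy_ratio(frame, cutoff_line=1)
```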

Page 41: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Frame Features

Fundamental Frequency Estimation : if the input audio is harmonic over a fundamental frequency, the real fundamental frequency (FF) value can be estimated from the spectral coefficient (SPEQ(w,f))

Subband Centroid Frequency Estimation : Subband Centroid (SC) is the first moment of the spectral distribution.

$$f_{sc} = \frac{\sum_{w}^{NoW} \sum_{f}^{NoF} \left(SPEQ(w,f) \times FL(f)\right)}{\sum_{w}^{NoW} \sum_{f}^{NoF} SPEQ(w,f)}$$
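The subband centroid is a weighted mean of the line frequencies, with the spectral coefficients as weights. This sketch assumes non-negative coefficients in an array of shape (NoW, NoF) and takes FL(f) to be the center frequency of line f in Hz, passed in as `line_freq`; both are assumptions made for illustration.

```python
import numpy as np


def subband_centroid(speq: np.ndarray, line_freq: np.ndarray) -> float:
    """First moment of the spectral distribution, in the units of line_freq."""
    weights = speq.sum(axis=0)          # collapse the windows, keep the lines
    total = weights.sum()
    if total == 0:
        return 0.0                      # silent frame: no meaningful centroid
    return float((weights * line_freq).sum() / total)


line_freq = np.array([100.0, 200.0, 300.0])
narrow = subband_centroid(np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]), line_freq)
skewed = subband_centroid(np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 1.0]]), line_freq)
```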

Page 42: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Initial Classification

Page 43: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Segment Features

Transition Rate (TR) : transition between consecutive frames. TR has a forced speech classification.

Fundamental Frequency Segment Feature : FF has a forced music classification

Subband Centroid Segment Feature : SC has two forced classification regions, one for music and the other for speech content.

$$TR(S) = \frac{NoF + \sum_{i}^{NoF} TP_i}{2\,NoF}$$
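Reading the formula as TR(S) = (NoF + Σ TP_i) / (2·NoF), the transition rate is a rescaled mean of per-frame transition terms. The slides do not define TP_i, so the values below are placeholders chosen only to show the arithmetic.

```python
def transition_rate(tp: list[float]) -> float:
    """TR(S) for a segment, given one transition term TP_i per frame."""
    nof = len(tp)
    if nof == 0:
        raise ValueError("segment has no frames")
    return (nof + sum(tp)) / (2 * nof)


tr_flat = transition_rate([0, 0, 0, 0])    # (4 + 0) / 8
tr_high = transition_rate([1, 1, 1, 1])    # (4 + 4) / 8
tr_mixed = transition_rate([1, 0, 1, 0])   # (4 + 2) / 8
```

If every TP_i lies between -1 and 1, this form keeps TR(S) between 0 and 1.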

Page 44: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Step2

Page 45: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Generic Decision Table

Page 46: Techniques and Applications for Audio 695410106 謝育任 695410121 劉威麟

Step3