
Machine Learning: Basis and Wavelet

Haar DWT in 2 levels

Original 4×4 image:

  7  22  38 191
 17  83 188 211
 71 167 194 207
159 187 201 216

Level 1 (rounded): approximation subband

 32 157
146 204

and detail subbands

 20 44    18 42    13 -32
 31  7    27  4   -17   1

Level 2 (rounded): the approximation subband is decomposed again into the coarse average 135 and details 46, 40, -17.

Hwa Pyung Kim (CSE), Medical Image Computing Lab (Prof. Jin Keun Seo)

2×2 Haar analysis sign patterns:

+ +    + -    + -    + +
- -    + -    - +    + +

(difference between rows, difference between columns, diagonal difference, and average, respectively)
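As a concrete illustration of the example above, here is a minimal NumPy sketch (not from the slides; the helper name haar2d_level and the averaging normalization are my assumptions) that applies a 2-level averaging Haar DWT to the 4×4 image; exact coefficient values depend on the chosen normalization and sign convention.

```python
import numpy as np

def haar2d_level(a):
    """One level of a 2-D Haar transform using averages.

    Returns the approximation (LL) subband and the three
    detail subbands, each half the size of the input.
    """
    tl, tr = a[0::2, 0::2], a[0::2, 1::2]   # corners of each 2x2 block
    bl, br = a[1::2, 0::2], a[1::2, 1::2]
    ll = (tl + tr + bl + br) / 4.0          # average (approximation)
    lh = (tl + tr - bl - br) / 4.0          # difference between rows
    hl = (tl - tr + bl - br) / 4.0          # difference between columns
    hh = (tl - tr - bl + br) / 4.0          # diagonal difference
    return ll, lh, hl, hh

img = np.array([[  7,  22,  38, 191],
                [ 17,  83, 188, 211],
                [ 71, 167, 194, 207],
                [159, 187, 201, 216]], dtype=float)

ll1, *details1 = haar2d_level(img)   # level 1: 2x2 subbands
ll2, *details2 = haar2d_level(ll1)   # level 2: decompose the LL subband again

print(np.round(ll1))   # approximately [[ 32. 157.] [146. 204.]]
print(np.round(ll2))   # approximately [[135.]]
```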

Machine learning is the field of study that gives computers the ability to learn a feed-forward function without being explicitly programmed.

Mission: Find a feed-forward function $f$ from labeled training data $\{(x_i, y_i) : i = 1, \ldots, N\}$ such that $f(x_i) \approx y_i$, $i = 1, \ldots, N$.

Supervised learning is the machine learning technique of finding such a feed-forward function iteratively from labeled training data $\{(x_i, y_i) : i = 1, \ldots, N\}$ such that $f(x_i) \approx y_i$, $i = 1, \ldots, N$.
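As a minimal sketch of the "find $f$ such that $f(x_i) \approx y_i$" setup (illustrative only; the synthetic data and the choice of a linear model are assumptions, not the slides' method), the following fits a feed-forward function by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled training data {(x_i, y_i)}: here y depends linearly on x plus noise.
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# "Find f such that f(x_i) ≈ y_i": least-squares fit of a linear f(x) = x·w.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

f = lambda x: x @ w_hat
print(np.max(np.abs(f(X) - y)))                # small residuals: f(x_i) ≈ y_i
```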

Machine learning: What it is and why it matters.

Humans can typically create one or two good models a week; machine learning can create thousands of models a week.

Basis: Fourier Transform

The Fourier transform of $f$ is defined by $\hat{f}(\omega) = \int f(t)\, e^{-i\omega t}\, dt$.

Each Fourier mode $e^{i\omega t}$ acts as a basis function, and the resulting coefficients can distinguish different signals.

Every function $f$ can be expressed as a linear combination of basis functions, $f = \sum_k c_k\, \phi_k$,

where $\{\phi_1, \phi_2, \cdots\}$ is an orthonormal basis: $\langle \phi_j, \phi_k \rangle = 1$ if $j = k$ and $0$ otherwise.
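A small numerical check of this expansion (my illustration; the basis is a random orthonormal one from a QR factorization): the coefficients are inner products with the basis vectors, and summing the weighted basis vectors recovers the signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Orthonormal basis {phi_k}: the columns of Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
f = rng.normal(size=n)                          # an arbitrary "signal"

c = Q.T @ f                                     # coefficients c_k = <f, phi_k>
f_rec = sum(c[k] * Q[:, k] for k in range(n))   # f = sum_k c_k phi_k

print(np.allclose(f, f_rec))                    # True: the expansion is exact
print(np.allclose(Q.T @ Q, np.eye(n)))          # <phi_j, phi_k> = delta_jk
```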

Approximation by 4 principal components (basis) only

Slide Credit: Vaclav

Wavelet transform: $Wf = \{\, f \star \phi_J,\ f \star \psi_{j,k} \,\}_{j,k}$.

Wavelet coefficients: $f \star \phi_J$ is the average (low frequencies); the $f \star \psi_{j,k}$ capture the higher frequencies at dyadic scales $j$ and positions $k$.

What is a wavelet?

Why wavelets?

• Wavelets are uniformly stable to deformations.

• Wavelets separate multiscale information.

• Wavelets provide sparse representations.

Scattering convolution network: for appropriate wavelets, such an ideal kernel $\Phi$ can be represented by scattering coefficients computed with the wavelet transform.

Review of Wavelets

Haar DWT in 2 levels (example repeated from the introduction above).

$\psi_{j,k}(t) := 2^{j/2}\, \psi(2^j t - k)$ for $j, k \in \mathbb{Z}$.

Wavelet basis functions: The family of functions $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$, the dyadic translations and dilations of a mother wavelet function $\psi$, forms a complete orthonormal basis of $L^2(\mathbb{R})$.

$f = \sum_{j,k} c_{j,k}\, \psi_{j,k}$,

where $c_{j,k} = \langle f, \psi_{j,k} \rangle$ are the wavelet coefficients.

For the Haar mother wavelet, $\psi(t) = 1$ for $0 \le t < 1/2$, $\psi(t) = -1$ for $1/2 \le t < 1$, and $\psi(t) = 0$ otherwise.

Scale $j = 1$: $2^{1/2}\psi(2t)$, $2^{1/2}\psi(2t - 1)$.

Scale $j = 2$: $2\psi(2^2 t - 1)$, $2\psi(2^2 t - 2)$, $2\psi(2^2 t - 3)$, $2\psi(2^2 t - 4)$.

Discrete Haar Wavelet Transform

Approximate the signal from its wavelet coefficients:

$f \approx \langle f, \phi \rangle\, \phi + \sum_{j \le J} \sum_{k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$,

adding coarse-to-fine detail coefficients $\langle f, \psi_{0,k} \rangle, \langle f, \psi_{1,k} \rangle, \langle f, \psi_{2,k} \rangle, \ldots$
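The following sketch (my own; the example signal and helper names are assumptions) computes one level of Haar averages and details for a 1-D signal, reconstructs the signal from them, and shows the coarse approximation obtained by dropping the details.

```python
import numpy as np

def haar_analysis(x):
    """One level of the (orthonormal) Haar DWT: averages and details."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)    # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)    # detail coefficients
    return a, d

def haar_synthesis(a, d):
    """Invert one level: interleave the reconstructed even/odd samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_analysis(x)

print(np.allclose(haar_synthesis(a, d), x))   # True: exact reconstruction
print(haar_synthesis(a, np.zeros_like(d)))    # coarse approximation
# Dropping the details gives the pairwise averages [5, 5, 11, 11, 7, 7, 5, 5].
```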

[Figure: Haar wavelet filter bank applied to an 8-sample signal. The signal is repeatedly split by a high-pass filter $h$ and a low-pass filter $g$, each followed by downsampling $\downarrow 2$, producing level 1, level 2, and level 3 coefficients.]

Wavelet filter bank

Low-pass filter $g$: $x \mapsto (x * g)\downarrow 2$ gives the average.

High-pass filter $h$: $x \mapsto (x * h)\downarrow 2$ gives the detail (backward difference).

Wavelet coefficients: $Wx = \{\, x \star \phi_J,\ x \star \psi_{j,k} \,\}$.
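A minimal filter-bank sketch (my illustration; the filter normalization by 1/2 is an assumption) implementing this average/detail split by convolution and downsampling, applied recursively for a 3-level decomposition of an 8-sample signal.

```python
import numpy as np

# Haar analysis filters: low-pass g (average) and high-pass h (difference).
g = np.array([0.5, 0.5])     # running average
h = np.array([0.5, -0.5])    # difference of consecutive samples

def analyze(x, levels):
    """Recursive wavelet filter bank: convolve, downsample by 2, and repeat on
    the low-pass branch. Returns [details level 1..levels, final average]."""
    coeffs = []
    for _ in range(levels):
        detail = np.convolve(x, h, mode="valid")[::2]   # (x * h) ↓ 2
        x = np.convolve(x, g, mode="valid")[::2]        # (x * g) ↓ 2
        coeffs.append(detail)
    coeffs.append(x)          # remaining coarse average
    return coeffs

x = np.array([0.0, 6.0, 9.0, 7.0, 3.0, 5.0, 6.0, 10.0])
for level, c in enumerate(analyze(x, levels=3)):
    print(level, c)
```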

Example of the discrete Haar wavelet transform for a sound signal.

Scattering convolution network

EEG 10-20 System

[Figure: wavelet modulus coefficients $|x \star \psi_{\lambda_1}|$ and $|x \star \psi_{\lambda_2}|$.]

Example of the continuous wavelet transform for an EEG signal.

Scattering convolution network

A scattering transform computes non-linear invariants with the modulus $|\cdot|$ and averaging pooling functions $\phi_J$.

Scattering convolution network

Scattering coefficients are obtained by cascading wavelet convolutions, the modulus, and averaging:

$S_0 x = x \star \phi_J$

$S_1 x = |x \star \psi_{\lambda_1}| \star \phi_J$

$S_2 x = \big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \star \phi_J$

$\vdots$ over paths $(\lambda_1, \lambda_2, \ldots)$

Translation invariance: $\lim_{J \to \infty} \| \Phi(x(\cdot - c)) - \Phi(x) \| = 0$.

For appropriate wavelets, scattering is invariant to translation and stable to deformation: if $x_\tau(t) = x(t - \tau(t))$, where $\tau$ is a diffeomorphism, then $\| \Phi(x_\tau) - \Phi(x) \|$ is controlled by the size of the deformation $\tau$.
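Below is a toy sketch of this cascade (entirely my own construction, not the slides' implementation; the Haar-like band-pass filters stand in for a proper wavelet family $\psi_\lambda$): it iterates convolution, modulus, and averaging to produce zeroth-, first-, and second-order scattering coefficients.

```python
import numpy as np

def bandpass_bank(widths=(2, 4, 8)):
    """Crude Haar-like band-pass filters at a few dyadic scales
    (stand-ins for a proper wavelet family psi_lambda)."""
    return [np.concatenate([np.ones(w), -np.ones(w)]) / (2 * w) for w in widths]

def scattering(x, psis, order=2):
    """Toy scattering transform: a cascade of |x * psi| followed by averaging."""
    avg = lambda u: u.mean()                      # global average pooling (phi_J)
    coeffs = [avg(x)]                             # S0 = average of x
    layer = [x]
    for _ in range(order):
        nxt = []
        for u in layer:
            for psi in psis:
                v = np.abs(np.convolve(u, psi, mode="same"))   # |u * psi|
                coeffs.append(avg(v))             # scattering coefficient of this path
                nxt.append(v)
        layer = nxt
    return np.array(coeffs)

x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.default_rng(0).normal(size=256)
S = scattering(x, bandpass_bank())
print(S.shape)    # 1 + 3 + 9 = 13 coefficients for 3 filters and order 2
```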

Scattering convolution network: example of the scattering transform for an EEG signal.

[Figure: the cascade $x \star \psi_{\lambda_1}$, then $|x \star \psi_{\lambda_1}|$, then a time average; iterating gives the scattering coefficients $|x \star \psi_{\lambda_1}| \star \phi$ and $\big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \star \phi$ for paths $(\lambda_1, \lambda_2, \ldots)$.]

Subspace Methods: PCA, ICA

Reference: Deep Learning, written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (www.deeplearningbook.org).

Basics of Principal Component Analysis

Suppose we would like to apply lossy compression to a collection of $m$ points $\{x^{(1)}, \cdots, x^{(m)}\} \subset \mathbb{R}^n$. Lossy compression means storing the points in a way that requires less memory but may lose some precision.

Slide Credit: Vaclav

Approximation by 4 principal components only

High-dimensional data $x_i$ often lie on or near a much lower-dimensional, curved manifold. A good way to represent the data points is by their low-dimensional coordinates $c_i$. The low-dimensional representation should capture information about the high-dimensional pairwise distances.


Encoding/Decoding function

Let $f: x \in \mathbb{R}^n \to c \in \mathbb{R}^l$ (with $l < n$) be an encoding function which represents each data point $x$ by a point $c = f(x)$ in the low-dimensional space $\mathbb{R}^l$. PCA is defined by our choice of the decoding function $g: c \in \mathbb{R}^l \to x \in \mathbb{R}^n$ such that $g(f(x)) \approx x$. Let $g(c) = Dc$, where $D \in \mathbb{R}^{n \times l}$ defines the decoding. PCA constrains the columns of $D$ to be orthonormal vectors in $\mathbb{R}^n$.

$x \approx g(c) = Dc = c_1 d_1 + c_2 d_2 + c_3 d_3 + c_4 d_4$, where $d_1, \ldots, d_4 \in \mathbb{R}^n$ are the 1st, 2nd, 3rd, and 4th columns of $D$. (Slide Credit: Vaclav)

PCA constrains the columns of $D$ to be orthonormal vectors in $\mathbb{R}^n$.
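A small sketch of this encoder/decoder pair (illustrative; the synthetic data and the choice $l = 2$ are assumptions): $D$ has orthonormal columns, $f(x) = D^\top x$ encodes, and $g(c) = Dc$ decodes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, m = 5, 2, 200

# Data that is approximately 2-dimensional inside R^n.
basis = np.linalg.qr(rng.normal(size=(n, l)))[0]
X = rng.normal(size=(m, l)) @ basis.T + 0.01 * rng.normal(size=(m, n))

# Decoder matrix D with orthonormal columns (here: top-l right singular vectors).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
D = Vt[:l].T                       # shape (n, l), with D.T @ D = I_l

encode = lambda x: D.T @ x         # f(x) = c
decode = lambda c: D @ c           # g(c) = Dc

x = X[0]
print(np.allclose(D.T @ D, np.eye(l)))             # orthonormal columns
print(np.linalg.norm(x - decode(encode(x))))       # small reconstruction error
```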

To generate the optimal code $c^*$ from $x$, one may use
$c^* = \arg\min_c \| x - g(c) \|_2^2$.

It is easy to see that
$c^* = D^\top x$.

This optimization problem can be solved by setting $\nabla_c \| x - g(c) \|_2^2 = 0$.
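A quick numerical check (illustrative; $D$ is a random matrix with orthonormal columns) that the minimizer of $\|x - Dc\|_2^2$ is indeed $c^* = D^\top x$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 6, 3

D = np.linalg.qr(rng.normal(size=(n, l)))[0]    # orthonormal columns
x = rng.normal(size=n)

c_star = D.T @ x                                # claimed minimizer
c_lstsq = np.linalg.lstsq(D, x, rcond=None)[0]  # direct least-squares solution

print(np.allclose(c_star, c_lstsq))             # True: D^T x minimizes ||x - Dc||
```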

How to choose the encoding matrix $D$

By defining the encoding function $f(x) = D^\top x$, we can define the PCA reconstruction operation $r(x) = g(f(x)) = D D^\top x$.

An encoding matrix $D^*$ can be chosen by

$D^* = \arg\min_D \sum_i \| x^{(i)} - D D^\top x^{(i)} \|_2^2$ subject to $D^\top D = I_l$.
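A minimal sketch of this choice (my own, assuming centered data): the optimal $D^*$ consists of the top-$l$ right singular vectors of the data matrix, and its reconstruction error is no worse than that of any other orthonormal $D$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, l = 300, 8, 2

X = rng.normal(size=(m, l)) @ rng.normal(size=(l, n))   # low-rank data
X -= X.mean(axis=0)                                      # center (assumed for PCA)

def recon_error(D):
    """Sum of squared errors of the PCA reconstruction r(x) = D D^T x."""
    R = X @ D @ D.T
    return np.sum((X - R) ** 2)

# Optimal D*: top-l right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
D_star = Vt[:l].T

# Any other orthonormal D does no better.
D_rand = np.linalg.qr(rng.normal(size=(n, l)))[0]
print(recon_error(D_star) <= recon_error(D_rand))        # True
```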

How to extract the first principal component $d^*$

In the case where $l = 1$, $D$ simplifies to a single vector $d$ and

$d^* = \arg\min_d \sum_i \| x^{(i)} - d\, d^\top x^{(i)} \|_2^2$ subject to $\| d \|_2 = 1$.

Denoting by $X = [x^{(1)}, \cdots, x^{(m)}]^\top \in \mathbb{R}^{m \times n}$ the matrix of stacked data points, the first principal component $d^*$ can be obtained by

$d^* = \arg\min_d \| X - X d\, d^\top \|_F^2$ subject to $d^\top d = 1$.

A simple computation shows that

$d^* = \arg\max_d d^\top X^\top X\, d$ subject to $d^\top d = 1$.

This optimization problem may be solved using eigenvalue decomposition. Specifically, $d^*$ is given by the eigenvector of $X^\top X$ corresponding to the largest eigenvalue.
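A short sketch (illustrative; random centered data assumed) that extracts the first principal component as the top eigenvector of $X^\top X$ and checks that it maximizes $d^\top X^\top X d$ over random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 500, 6

X = rng.normal(size=(m, n)) * np.array([3.0, 1.0, 1.0, 0.5, 0.5, 0.1])
X -= X.mean(axis=0)                          # centered data

# Eigenvector of X^T X with the largest eigenvalue = first principal component.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigh returns ascending eigenvalues
d_star = eigvecs[:, -1]

objective = lambda d: d @ X.T @ X @ d        # d^T X^T X d with ||d|| = 1
for _ in range(5):
    d = rng.normal(size=n)
    d /= np.linalg.norm(d)
    assert objective(d_star) >= objective(d)  # d_star maximizes the objective

print(d_star)   # roughly aligned with the first axis (largest variance direction)
```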

[Figure: the first principal component; labels in the figure indicate the 1st row and the 32nd row. Slide Credit: Vaclav]

More detailed explanation of computing the first principal component $d^*$:

$d^* = \arg\min_d \| X - X d\, d^\top \|_F^2$ subject to $d^\top d = 1$

$\;= \arg\min_d \operatorname{Tr}\!\big( (X - X d\, d^\top)^\top (X - X d\, d^\top) \big)$ subject to $d^\top d = 1$.

Expanding the trace and using $d^\top d = 1$, the terms that depend on $d$ reduce to $-\operatorname{Tr}(d^\top X^\top X\, d)$, so that

$d^* = \arg\max_d d^\top X^\top X\, d$ subject to $d^\top d = 1$.
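A tiny numerical verification of this identity (illustrative; random $X$ and a random unit vector $d$): $\|X - X d d^\top\|_F^2 = \operatorname{Tr}(X^\top X) - d^\top X^\top X d$ when $d^\top d = 1$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 4

X = rng.normal(size=(m, n))
d = rng.normal(size=n)
d /= np.linalg.norm(d)                       # enforce d^T d = 1

lhs = np.linalg.norm(X - X @ np.outer(d, d), "fro") ** 2
rhs = np.trace(X.T @ X) - d @ X.T @ X @ d

print(np.isclose(lhs, rhs))                  # True
```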

Subspace Methods

Slide Credit: Vaclav
