TRANSCRIPT
Machine Learning: Basis and Wavelet
Haar DWT in 2 levels
[Figure: a 4×4 image block
  7  22  38 191
 17  83 188 211
 71 167 194 207
159 187 201 216
and its Haar DWT: level-1 approximation [[32, 157], [146, 204]] with detail subbands [[20, 44], [31, 7]], [[18, 42], [27, 4]], [[13, -32], [-17, 1]]; level-2 approximation 135 with details 46, 40, -17 (values rounded to integers).]
Hwa Pyung Kim, Combined M.S.-Ph.D. Program, Department of Computational Science and Engineering (CSE), Medical Image Computing Lab (Prof. Jin Keun Seo)
[Figure: the four 2×2 Haar analysis sign patterns: average (+ + / + +), vertical detail (+ + / - -), horizontal detail (+ - / + -), and diagonal detail (+ - / - +).]
Machine learning is the field of study that gives computers the ability to learn a feed-forward function without being explicitly programmed.
Mission: Find a feed-forward function $f$ from labeled training data $\{(x^{(i)}, y^{(i)}) : i = 1, \dots, m\}$ such that $f(x^{(i)}) \approx y^{(i)}$, $i = 1, \dots, m$.
Supervised learning is the machine learning technique of finding such a feed-forward function $f$ iteratively from labeled training data $\{(x^{(i)}, y^{(i)}) : i = 1, \dots, m\}$ such that $f(x^{(i)}) \approx y^{(i)}$, $i = 1, \dots, m$.
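As a minimal illustration of this setting (my own sketch, not from the slides), the following Python/NumPy snippet fits a linear feed-forward function $f(x) = w^\top x + b$ to labeled pairs by least squares; the data and variable names are invented for the example.

```python
import numpy as np

# Hypothetical labeled training data {(x_i, y_i) : i = 1, ..., m}
rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))                  # inputs x_i in R^n
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3     # targets y_i (noise-free here)

# Fit a linear feed-forward function f(x) = w^T x + b by least squares
A = np.hstack([X, np.ones((m, 1))])          # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

predictions = X @ w + b                      # f(x_i) ~ y_i on the training data
print(np.max(np.abs(predictions - y)))       # ~0 for this noise-free example
```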
Machine learning: What it is and why it matters.
Humans can typically create one or two good models a week; machine learning can create thousands of models a week
Basis: Fourier Transform
The Fourier transform of $f$ is defined by $\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$.
Each Fourier mode acts as a basis function, giving the transform the ability to distinguish different signals.
Every function $f$ can be expressed as a linear combination of basis functions, $f = \sum_k \langle f, e_k \rangle\, e_k$,
where $\{e_1, e_2, \cdots\}$ is a set of orthonormal basis functions, $\langle e_j, e_k \rangle = 1$ if $j = k$ and $0$ otherwise.
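A small NumPy sketch of this expansion (an illustration of the formula, not part of the slides): build an orthonormal basis, compute the coefficients $\langle f, e_k \rangle$, and verify that the linear combination reconstructs $f$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
f = rng.normal(size=n)                         # a signal in R^n

# Any orthonormal basis works; here we orthonormalize a random matrix via QR
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns e_1, ..., e_n are orthonormal
assert np.allclose(Q.T @ Q, np.eye(n))         # <e_j, e_k> = 1 if j == k else 0

coeffs = Q.T @ f                               # c_k = <f, e_k>
f_rec = Q @ coeffs                             # f = sum_k c_k e_k
assert np.allclose(f_rec, f)
```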
Approximation by 4 principal components (basis) only
Slide Credit: Vaclav
Wavelet coefficients: $\{\, x \star \phi_J,\; x \star \psi_{j,k} \,\}$
$x \star \phi_J$: average; $x \star \psi_{j,k}$: higher frequencies
What is a wavelet?
Why wavelets?
• Wavelets are uniformly stable to deformations.
• Wavelets separate multiscale information.
• Wavelets provide sparse representations.
Scattering convolution network
For appropriate wavelets, such a kernel Φ can be represented by scattering coefficients computed with the wavelet transform.
Review of Wavelets
$\psi_{j,k}(t) := 2^{j/2}\,\psi(2^{j} t - k)$ for $j, k \in \mathbb{Z}$.
Wavelet basis functions: The family of functions $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$, dyadic translations and dilations of a mother wavelet function $\psi$, constructs a complete orthonormal Hilbert basis of $L^2(\mathbb{R})$.
$f = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$,
where $\langle f, \psi_{j,k} \rangle = \int f(t)\, \psi_{j,k}(t)\, dt$.
$\psi_{1,k}(t) = 2^{1/2}\,\psi(2t - k)$, e.g. $\psi_{1,1}, \psi_{1,2}$;
$\psi_{2,k}(t) = 2\,\psi(2^{2} t - k)$, e.g. $\psi_{2,1}, \psi_{2,2}, \psi_{2,3}, \psi_{2,4}$.
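As a sketch of these definitions (my own construction, not from the slides), the snippet below builds the orthonormal discrete Haar basis on 8 samples, checks orthonormality, and expands a signal in that basis; the helper name haar_matrix is hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis for R^n (n a power of two); rows are basis vectors."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                 # scaling (average) part, refined
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # wavelet (difference) part
    return np.vstack([top, bottom]) / np.sqrt(2.0)

H = haar_matrix(8)
assert np.allclose(H @ H.T, np.eye(8))           # complete orthonormal basis

x = np.array([9.0, 7, 3, 5, 6, 10, 2, 6])
coeffs = H @ x                                   # <x, psi_{j,k}> over all scales/shifts
assert np.allclose(H.T @ coeffs, x)              # perfect reconstruction
```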
Discrete Haar Wavelet Transform
Approximate the signal from wavelet coefficients:
$x = \sum_k a_{J,k}\,\phi_{J,k} + \sum_{j \le J} \sum_k d_{j,k}\,\psi_{j,k}$,
where $a_{j,k} = \langle x, \phi_{j,k} \rangle$ are the approximation coefficients and $d_{j,k} = \langle x, \psi_{j,k} \rangle$ are the detail coefficients, $j = 1, 2, \dots$
Wavelet filter bank
[Figure: three-level Haar filter bank applied to the signal (9, 7, 3, 5, 6, 10, 2, 6). At each level the current approximation is convolved with a low-pass filter $g$ (pairwise average) and a high-pass filter $h$ (pairwise difference), each followed by downsampling ↓2, i.e. $(x * g)\downarrow 2$ and $(x * h)\downarrow 2$.
Level-1 coefficients: approximation (8, 4, 8, 4), detail (1, -1, -2, -2).
Level-2 coefficients: approximation (6, 6), detail (2, 2).
Level-3 coefficients: approximation (6), detail (0).]
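The figure's numbers can be reproduced directly. Below is a short NumPy sketch (my own, not from the slides) of one analysis level with the convention used here: low-pass output = pairwise average, high-pass output = pairwise half-difference, both followed by downsampling by 2; the function name is hypothetical.

```python
import numpy as np

def haar_analysis_level(x):
    """One Haar filter-bank level with the slide's convention:
    low-pass = pairwise average, high-pass = pairwise half-difference, then downsample by 2."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / 2.0     # (x * g) downsampled by 2
    detail = (x[0::2] - x[1::2]) / 2.0     # (x * h) downsampled by 2
    return approx, detail

signal = [9, 7, 3, 5, 6, 10, 2, 6]         # the input shown in the figure
a1, d1 = haar_analysis_level(signal)       # a1 = [8, 4, 8, 4], d1 = [1, -1, -2, -2]
a2, d2 = haar_analysis_level(a1)           # a2 = [6, 6],        d2 = [2, 2]
a3, d3 = haar_analysis_level(a2)           # a3 = [6],           d3 = [0]
```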
$x \star \phi$ (low-pass): average
$x \star \psi$ (high-pass): detail (backward difference)
Wavelet coefficients: $\{\, x \star \phi_J,\; x \star \psi_j : j \le J \,\}$
Example of discrete Haar Wavelet Transform for a sound signal
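The slides show this for a recorded sound signal. As a hedged stand-in, the sketch below computes the same multi-level Haar DWT with the PyWavelets library (pywt, assumed installed) on a synthetic chirp-like waveform rather than the slide's audio data.

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

# Synthetic stand-in for a sound signal: a chirp-like waveform
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
x = np.sin(2 * np.pi * (5 + 40 * t) * t)

# Multi-level discrete Haar wavelet transform
coeffs = pywt.wavedec(x, 'haar', level=3)   # [a3, d3, d2, d1]
x_rec = pywt.waverec(coeffs, 'haar')        # reconstruct from the coefficients
print(np.allclose(x_rec, x))                # True: perfect reconstruction
```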
Scattering convolution network
Example of continuous Wavelet Transform for an EEG signal
[Figure: EEG recorded with the 10-20 electrode system and its wavelet modulus maps $|x \star \psi|$.]
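A sketch of a continuous wavelet transform along these lines, using PyWavelets on a synthetic signal standing in for one EEG channel; the library choice, wavelet ('morl'), sampling rate, and scales are my assumptions, not the slide's settings.

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

fs = 250.0                                    # sampling rate (Hz), a typical EEG value
t = np.arange(0, 4, 1 / fs)
# Synthetic stand-in for an EEG channel: 10 Hz alpha rhythm, slow drift, noise
x = (np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 1 * t)
     + 0.2 * np.random.default_rng(0).normal(size=t.size))

scales = np.arange(1, 64)
coefs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)
print(coefs.shape)                            # (len(scales), len(t)): time-scale map
```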
Scattering convolution network
$x \star \phi$
A scattering transform computes non-linear invariants with the modulus $|\cdot|$ and an averaging (pooling) function $\phi$.
Scattering convolution network
$S_1 x(j_1) = |x \star \psi_{j_1}| \star \phi$
$S_2 x(j_1, j_2) = \big||x \star \psi_{j_1}| \star \psi_{j_2}\big| \star \phi$
$\vdots$ for $j_1, j_2, \dots$
$\lim_{J \to \infty} \|\Phi_J(x_c) - \Phi_J(x)\| = 0$, where $x_c(t) = x(t - c)$ is a translate of $x$.
For appropriate wavelets, scattering is invariant to translation and stable to deformation.
If $x_\tau(t) = x(t - \tau(t))$, where $\tau$ is a diffeomorphism (a smooth deformation), then $\|\Phi(x_\tau) - \Phi(x)\|$ is bounded by a constant times the deformation size $\sup_t |\nabla \tau(t)|$ times $\|x\|$: the scattering coefficients $x \star \phi$, $|x \star \psi_{j_1}| \star \phi$, $\big||x \star \psi_{j_1}| \star \psi_{j_2}\big| \star \phi, \dots$ are stable to deformation.
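A minimal sketch of zeroth- and first-order scattering (wavelet transform, modulus, then average pooling), using the Haar filter bank as a stand-in for the wavelets $\psi_j$. The function names and the Haar choice are mine; practical scattering transforms typically use Morlet wavelets (for example via the kymatio library).

```python
import numpy as np

def haar_level(x):
    """One orthonormal Haar filter-bank level (low-pass a, high-pass d, both downsampled by 2)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def scattering_order1(x, J):
    """Zeroth- and first-order scattering with global average pooling:
    S0 = mean(x), S1[j] = mean(|wavelet coefficients at scale j|), j = 1, ..., J."""
    a = np.asarray(x, dtype=float)
    s1 = []
    for _ in range(J):
        a, d = haar_level(a)
        s1.append(np.mean(np.abs(d)))      # modulus, then averaging (pooling)
    return np.mean(x), np.array(s1)

x = np.sin(2 * np.pi * 8 * np.linspace(0, 1, 256, endpoint=False))
x_shift = np.roll(x, 32)                   # a circular translation of x
print(scattering_order1(x, J=5))
print(scattering_order1(x_shift, J=5))     # identical here (shift is a multiple of 2^J);
                                           # in general, approximately translation invariant
```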
Scattering convolution network
Example of scattering transform for an EEG signal
[Figure: $x \star \psi_{j_1}$, its modulus $|x \star \psi_{j_1}|$, and time averages; the scattering coefficients $x \star \phi,\; |x \star \psi_{j_1}| \star \phi,\; \big||x \star \psi_{j_1}| \star \psi_{j_2}\big| \star \phi,\; \dots$ for $j_1, j_2, \dots$]
Subspace Methods: PCA, ICA
Reference: Deep Learning, written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (www.deeplearningbook.org)
Basics of Principal Component Analysis
Suppose we would like to apply lossy compression to a collection of $m$ points $\{x^{(1)}, \cdots, x^{(m)}\} \subset \mathbb{R}^n$. Lossy compression means storing the points in a way that requires less memory but may lose some precision.
Slide Credit: Vaclav
Approximation by 4 principal components only
High-dimensional data $x^{(i)}$ often lie on or near a much lower-dimensional, curved manifold. A good way to represent such data points is by low-dimensional coordinates $c^{(i)}$. The low-dimensional representation of the data should capture information about the high-dimensional pairwise distances.
Approximation by 4 principal components only
Slide Credit: Vaclav
Encoding/Decoding function
Let $f : x \in \mathbb{R}^n \to c \in \mathbb{R}^l$ ($l < n$) be an encoding function which represents each data point $x$ by a point $c = f(x)$ in the low-dimensional space $\mathbb{R}^l$. PCA is defined by our choice of the decoding function $g : c \in \mathbb{R}^l \to x \in \mathbb{R}^n$ such that $g(f(x)) \approx x$. Let $g(c) = Dc$, where $D \in \mathbb{R}^{n \times l}$ defines the decoding. PCA constrains the columns of $D$ to be orthonormal vectors in $\mathbb{R}^n$.
$x \approx g(c) = Dc = c_1 d_1 + c_2 d_2 + \cdots + c_l d_l$,
where $d_1, d_2, \dots, d_l \in \mathbb{R}^n$ are the columns of $D$ (1st column, 2nd column, 3rd column, 4th column in the figure) and $c = (c_1, \dots, c_l)^\top \in \mathbb{R}^l$.
Slide Credit: Vaclav
PCA constrains the columns of $D$ to be orthonormal vectors in $\mathbb{R}^n$.
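A small NumPy sketch of this encode/decode pair with an orthonormal $D$ (data and dimensions invented for illustration; here $D$ is simply taken from a QR factorization rather than learned).

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 6, 2                                   # ambient and code dimensions

# An orthonormal decoding matrix D (columns orthonormal), e.g. from a QR factorization
D, _ = np.linalg.qr(rng.normal(size=(n, l)))
assert np.allclose(D.T @ D, np.eye(l))

x = rng.normal(size=n)
c = D.T @ x                                   # encode:  f(x) = D^T x
x_hat = D @ c                                 # decode:  g(c) = Dc = sum_i c_i d_i
print(np.linalg.norm(x - x_hat))              # reconstruction error from using l < n
```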
To generate the optimal code $c^*$ from $x$, one may use
$c^* = \arg\min_c \|x - g(c)\|_2^2$.
It is easy to see that
$c^* = \arg\min_c \left( -2\, x^\top D c + c^\top c \right)$.
This optimization problem can be solved by setting $\nabla_c \left( -2\, x^\top D c + c^\top c \right) = 0$, which gives $c^* = D^\top x$.
How to choose the encoding matrix $D$
By defining the encoding function $f(x) = D^\top x$, we can define the PCA reconstruction operation $r(x) = g(f(x)) = D D^\top x$.
An encoding matrix $D^*$ can be chosen by
$D^* = \arg\min_D \sum_i \big\| x^{(i)} - D D^\top x^{(i)} \big\|_2^2$ subject to $D^\top D = I_l$.
How to extract the first principal component $d^*$
In the case when $l = 1$, $D \in \mathbb{R}^{n \times 1}$ can be simplified to a single vector $d$, and
$d^* = \arg\min_d \sum_i \big\| x^{(i)} - d\, d^\top x^{(i)} \big\|_2^2$ subject to $\|d\|_2 = 1$.
Denoting $X = [x^{(1)}, \cdots, x^{(m)}]^\top \in \mathbb{R}^{m \times n}$, the first principal component $d^*$ can be obtained by
$d^* = \arg\min_d \|X - X d d^\top\|_F^2$ subject to $d^\top d = 1$.
A simple computation shows that
$d^* = \arg\max_d\, d^\top X^\top X d$ subject to $d^\top d = 1$.
This optimization problem may be solved using the eigenvalue decomposition. Specifically, $d^*$ is given by the eigenvector of $X^\top X$ corresponding to the largest eigenvalue.
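A sketch of this computation in NumPy on synthetic data (my own example; note that in practice the data are usually centered first, a step the slides do not spell out).

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data: m points in R^n, elongated along a known direction
m, n = 200, 5
direction = np.array([3.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(10.0)
X = rng.normal(size=(m, 1)) * direction + 0.1 * rng.normal(size=(m, n))
X = X - X.mean(axis=0)                        # center the data

# d* = eigenvector of X^T X with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(X.T @ X)    # eigh returns ascending eigenvalues
d_star = eigvecs[:, -1]                       # first principal component

print(np.abs(d_star @ direction))             # close to 1: d* aligns with the true direction
```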
The first principal component $d^*$
[Figure: rows of the data matrix $X$ (1st row, ..., 32nd row) and the first principal component $d^*$.]
Slide Credit: Vaclav
More detailed explanation of computing the first principal component $d^*$
$d^* = \arg\min_d \|X - X d d^\top\|_F^2$ subject to $d^\top d = 1$.
$\|X - X d d^\top\|_F^2 = \mathrm{Tr}\big((X - X d d^\top)^\top (X - X d d^\top)\big) = \mathrm{Tr}(X^\top X) - d^\top X^\top X d$, using $d^\top d = 1$.
Hence $d^* = \arg\max_d\, d^\top X^\top X d$ subject to $d^\top d = 1$,
so $d^*$ is the eigenvector of $X^\top X$ with the largest eigenvalue.
Subspace Methods
Slide Credit: Vaclav