

Machine Learning Techniques (機器學習技法)

Lecture 3: Kernel Support Vector Machine

Hsuan-Tien Lin (林軒田)
[email protected]

Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)

Roadmap

1 Embedding Numerous Features: Kernel Models

  Lecture 2: Dual Support Vector Machine
  dual SVM: another QP with valuable geometric messages and almost no dependence on $\tilde{d}$

  Lecture 3: Kernel Support Vector Machine
  Kernel Trick / Polynomial Kernel / Gaussian Kernel / Comparison of Kernels

2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models

Kernel Trick

Dual SVM Revisited
goal: SVM without dependence on $\tilde{d}$

half-way done:
$$\min_\alpha \ \tfrac{1}{2}\alpha^T Q_D \alpha - \mathbf{1}^T\alpha \quad \text{subject to } \mathbf{y}^T\alpha = 0;\ \alpha_n \ge 0,\ \text{for } n = 1,2,\ldots,N$$

• $q_{n,m} = y_n y_m \mathbf{z}_n^T \mathbf{z}_m$: inner product in $\mathbb{R}^{\tilde{d}}$
• need: $\mathbf{z}_n^T \mathbf{z}_m = \Phi(\mathbf{x}_n)^T \Phi(\mathbf{x}_m)$ calculated faster than $O(\tilde{d})$

can we do so?


Fast Inner Product for $\Phi_2$
2nd-order polynomial transform:
$$\Phi_2(\mathbf{x}) = (1,\ x_1, x_2, \ldots, x_d,\ x_1^2, x_1x_2, \ldots, x_1x_d,\ x_2x_1, x_2^2, \ldots, x_2x_d,\ \ldots,\ x_d^2)$$
— include both $x_1x_2$ and $x_2x_1$ for 'simplicity' :-)

$$\begin{aligned}
\Phi_2(\mathbf{x})^T\Phi_2(\mathbf{x}') &= 1 + \sum_{i=1}^{d} x_i x_i' + \sum_{i=1}^{d}\sum_{j=1}^{d} x_i x_j x_i' x_j' \\
&= 1 + \sum_{i=1}^{d} x_i x_i' + \left(\sum_{i=1}^{d} x_i x_i'\right)\left(\sum_{j=1}^{d} x_j x_j'\right) \\
&= 1 + \mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')(\mathbf{x}^T\mathbf{x}')
\end{aligned}$$

for $\Phi_2$, transform + inner product can be carefully done in $O(d)$ instead of $O(d^2)$

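As a quick illustration beyond the slides, here is a minimal NumPy sketch (the data and helper names are illustrative only) checking that the $O(d)$ shortcut matches the explicit $O(d^2)$ transform:

```python
# Numerical check of the O(d) shortcut for the Phi_2 inner product.
import numpy as np

def phi2(x):
    """Explicit 2nd-order transform (1, x_i, x_i * x_j): O(d^2) entries."""
    return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

def k_phi2(x, xp):
    """Kernel shortcut 1 + x.x' + (x.x')^2, computed in O(d)."""
    s = np.dot(x, xp)
    return 1.0 + s + s * s

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose(np.dot(phi2(x), phi2(xp)), k_phi2(x, xp))
```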

Kernel: Transform + Inner Product
transform $\Phi$ ⟺ kernel function $K_\Phi(\mathbf{x},\mathbf{x}') \equiv \Phi(\mathbf{x})^T\Phi(\mathbf{x}')$

$\Phi_2$ ⟺ $K_{\Phi_2}(\mathbf{x},\mathbf{x}') = 1 + (\mathbf{x}^T\mathbf{x}') + (\mathbf{x}^T\mathbf{x}')^2$

• quadratic coefficient $q_{n,m} = y_n y_m \mathbf{z}_n^T\mathbf{z}_m = y_n y_m K(\mathbf{x}_n,\mathbf{x}_m)$
• optimal bias $b$? from SV $(\mathbf{x}_s, y_s)$,
$$b = y_s - \mathbf{w}^T\mathbf{z}_s = y_s - \left(\sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n\right)^T \mathbf{z}_s = y_s - \sum_{n=1}^{N} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}_s)$$
• optimal hypothesis $g_{\text{SVM}}$: for test input $\mathbf{x}$,
$$g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\mathbf{w}^T\Phi(\mathbf{x}) + b\right) = \text{sign}\left(\sum_{n=1}^{N} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}) + b\right)$$

kernel trick: plug in an efficient kernel function to avoid dependence on $\tilde{d}$


Kernel SVM with QP
Kernel Hard-Margin SVM Algorithm
1. $q_{n,m} = y_n y_m K(\mathbf{x}_n,\mathbf{x}_m)$; $\mathbf{p} = -\mathbf{1}_N$; $(A, \mathbf{c})$ for equality/bound constraints
2. $\alpha \leftarrow \text{QP}(Q_D, \mathbf{p}, A, \mathbf{c})$
3. $b \leftarrow y_s - \sum_{\text{SV indices } n} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}_s)$ with SV $(\mathbf{x}_s, y_s)$
4. return SVs and their $\alpha_n$ as well as $b$ such that for new $\mathbf{x}$,
$$g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\sum_{\text{SV indices } n} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}) + b\right)$$

• step 1: time complexity $O(N^2)$ · (kernel evaluation)
• step 2: QP with $N$ variables and $N + 1$ constraints
• steps 3 & 4: time complexity $O(\#\text{SV})$ · (kernel evaluation)

kernel SVM: use computational shortcut to avoid $\tilde{d}$ & predict with SVs only

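A minimal sketch of the four steps above with a generic QP solver, assuming NumPy and cvxopt are available; `X` ($N \times d$) and `y` ($\pm 1$) are hypothetical arrays, and the hard-margin QP is only feasible when the data are separable under the chosen kernel:

```python
import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(X1, X2, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), evaluated pairwise
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_hard_margin_svm(X, y, kernel=gaussian_kernel):
    N = len(y)
    K = kernel(X, X)
    P = matrix(np.outer(y, y) * K)                    # step 1: Q_D
    q = matrix(-np.ones(N))                           # p = -1_N
    G = matrix(-np.eye(N)); h = matrix(np.zeros(N))   # bound: alpha_n >= 0
    A = matrix(y.reshape(1, -1).astype(float))        # equality: y^T alpha = 0
    alpha = np.ravel(solvers.qp(P, q, G, h, A, matrix(0.0))['x'])  # step 2
    sv = alpha > 1e-6                                 # SVs have alpha_n > 0
    s = int(np.argmax(alpha))                         # any SV gives b (step 3)
    b = y[s] - np.sum(alpha[sv] * y[sv] * kernel(X[sv], X[s:s+1]).ravel())
    return alpha, sv, b

def g_svm(x, X, y, alpha, sv, b, kernel=gaussian_kernel):
    # step 4: sign( sum over SVs of alpha_n y_n K(x_n, x) + b )
    k = kernel(X[sv], x.reshape(1, -1)).ravel()
    return np.sign(np.sum(alpha[sv] * y[sv] * k) + b)
```

Note the shortcut in action: only $K$ is ever evaluated; $\Phi(\mathbf{x})$ and $\mathbf{w}$ are never formed, and prediction touches only the SVs.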

Fun Time
Consider two examples $\mathbf{x}$ and $\mathbf{x}'$ such that $\mathbf{x}^T\mathbf{x}' = 10$. What is $K_{\Phi_2}(\mathbf{x},\mathbf{x}')$?
1. 1
2. 11
3. 111
4. 1111

Reference Answer: 3
Using the derivation on the previous slides, $K_{\Phi_2}(\mathbf{x},\mathbf{x}') = 1 + \mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')^2 = 1 + 10 + 100 = 111$.


Polynomial Kernel

General Poly-2 Kernel
$\Phi_2(\mathbf{x}) = (1,\ x_1, \ldots, x_d,\ x_1^2, \ldots, x_d^2)$ ⟺ $K_{\Phi_2}(\mathbf{x},\mathbf{x}') = 1 + \mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')^2$
$\tilde{\Phi}_2(\mathbf{x}) = (1,\ \sqrt{2}x_1, \ldots, \sqrt{2}x_d,\ x_1^2, \ldots, x_d^2)$ ⟺ $\tilde{K}_2(\mathbf{x},\mathbf{x}') = 1 + 2\mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')^2$
$\tilde{\Phi}_2(\mathbf{x}) = (1,\ \sqrt{2\gamma}x_1, \ldots, \sqrt{2\gamma}x_d,\ \gamma x_1^2, \ldots, \gamma x_d^2)$ ⟺ $\tilde{K}_2(\mathbf{x},\mathbf{x}') = 1 + 2\gamma\mathbf{x}^T\mathbf{x}' + \gamma^2(\mathbf{x}^T\mathbf{x}')^2 = (1 + \gamma\mathbf{x}^T\mathbf{x}')^2$ with $\gamma > 0$

• $\tilde{K}_2$: somewhat 'easier' to calculate than $K_{\Phi_2}$
• $\Phi_2$ and $\tilde{\Phi}_2$: equivalent power, but different inner product ⟹ different geometry

$\tilde{K}_2$ commonly used


Poly-2 Kernels in Action
[three decision-boundary plots, one per kernel: $(1 + 0.001\,\mathbf{x}^T\mathbf{x}')^2$, $\ 1 + \mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')^2$, $\ (1 + 1000\,\mathbf{x}^T\mathbf{x}')^2$]

• $g_{\text{SVM}}$ different, SVs different — 'hard' to say which is better before learning
• change of kernel ⟺ change of margin definition

need to select $K$, just like selecting $\Phi$

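To reproduce this kind of comparison, a small sketch assuming scikit-learn (toy data, and a large `C` to approximate the hard margin; the middle kernel is not a plain squared form, so it is passed as a callable):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1)   # separable under poly-2 features

def k_phi2(X1, X2):
    S = X1 @ X2.T
    return 1 + S + S ** 2                       # K_Phi2, a custom Gram matrix

for name, clf in [
    ("(1 + 0.001 x'x)^2", SVC(kernel="poly", degree=2, gamma=0.001, coef0=1, C=1e6)),
    ("1 + x'x + (x'x)^2", SVC(kernel=k_phi2, C=1e6)),
    ("(1 + 1000 x'x)^2",  SVC(kernel="poly", degree=2, gamma=1000, coef0=1, C=1e6)),
]:
    clf.fit(X, y)
    print(name, "#SV =", clf.n_support_.sum())  # different kernels, different SVs
```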

General Polynomial Kernel
$K_2(\mathbf{x},\mathbf{x}') = (\zeta + \gamma\mathbf{x}^T\mathbf{x}')^2$ with $\gamma > 0$, $\zeta \ge 0$
$K_3(\mathbf{x},\mathbf{x}') = (\zeta + \gamma\mathbf{x}^T\mathbf{x}')^3$ with $\gamma > 0$, $\zeta \ge 0$
  ⋮
$K_Q(\mathbf{x},\mathbf{x}') = (\zeta + \gamma\mathbf{x}^T\mathbf{x}')^Q$ with $\gamma > 0$, $\zeta \ge 0$

• embeds $\Phi_Q$ specially with parameters $(\gamma, \zeta)$
• allows computing large-margin polynomial classification without dependence on $\tilde{d}$

SVM + polynomial kernel: polynomial SVM
[plot: 10th-order polynomial boundary with margin 0.1]


Special Case: Linear Kernel
$K_1(\mathbf{x},\mathbf{x}') = (0 + 1 \cdot \mathbf{x}^T\mathbf{x}')^1$
  ⋮
$K_Q(\mathbf{x},\mathbf{x}') = (\zeta + \gamma\mathbf{x}^T\mathbf{x}')^Q$ with $\gamma > 0$, $\zeta \ge 0$

• $K_1$: just the usual inner product, called the linear kernel
• 'even easier': can be solved (often in primal form) efficiently

linear first, remember? :-)

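As a side note beyond the slides, the 'solve in primal form' remark can be illustrated with scikit-learn (data here is a hypothetical separable toy set): `SVC(kernel="linear")` takes the kernel/dual route, while `LinearSVC` works on the linear formulation directly and is typically faster at large $N$; the two recover similar (not identical, since the formulations differ slightly) hyperplanes.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1)  # linearly separable labels

dual_svm = SVC(kernel="linear").fit(X, y)          # kernel (dual) route
primal_svm = LinearSVC().fit(X, y)                 # direct linear-model route
print(dual_svm.coef_, primal_svm.coef_)            # similar hyperplane directions
```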

Fun Time
Consider the general 2nd-order polynomial kernel $K_2(\mathbf{x},\mathbf{x}') = (\zeta + \gamma\mathbf{x}^T\mathbf{x}')^2$. Which of the following transforms can be used to derive this kernel?
1. $\Phi(\mathbf{x}) = (1,\ \sqrt{2\gamma}x_1, \ldots, \sqrt{2\gamma}x_d,\ \gamma x_1^2, \ldots, \gamma x_d^2)$
2. $\Phi(\mathbf{x}) = (\zeta,\ \sqrt{2\gamma}x_1, \ldots, \sqrt{2\gamma}x_d,\ x_1^2, \ldots, x_d^2)$
3. $\Phi(\mathbf{x}) = (\zeta,\ \sqrt{2\gamma\zeta}x_1, \ldots, \sqrt{2\gamma\zeta}x_d,\ x_1^2, \ldots, x_d^2)$
4. $\Phi(\mathbf{x}) = (\zeta,\ \sqrt{2\gamma\zeta}x_1, \ldots, \sqrt{2\gamma\zeta}x_d,\ \gamma x_1^2, \ldots, \gamma x_d^2)$

Reference Answer: 4
We need $\zeta^2$ from the 0th-order terms, $2\gamma\zeta\,\mathbf{x}^T\mathbf{x}'$ from the 1st-order terms, and $\gamma^2(\mathbf{x}^T\mathbf{x}')^2$ from the 2nd-order terms.


Gaussian Kernel

Kernel of Infinite Dimensional Transform
infinite-dimensional $\Phi(\mathbf{x})$? Yes, if $K(\mathbf{x},\mathbf{x}')$ is efficiently computable!

when $\mathbf{x} = (x)$ (one-dimensional),
$$\begin{aligned}
K(x, x') &= \exp(-(x - x')^2) \\
&= \exp(-x^2)\exp(-x'^2)\exp(2xx') \\
&\overset{\text{Taylor}}{=} \exp(-x^2)\exp(-x'^2)\left(\sum_{i=0}^{\infty} \frac{(2xx')^i}{i!}\right) \\
&= \sum_{i=0}^{\infty} \left(\exp(-x^2)\exp(-x'^2)\,\sqrt{\frac{2^i}{i!}}\,\sqrt{\frac{2^i}{i!}}\,(x)^i(x')^i\right) = \Phi(x)^T\Phi(x')
\end{aligned}$$

with infinite-dimensional $\Phi(x) = \exp(-x^2)\cdot\left(1,\ \sqrt{\frac{2}{1!}}\,x,\ \sqrt{\frac{2^2}{2!}}\,x^2,\ \ldots\right)$

more generally, Gaussian kernel $K(\mathbf{x},\mathbf{x}') = \exp(-\gamma\|\mathbf{x}-\mathbf{x}'\|^2)$ with $\gamma > 0$

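A minimal numerical sketch of the 1-D derivation above (names and constants are illustrative): a truncation of the infinite-dimensional $\Phi$ already reproduces $\exp(-(x-x')^2)$ to high precision, since the Taylor series converges quickly.

```python
import math
import numpy as np

def phi_truncated(x, terms=30):
    # exp(-x^2) * sqrt(2^i / i!) * x^i for i = 0, 1, ..., terms-1
    return np.array([math.exp(-x * x) * math.sqrt(2.0 ** i / math.factorial(i)) * x ** i
                     for i in range(terms)])

x, xp = 0.7, -0.3
approx = float(np.dot(phi_truncated(x), phi_truncated(xp)))
exact = math.exp(-(x - xp) ** 2)
print(approx, exact)   # agree once enough terms are kept
```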

Hypothesis of Gaussian SVM
Gaussian kernel $K(\mathbf{x},\mathbf{x}') = \exp(-\gamma\|\mathbf{x}-\mathbf{x}'\|^2)$

$$g_{\text{SVM}}(\mathbf{x}) = \text{sign}\left(\sum_{\text{SV}} \alpha_n y_n K(\mathbf{x}_n,\mathbf{x}) + b\right) = \text{sign}\left(\sum_{\text{SV}} \alpha_n y_n \exp\left(-\gamma\|\mathbf{x}-\mathbf{x}_n\|^2\right) + b\right)$$

• linear combination of Gaussians centered at SVs $\mathbf{x}_n$
• also called Radial Basis Function (RBF) kernel

Gaussian SVM: find $\alpha_n$ to combine Gaussians centered at $\mathbf{x}_n$ & achieve large margin in infinite-dimensional space


Page 82: Machine Learning Techniques hxÒ Õhtlin/mooc/doc/203_present.pdf · Machine Learning Techniques (_hxÒ•Õ) Lecture 3: Kernel Support Vector Machine Hsuan-Tien Lin (ŠÒ0) htlin@csie.ntu.edu.tw

Kernel Support Vector Machine Gaussian Kernel

Support Vector Mechanism

large-margin hyperplanes + higher-order transforms with kernel trick
⟹ # hyperplanes not many, boundary sophisticated

• transformed vector z = Φ(x) ⟹ efficient kernel K(x, x′)

• store optimal w ⟹ store a few SVs and α_n

new possibility by Gaussian SVM: infinite-dimensional linear classification, with generalization ‘guarded by’ large margin :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/22

Kernel Support Vector Machine Gaussian Kernel

Gaussian SVM in Action

[figure: decision boundaries for exp(−1‖x − x′‖²), exp(−10‖x − x′‖²), and exp(−100‖x − x′‖²)]

• large γ ⟹ sharp Gaussians ⟹ ‘overfit’?

• warning: SVM can still overfit :-(

Gaussian SVM: need careful selection of γ

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 15/22
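(To see the effect concretely, a hypothetical scikit-learn experiment, not from the lecture, refitting the same noisy data with γ = 1, 10, 100 as in the plots above; the data set and all parameter choices are mine. Larger γ drives training accuracy toward 1 by memorizing the label noise.)

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 0.5)  # circular target boundary
    y[rng.random(200) < 0.1] *= -1                  # flip 10% of labels as noise

    for gamma in (1, 10, 100):
        clf = SVC(kernel="rbf", gamma=gamma, C=1e3).fit(X, y)
        print(gamma, clf.score(X, y), len(clf.support_))  # train accuracy rises with gamma

The honest test of γ is of course out-of-sample performance, e.g. via cross-validation, not the training score printed here.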

Kernel Support Vector Machine Gaussian Kernel

Fun Time

Consider the Gaussian kernel K(x, x′) = exp(−γ‖x − x′‖²). What function does the kernel converge to if γ → ∞?

1  K_lim(x, x′) = 0
2  K_lim(x, x′) = ⟦x = x′⟧
3  K_lim(x, x′) = ⟦x ≠ x′⟧
4  K_lim(x, x′) = 1

Reference Answer: 2

If x = x′, K(x, x′) = 1 regardless of γ. If x ≠ x′, K(x, x′) → 0 as γ → ∞. Thus, K_lim is an impulse function, which is an extreme case of how the Gaussian gets sharper as γ → ∞.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 16/22

Kernel Support Vector Machine Comparison of Kernels

Linear Kernel: Cons and Pros

K(x, x′) = xᵀx′

Cons
• restricted—not always separable?!

[figure: 2-D data in [−1, 1]² that no line can separate]

Pros
• safe—linear first, remember? :-)
• fast—with special QP solver in primal
• very explainable—w and SVs say something

linear kernel: an important basic tool

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/22

Kernel Support Vector Machine Comparison of Kernels

Polynomial Kernel: Cons and Pros

K(x, x′) = (ζ + γxᵀx′)^Q

Cons
• numerical difficulty for large Q
  • |ζ + γxᵀx′| < 1: K → 0
  • |ζ + γxᵀx′| > 1: K → big
• three parameters (γ, ζ, Q)—more difficult to select

Pros
• less restricted than linear
• strong physical control—‘knows’ degree Q

polynomial kernel: perhaps small-Q only
—sometimes efficiently done by linear on Φ_Q(x)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/22
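(A small numeric illustration of the K → 0 / K → big difficulty for large Q; the vectors and the ζ, γ values are my own choices, and the last line previews the bounded Gaussian value that the next slide credits as a pro.)

    import numpy as np

    x = np.full(10, 0.30)
    xp = np.full(10, 0.31)   # x . xp = 0.93

    for Q in (2, 10, 50):
        k_small = (0.0 + 0.5 * (x @ xp)) ** Q  # |zeta + gamma x.x'| < 1: vanishes
        k_big = (1.0 + 2.0 * (x @ xp)) ** Q    # |zeta + gamma x.x'| > 1: explodes
        print(Q, k_small, k_big)

    print(np.exp(-1.0 * np.sum((x - xp) ** 2)))  # Gaussian kernel stays in (0, 1]

At Q = 50 the two polynomial values differ by roughly forty orders of magnitude, which is exactly the kind of conditioning problem a QP solver struggles with.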

Kernel Support Vector Machine Comparison of Kernels

Gaussian Kernel: Cons and Pros

K(x, x′) = exp(−γ‖x − x′‖²)

Cons
• mysterious—no w
• slower than linear
• too powerful?!

Pros
• more powerful than linear/poly.
• bounded—less numerical difficulty than poly.
• one parameter only—easier to select than poly.

Gaussian kernel: one of the most popular, but shall be used with care

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 19/22

Kernel Support Vector Machine Comparison of Kernels

Other Valid Kernels

• kernel represents special similarity: Φ(x)ᵀΦ(x′)

• any similarity ⟹ valid kernel? not really

• necessary & sufficient conditions for valid kernel: Mercer’s condition
  • symmetric
  • let k_ij = K(x_i, x_j); the matrix K =

  [ Φ(x₁)ᵀΦ(x₁)   Φ(x₁)ᵀΦ(x₂)   …   Φ(x₁)ᵀΦ(x_N)  ]
  [ Φ(x₂)ᵀΦ(x₁)   Φ(x₂)ᵀΦ(x₂)   …   Φ(x₂)ᵀΦ(x_N)  ]
  [ …              …              …   …             ]
  [ Φ(x_N)ᵀΦ(x₁)  Φ(x_N)ᵀΦ(x₂)  …   Φ(x_N)ᵀΦ(x_N) ]

  = [z₁ z₂ … z_N]ᵀ [z₁ z₂ … z_N] = ZZᵀ

  must always be positive semi-definite

define your own kernel: possible, but hard

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/22
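(Mercer's condition can be spot-checked numerically. Below is a sketch, with my own helper name, that tests symmetry and positive semi-definiteness of a kernel matrix through its eigenvalues; passing on one sample of points is necessary but not sufficient for validity, hence the cautious naming. The Gaussian kernel on random points passes, as it must.)

    import numpy as np

    def looks_like_valid_kernel(K, tol=1e-10):
        # symmetric and positive semi-definite (all eigenvalues >= 0, up to tolerance)
        return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

    X = np.random.default_rng(1).normal(size=(20, 3))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise ||x_i - x_j||^2
    print(looks_like_valid_kernel(np.exp(-0.5 * sq)))           # True for the Gaussian kernel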

Kernel Support Vector Machine Comparison of Kernels

Fun Time

Which of the following is not a valid kernel? (Hint: consider two 1-dimensional vectors x₁ = (1) and x₂ = (−1) and check Mercer’s condition.)

1  K(x, x′) = (−1 + xᵀx′)²
2  K(x, x′) = (0 + xᵀx′)²
3  K(x, x′) = (1 + xᵀx′)²
4  K(x, x′) = (−1 − xᵀx′)²

Reference Answer: 1

The kernels in 2 and 3 are just polynomial kernels. The kernel in 4 is equivalent to the kernel in 3. For 1, the matrix K formed from the kernel and the two examples is not positive semi-definite. Thus, the underlying kernel is not a valid one.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 21/22
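(Plugging the hint into the same eigenvalue check from the previous slide gives a short verification that choice 1 fails Mercer's condition on x₁ = (1), x₂ = (−1).)

    import numpy as np

    X = np.array([[1.0], [-1.0]])
    K = (-1.0 + X @ X.T) ** 2      # kernel matrix for choice 1: [[0, 4], [4, 0]]
    print(np.linalg.eigvalsh(K))   # eigenvalues [-4, 4]: not positive semi-definite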

Kernel Support Vector Machine Comparison of Kernels

Summary

1 Embedding Numerous Features: Kernel Models

Lecture 3: Kernel Support Vector Machine
• Kernel Trick: kernel as shortcut of transform + inner product
• Polynomial Kernel: embeds specially-scaled polynomial transform
• Gaussian Kernel: embeds infinite-dimensional transform
• Comparison of Kernels: linear for efficiency or Gaussian for power

• next: avoiding overfitting in Gaussian (and other) kernels

2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/22