ml 94 1 chap2 supervised learning

7/24/2019 ML 94 1 Chap2 Supervised Learning

1/32

(01-805-11-13)

1394

http://faculties.sbu.ac.ir/~a_mahmoudi/

Machine Learning


2/32

2


3/32

.

.

.

.

:

x1: price, x2: engine power 3

Class learning is f indin g a descr ipt ion that is shared by all pos i t ive

examples and none of the negative examples.

Prediction

Knowledge extraction


4/32

4

N

t

tt,r 1}{ xX

negativeisif

positiveisif

x

x

0

1r

2

1

x

xx

Training set


5/32

Hypothesis class H

5

2121 powerengineANDprice eepp

Hypothesis Class

hH Hypothesis

.

.

.

.


6/32

negativeissaysif

positiveissaysif)(

x

x

x

h

hh

0

1

6

N

t

tt rhhE1

1 x)|( XError of h on H

empirical error (training error)

Ch

.

h

.


7/32

.

.

.

7

Generalization


8/32

Version space

8

most specific hypothesis, S

most general hypothesis, G

h H, between S and G is consistent

and make up the version space(Mitchell, 1997)


9/32

.

.

9


10/32

.

.

C

.

10

0( |X)E h

doubt

capacity


11/32

VC Dimension

N

2N

)

).

2N

HN

(

)

shatter .

N

.

NVapnik-Chervonenkis (VC)Dimension VC(H ) = N

.

11

hH


12/32

VC Dimension

12

3 points shattered 4 points impossible


13/32

.

C

PAC-learnable

1

.

,>0

(N)

1

13

Probably Approximately Correct (PAC) Learning


14/32

14

Each strip is at most /4

Pr that we miss a strip 1

/4Pr that N instances miss a strip (1 /4)N

Pr that N instances miss 4 strips is at most 4(1 /4)N

4(1 /4)N and (1 x)exp( x)

4exp( N/4) and N (4/)log(4/)

P{Ch} 1-

Probably Approximately Correct (PAC) Learning


15/32

.

.

.

15

Teacher noise


16/32

.

16


17/32

.

.

.

bias

.

.

.

.

17

Less var iance


18/32

Occam

Occam William of Ockham

.

14

Occam

.

.

18

Occams razor


19/32

(k)

19

N

t

tt,r 1}{ xX

,i f

i f

ijr

j

t

i

t

t

iC

C

x

x

0

1

,i f

i f

ijh

j

t

i

t

t

iC

C

x

x

x

0

1

Train hypotheses

hi(x), i =1,...,K:

r

k

K

Reject(doubt)


20/32

.

.

20


21/32

:

.

(

)

.

(g(x)).

21

tN

t

tt rrx ,1

X

tt xfr

ttt zxfr ,*

()


22/32

(

)...

22

:

N

t

ttxgr

NgE

1

21X|

Pattern Recog nit ion andMachine Learning (Bisho p)


23/32

:g(x)

:

23

N

t

tt

wxwrNwwE 1

2

0101

1

X|,

01

02211 wxwwxwxwgd

j

jj

x

()


24/32

()...

:

24

N

t

tt wxwrNw

E

1

01

0

12

N

t

ttt xwxwrNw

E

1

01

1

2

xwrw 10

221 xNxNrxrx

w

t

t

t

tt

N

xx tt

N

rr t

t

N

t

ttwxwr

NwwE

1

2

0101

1X|,


25/32

:

25


26/32

Pattern Recog nit ion andMachine Learning (Bisho p)26

Over fitting

Best fit


27/32

ill-posed

.

.

.

27

M d l S l ti


28/32

.

inductive bias .

.

.

.

28

Model Selection


29/32

.

underfitting .

.

validation

(validation error) .

overfitting

.

.

29

T d ff


30/32

Tradeoff

tradeoff

:

Hc (H)

30

N

E

c (

)

f i rst E

and then E

Cross Val idat ion


31/32

(validation set)

.

validation

.

validation

.

31

Cross-Val idat ion

Test set (pub l icat ion set)


32/32

(cost or loss function)

.

.

|xg

t

ttgrLE |,| xX

X|minarg* E

ml 94 1 chap2 supervised learning

Documents