
Page 1

Learning with Information Divergence Geometry

Shinto Eguchi and Osamu Komori

The Institute of Statistical Mathematics, Japan
Email: [email protected], [email protected]
URL: http://www.ism.ac.jp/~eguchi/

Tutorial Workshop, National Taiwan University (國立臺灣大學)

Page 2

Outline

9:30~10:30  Information divergence class and robust statistical methods I
11:00~12:00 Information divergence class and robust statistical methods II
13:30~14:30 Information geometry on model uncertainty
15:00~16:00 Boosting learning algorithm and U-loss functions I
9:30~10:30  Boosting learning algorithm and U-loss functions II
11:00~12:00 Pattern recognition from genome and omics data

Page 3

Key words

Linear connections
Statistical model and estimation
Divergence geometry
Duality of inference and model
Geodesic
ABC of information geometry
Transversality
Riemannian metric (information metric)
Metric and dual connections
Pythagoras theorem
Max entropy model
Minimaxity
U-boost
Observational bias / selection bias
Sensitivity analysis / the worst case
ε-perturbed model
Pythagoras theorem

Page 4

Information divergence class and robust statistical methods I

Page 5

Historical comment

geometry · learning · statistics · quantum

R. A. Fisher (1922)
C. R. Rao (1945)
B. Efron (1975 paper), with a comment by P. Dawid
S. Amari (1982 paper)
1984 RSS 150th anniversary workshop

Page 6

What is IG?

Geometry + uncertainty → information space

Is it a method to quantify uncertainty, or a viewpoint from which to understand uncertainty?

Page 7

Dual Riemannian Geometry

Dual Riemannian geometry gives a reformulation of Fisher's foundation.
Cf. Einstein field equations (1916).

Information geometry aims to geometrize the dualistic structure between modeling and estimation.

Estimation is projection of data onto the model: "estimation is an action by projection, and the model is an object to be projected."

The interplay of action and object is elucidated by a geometric structure.
Cf. Erlangen program (Klein, 1872): http://en.wikipedia.org/wiki/Erlangen_program

Page 8

2 × 2 tables

The space of all 2×2 tables is associated with a regular tetrahedron.

The space of all independent 2×2 tables is associated with a ruled surface in the regular tetrahedron.

We know the ruled surface is exponential-geodesic, but does anyone know whether the ruled surface is minimal?

Page 9

Regular tetrahedron

[Figure: the regular tetrahedron of 2×2 tables with vertices P = (1,0,0,0), Q = (0,1,0,0), R = (0,0,1,0), S = (0,0,0,1), the barycenter (1/4, 1/4, 1/4, 1/4), and edge points of the form (1−p, 0, p, 0) and (0, p, 0, 1−p).]

Page 10

Regular tetrahedron

[Figure: convex (mixture) combinations of the tables (2/3, 0, 1/3, 0), (0, 2/3, 0, 1/3), (0, 0, 2/3, 1/3), (2/3, 1/3, 0, 0) with weights 1/3 and 2/3, illustrated inside the tetrahedron P, Q, R, S.]

Page 11

Ruled surface

[Figure: the ruled surface of independent tables inside the tetrahedron P, Q, R, S, shown from two viewpoints.]

Page 12

[Slide: a 2×2 table with cell counts $x, y, z, w$, margins $e = x + y$, $f = z + w$, $g = x + z$, $h = y + w$, and total $n = x + y + z + w$. An independent table has cell probabilities $(pq,\ p(1-q);\ (1-p)q,\ (1-p)(1-q))$, for which the cross-product difference vanishes:

$n^2\{pq\,(1-p)(1-q) - p(1-q)\,(1-p)q\}^2 = 0,$

so independence is the vanishing of the determinant of the table. The corner tables (1,0;0,0), (0,1;0,0), (0,0;1,0), (0,0;0,1) are the vertices of the tetrahedron.]

Page 13

e-geodesic surface

[3D plots: the independence surface and its image in log-ratio coordinates.]

Under the map $S_3 \to \mathbb{R}^3$,

$(\pi_{11}, \pi_{12}, \pi_{21}) \mapsto \Big(\log\frac{\pi_{11}}{\pi_{22}},\ \log\frac{\pi_{12}}{\pi_{22}},\ \log\frac{\pi_{21}}{\pi_{22}}\Big),\quad \pi_{22} = 1 - \pi_{11} - \pi_{12} - \pi_{21},$

the independence surface

$\Big\{\Big(\log\frac{pq}{(1-p)(1-q)},\ \log\frac{p}{1-p},\ \log\frac{q}{1-q}\Big) : 0 < p < 1,\ 0 < q < 1\Big\}$

is mapped to the plane $\{(x + y, x, y) : x \in \mathbb{R},\ y \in \mathbb{R}\}$, so the ruled surface is e-geodesic.

Page 14

Two parametrizations

[3D plots of the independence surface under the two coordinate systems.]

Mixture coordinates: $(p, q) \mapsto (pq,\ p(1-q),\ (1-p)q)$

Exponential coordinates: $(p, q) \mapsto \Big(\log\frac{pq}{(1-p)(1-q)},\ \log\frac{p}{1-p},\ \log\frac{q}{1-q}\Big)$

Page 15

Two Gaussian distributions $N(\mu_1, I)$ and $N(\mu_2, I)$

[Figure: the two density surfaces over the plane, centered at $\mu_1$ and $\mu_2$.]

Page 16

Pythagoras theorem in a functional space

[Figure: three Gaussian densities $N(\mu_1, I)$, $N(\mu_2, I)$, $N(\mu_3, I)$ forming a right triangle.]

$3^2 + 4^2 = 5^2$

Page 17

Kullback-Leibler divergence

Let p(x) and q(x) be probability density functions. Then the Kullback-Leibler divergence is defined by

$D_{\rm KL}(p, q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx.$

[Figure: points p, q, r with the Pythagorean relation]

$D_{\rm KL}(p, r) = D_{\rm KL}(p, q) + D_{\rm KL}(q, r)$
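As a quick numerical companion (my own minimal sketch, not from the slides; the function name is illustrative), this computes the KL divergence for discrete distributions and checks that it is nonnegative, vanishing exactly when p = q:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p, q) = sum_x p(x) log(p(x)/q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))            # > 0
print(kl_divergence(p, p))            # 0 exactly when p = q
```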

Page 18

Two one-parameter families

Let $\mathcal{P}$ be the space of all pdfs on a data space.

m-geodesic: $p_t^{(m)}(x) = (1-t)\,p(x) + t\,q(x)$  $(p, q \in \mathcal{P})$

e-geodesic: $r_s^{(e)}(x) = c_s\,\{r(x)\}^{1-s}\{q(x)\}^{s}$  $(q, r \in \mathcal{P})$,

where $1/c_s = \int \{r(x)\}^{1-s}\{q(x)\}^{s}\,dx$.

[Figure: p, q, r joined by the m-geodesic $p_t^{(m)}$ and the e-geodesic $r_s^{(e)}$.]
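Both families are easy to instantiate for discrete distributions. A sketch (illustrative names, not from the slides) of the two interpolations between pmfs:

```python
import numpy as np

def m_geodesic(p, q, t):
    """Mixture geodesic: p_t^(m) = (1 - t) p + t q."""
    return (1 - t) * p + t * q

def e_geodesic(r, q, s):
    """Exponential geodesic: r_s^(e) = c_s r^(1-s) q^s, renormalized by c_s."""
    g = r**(1 - s) * q**s
    return g / g.sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
print(m_geodesic(p, q, 0.5), e_geodesic(p, q, 0.5))
```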

Page 19

Pythagoras theorem

$D_{\rm KL}(p, r) = D_{\rm KL}(p, q) + D_{\rm KL}(q, r)$

$\Rightarrow\ D_{\rm KL}(p_t^{(m)}, r_s^{(e)}) = D_{\rm KL}(p_t^{(m)}, q) + D_{\rm KL}(q, r_s^{(e)}) \quad (\forall (s, t) \in [0,1]\times[0,1])$

[Figure: the right angle at q between the m-geodesic through p and the e-geodesic through r.]

Page 20

Proof

$D_{\rm KL}(p_t^{(m)}, r_s^{(e)}) - \{D_{\rm KL}(p_t^{(m)}, q) + D_{\rm KL}(q, r_s^{(e)})\}$
$= \int (p_t^{(m)}(x) - q(x))(\log q(x) - \log r_s^{(e)}(x))\,dx$
$= \int (1-t)(p(x) - q(x))\{(1-s)(\log q(x) - \log r(x)) - \log c_s\}\,dx$
$= (1-t)(1-s)\int (p(x) - q(x))(\log q(x) - \log r(x))\,dx$
$= (1-t)(1-s)\{D_{\rm KL}(p, r) - D_{\rm KL}(p, q) - D_{\rm KL}(q, r)\}$
$= 0,$

the $\log c_s$ term vanishing because $\int (p - q) = 0$, and the last step using the assumed base relation $D_{\rm KL}(p, r) = D_{\rm KL}(p, q) + D_{\rm KL}(q, r)$.
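The identity can be verified numerically: pick p so that $\int (p - q)(\log q - \log r) = 0$ (the base Pythagoras relation) and check the foliated version for arbitrary (s, t). A sketch in the discrete setting (my own construction):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

q = np.array([0.4, 0.35, 0.25])
r = np.array([0.3, 0.3, 0.4])
# choose p - q orthogonal to both 1 and (log q - log r), so that
# D(p, r) = D(p, q) + D(q, r) holds exactly
v = np.cross(np.ones(3), np.log(q) - np.log(r))
p = q + 0.05 * v / np.abs(v).max()

for t, s in [(0.3, 0.7), (0.9, 0.2)]:
    pt = (1 - t) * p + t * q                 # m-geodesic point
    g = r**(1 - s) * q**s
    rs = g / g.sum()                          # e-geodesic point
    print(round(kl(pt, rs) - kl(pt, q) - kl(q, rs), 12))   # ~0 for all (s, t)
```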

Page 21

ABC of differential geometry

A Riemannian metric defines an inner product on every tangent space of a manifold M:
$g: \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathbb{R}.$

Geodesic = a minimal arc between x and y:
$c = \mathop{\rm argmin}_{\{x(t)\,:\,x(0) = x,\ x(1) = y\}} \int_0^1 g(\dot x(t), \dot x(t))\,dt$

A linear connection defines parallelism along a vector field:
$\nabla: \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$ with
(1) $\nabla_{fX} Y = f\,\nabla_X Y$,
(2) $\nabla_X(fY) = f\,\nabla_X Y + (Xf)\,Y$  $(\forall f \in \mathcal{F}(M),\ \forall X, Y \in \mathfrak{X}(M))$.

Componentwise: $\nabla_{X_i} X_j = \sum_{k=1}^d \Gamma_{ij}^k\,X_k$.

Page 22

Geodesic

A one-parameter family (curve) $C = \{\theta(t) : -\varepsilon \le t \le \varepsilon\}$ is called a geodesic with respect to a linear connection $\{\Gamma_{jk}^i(\theta) : 1 \le i, j, k \le p\}$ if

$\ddot\theta^i(t) + \sum_{j=1}^p \sum_{k=1}^p \Gamma_{jk}^i(\theta(t))\,\dot\theta^j(t)\,\dot\theta^k(t) = 0 \quad (\forall i = 1, \ldots, p).$

Remark 2: if $\Gamma_{jk}^i(\theta) = 0$ $(1 \le i, j, k \le p)$, any geodesic is a line. This property is not invariant under reparametrization.

Under a transform from parameter θ to ω, the geodesic C satisfies

$\ddot\omega^a(t) + \sum_{b=1}^p \sum_{c=1}^p \tilde\Gamma_{bc}^a(\omega(t))\,\dot\omega^b(t)\,\dot\omega^c(t) = 0 \quad (\forall a = 1, \ldots, p),$

$\tilde\Gamma_{bc}^a = \sum_{i=1}^p \sum_{j=1}^p \sum_{k=1}^p B_i^a\,B_b^j\,B_c^k\,\Gamma_{jk}^i + \sum_{i=1}^p B_i^a\,\frac{\partial B_b^i}{\partial\omega^c},$

where $B_i^a = \partial\omega^a/\partial\theta^i$ and $B_a^i = \partial\theta^i/\partial\omega^a$, evaluated at $\theta = \theta(t)$.

Page 23

Change rule on connections

Let us consider a transform φ from parameter θ to ω. Then a geodesic C is written as $\omega = \{\omega(t) = \varphi(\theta(t)) : -\varepsilon < t < \varepsilon\}$. Differentiating,

$\dot\omega^a(t) = \sum_{i=1}^p B_i^a(\theta(t))\,\dot\theta^i(t) \quad (\forall a = 1, \ldots, p),$

$\ddot\omega^a(t) = \sum_{i=1}^p B_i^a(\theta(t))\,\ddot\theta^i(t) + \sum_{i=1}^p \sum_{j=1}^p \frac{\partial B_i^a}{\partial\theta^j}(\theta(t))\,\dot\theta^i(t)\,\dot\theta^j(t) \quad (\forall a = 1, \ldots, p),$

where $B_i^a = \partial\varphi^a/\partial\theta^i$. Hence the geodesic equation in ω holds with

$\tilde\Gamma_{bc}^a = \sum_{i=1}^p \sum_{j=1}^p \sum_{k=1}^p B_i^a\,B_b^j\,B_c^k\,\Gamma_{jk}^i + \sum_{i=1}^p B_i^a\,\frac{\partial B_b^i}{\partial\omega^c}.$

Page 24

Reference books

Kobayashi, S. and Nomizu, K. Foundations of Differential Geometry. Wiley Classics Library.
Amari, S. and Nagaoka, H. Methods of Information Geometry. American Mathematical Society (2001).
Amari, S. Differential-Geometrical Methods in Statistics. Springer (1985).

Page 25

Statistical model and information

Statistical model: $M = \{p(x) = p(x, \theta) : \theta \in \Theta\}$  $(\Theta \subset \mathbb{R}^p)$

Score vector: $s(x, \theta) = \frac{\partial}{\partial\theta}\log p(x, \theta)$

Space of score vectors: $T_\theta = \{\alpha^{\rm T} s(\cdot, \theta) : \alpha \in \mathbb{R}^p\}$

Fisher information metric: $g_\theta(u, v) = E_\theta\{uv\}$  $(\forall u, v \in T_\theta)$

Fisher information matrix: $I_\theta = E_\theta\{s(x, \theta)\,s(x, \theta)^{\rm T}\}$
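For a concrete feel (my own illustrative example, not from the slides): the Fisher information of the Bernoulli model $p(x, \theta) = \theta^x(1-\theta)^{1-x}$ is $1/(\theta(1-\theta))$, which matches $E_\theta\{s^2\}$ computed directly:

```python
import numpy as np

def bernoulli_score(x, theta):
    """s(x, theta) = d/dtheta log p(x, theta) for p = theta^x (1-theta)^(1-x)."""
    return x / theta - (1 - x) / (1 - theta)

theta = 0.3
# Fisher information as E_theta{ s(x, theta)^2 } over x in {0, 1}
info = (1 - theta) * bernoulli_score(0, theta)**2 + theta * bernoulli_score(1, theta)**2
print(info, 1 / (theta * (1 - theta)))   # both 4.7619...
```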

Page 26

Score vector space

Note 1: for $u \in T_\theta$, writing $u(x, \theta) = \alpha^{\rm T} s(x, \theta)$,

$E_\theta\{u\} = \int u(x, \theta)\,p(x, \theta)\,dx = \alpha^{\rm T}\int s(x, \theta)\,p(x, \theta)\,dx = 0,$

and for $u, v \in T_\theta$ with $u = \alpha^{\rm T} s$, $v = \beta^{\rm T} s$,

$g_\theta(u, v) = \int u(x, \theta)\,v(x, \theta)\,p(x, \theta)\,dx = \alpha^{\rm T} I_\theta\,\beta.$

Let $S_\theta = \{t(x, \theta) : E_\theta(t(x, \theta)) = 0,\ V_\theta(t(x, \theta)) < \infty\}$ be the space of all random variables with mean 0 and finite variance.

Note 2: $S_\theta$ is an infinite-dimensional vector space that includes $T_\theta$. For $u(x, \theta) = \alpha^{\rm T} s(x, \theta)$, the centered higher-order derivatives

$\frac{\partial^k u(x, \theta)}{\partial\theta^{i_1}\cdots\partial\theta^{i_k}} - E_\theta\Big\{\frac{\partial^k u(x, \theta)}{\partial\theta^{i_1}\cdots\partial\theta^{i_k}}\Big\}, \quad k = 1, 2, \ldots$

belong to $S_\theta$.

Page 27

Linear connections

Score vector: $s(x, \theta) = (s_i(x, \theta))_{i=1}^p$

e-connection: $\Gamma_{jk}^{i\,({\rm e})}(\theta) = \sum_{i'=1}^p g^{ii'} E_\theta\{(\partial_j s_k)\,s_{i'}\}$  $(1 \le i, j, k \le p)$

m-connection: $\Gamma_{jk}^{i\,({\rm m})}(\theta) = \sum_{i'=1}^p g^{ii'} E_\theta\{(\partial_j s_k + s_j s_k)\,s_{i'}\}$  $(1 \le i, j, k \le p)$

m-geodesic: $p_t^{(m)}(x) = (1-t)\,p(x) + t\,q(x)$  $(p, q \in \mathcal{P})$
e-geodesic: $r_s^{(e)}(x) = c_s\,\{r(x)\}^{1-s}\{q(x)\}^{s}$  $(q, r \in \mathcal{P})$

Page 28

Geodesical models

Definition.
(i) A statistical model M is said to be e-geodesical if
$p(x), q(x) \in M \ \Rightarrow\ c_t\,p(x)^{1-t}\,q(x)^t \in M \quad (\forall t \in (0, 1)),$
where $1/c_t = \int p(x)^{1-t} q(x)^t\,dx$.
(ii) A statistical model M is said to be m-geodesical if
$p(x), q(x) \in M \ \Rightarrow\ (1-t)\,p(x) + t\,q(x) \in M \quad (\forall t \in (0, 1)).$

Note: let P be the space of all probability density functions. By definition P is e-geodesical and m-geodesical. However, the theoretical framework for P is not perfectly complete. Cf. Pistone and Sempi (1995, AS).

Page 29

Two types of modeling

Let $p_0(x)$ be a pdf and $t(x)$ a p-dimensional statistic.

Exponential model: $M^{(\rm e)} = \{p(x, \theta) = p_0(x)\exp\{\theta^{\rm T} t(x) - \kappa(\theta)\} : \theta \in \Theta\}$, $\Theta = \{\theta \in \mathbb{R}^p : \kappa(\theta) < \infty\}$

Matched mean model: $M^{(\rm m)} = \{p(x) : E_p\{t(x)\} = E_{p_0}\{t(x)\}\}$

Let $\{p_i(x) : i = 0, 1, \ldots, I\}$ be a set of pdfs.

Exponential model: $M^{(\rm e)} = \{p(x, \theta) = p_0(x)\exp\{\sum_{i=1}^I \theta_i \log\frac{p_i(x)}{p_0(x)} - \kappa(\theta)\} : \theta \in \Theta\}$, $\Theta = \{\theta \in \mathbb{R}^I : \kappa(\theta) < \infty\}$

Mixture model: $M^{(\rm m)} = \{p(x, \theta) = (1 - \sum_{i=1}^I \theta_i)\,p_0(x) + \sum_{i=1}^I \theta_i\,p_i(x) : \theta \in \Theta\}$,
$\Theta = \{(\theta_1, \ldots, \theta_I) : 0 < \sum_{i=1}^I \theta_i < 1,\ \theta_i > 0\ (\forall i = 1, \ldots, I)\}$

Page 30

Statistical functional

A functional f(p), defined for pdfs p(x), is called a statistical functional.

A statistical functional f(p) is said to be Fisher-consistent for a model $M = \{p_\theta(x) : \theta \in \Theta\}$ if f(p) satisfies
(1) $f(p) \in \Theta$,
(2) $f(p_\theta) = \theta$  $(\forall \theta \in \Theta)$.

Example. For the normal model
$M = \Big\{p_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big\{-\frac{(x - \mu)^2}{2\sigma^2}\Big\} : \theta = (\mu, \sigma^2) \in \mathbb{R}\times\mathbb{R}_+\Big\},$
the functional
$f(p) = \Big(\int x\,p(x)\,dx,\ \int\big(x - {\textstyle\int} x'\,p(x')\,dx'\big)^2\,p(x)\,dx\Big)^{\rm T}$
is Fisher-consistent for $\theta = (\mu, \sigma^2)$.
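Fisher-consistency here can be checked numerically by grid integration (a sketch of mine, with illustrative grid parameters):

```python
import numpy as np

def f_functional(p, x):
    """f(p) = (mean, variance) of a density p sampled on an equispaced grid x."""
    dx = x[1] - x[0]
    mu = np.sum(x * p) * dx
    var = np.sum((x - mu)**2 * p) * dx
    return mu, var

x = np.linspace(-10, 10, 20001)
mu0, var0 = 1.5, 0.64
p = np.exp(-(x - mu0)**2 / (2 * var0)) / np.sqrt(2 * np.pi * var0)
print(f_functional(p, x))   # ~ (1.5, 0.64): f(p_theta) = theta
```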

Page 31

Transversality

Let f(p) be a Fisher-consistent functional. The set
$L_\theta(f) = \{p : f(p) = \theta\}$
is called a leaf transverse to M, with
(1) $L_\theta(f) \cap M = \{p_\theta\}$,
(2) $\bigoplus_{\theta\in\Theta} L_\theta(f)$ is a local neighborhood.

[Figure: leaves $L_\theta(f)$ and $L_{\theta^*}(f)$ crossing the model M at $p_\theta$ and $p_{\theta^*}$.]

Page 32

Foliation structure

Statistical model: $M = \{p(x) = p(x, \theta) : \theta \in \Theta\}$  $(\Theta \subset \mathbb{R}^p)$

Foliation: $\mathcal{P} = \bigoplus_{\theta\in\Theta} L_\theta(f)$

Decomposition of tangent spaces: $T_{p_\theta}(\mathcal{P}) = T_{p_\theta}(M) \oplus T_{p_\theta}(L_\theta(f))$, i.e.

$\forall t \in T_{p_\theta}(\mathcal{P})\ \exists u \in T_{p_\theta}(M),\ \exists v \in T_{p_\theta}(L_\theta(f))$ such that $t(x, \theta) = u(x, \theta) + v(x, \theta)$.

[Figure: the leaf $L_\theta(f)$ crossing M at $p_\theta$, with tangent vectors t, u, v.]

Page 33

Transversality for MLE

Exponential model: $M^{(\rm e)} = \{p(x, \theta) = p_0(x)\exp\{\theta^{\rm T} t(x) - \kappa(\theta)\} : \theta \in \Theta\}$, $\Theta = \{\theta \in \mathbb{R}^p : \kappa(\theta) < \infty\}$, where $p_0(x)$ is a pdf and $t(x)$ a p-dimensional statistic.

For the exponential model $M^{(\rm e)}$ the MLE functional
$f_{\rm ML}(p) := \mathop{\rm argmax}_{\theta\in\Theta} E_p\{\log p(x, \theta)\}$
is written as
$f_{\rm ML}(p) = \mathop{\rm arg}_{\theta\in\Theta}\big\{E_p\{t(x)\} = E_{p(\cdot,\theta)}\{t(x)\}\big\},$
so that
$L_\theta(f_{\rm ML}) = \{p : E_p\{t(x)\} = E_{p(\cdot,\theta)}\{t(x)\}\}.$

Hence the foliation associated with $f_{\rm ML}$ consists of matched mean models.

Page 34

Maximum likelihood foliation

$D_{\rm KL}(p_a, r_b) = D_{\rm KL}(p_a, p_\theta) + D_{\rm KL}(p_\theta, r_b) \quad (a = 1, 2, 3;\ b = 1, 2)$

[Figure: the leaf $L_\theta(f_{\rm ML})$ crossing $M^{(\rm e)}$ at $p_\theta$, with points $p_1, p_2, p_3$ on the leaf and $r_1, r_2$ on the model.]

Page 35

Estimating function

Statistical model: $M = \{p(x) = p(x, \theta) : \theta \in \Theta\}$  $(\Theta \subset \mathbb{R}^p)$

A p-variate function $u(x, \theta)$ is unbiased iff (by definition)
$E_{p(\cdot,\theta)}\{u(x, \theta)\} = 0,\qquad \det\Big(E_{p(\cdot,\theta)}\Big\{\frac{\partial}{\partial\theta^{\rm T}}u(x, \theta)\Big\}\Big) \ne 0 \quad (\forall\theta\in\Theta).$

The statistical functional
$f(p) = \mathop{\rm arg\,solve}_{\theta\in\Theta}\{E_p\{u(x, \theta)\} = 0\}$
is Fisher-consistent, and the leaf transverse to M,
$L_\theta(f) = \{p : E_p\{u(x, \theta)\} = 0\},$
is m-geodesical, with $\mathcal{P} = \bigoplus_{\theta\in\Theta} L_\theta(f)$.

Page 36

Information geometry

Statistical model: $M = \{p(x, \theta) : \theta \in \Theta\}$, where $p(x, \theta)$ is a pdf s.t. $\int p(x, \theta)\,dx = 1$.

Information metric (Rao, 1945):
$g_{ij}(\theta) = E_{p(x,\theta)}\{e_i(x, \theta)\,e_j(x, \theta)\}$, where $e_i(x, \theta) = \frac{\partial}{\partial\theta_i}\log p(x, \theta)$.

Dual connections (Amari, 1982):

Exponential connection: $\Gamma_{ij,k}^{\rm e}(\theta) = E_{p(x,\theta)}\{(\partial_i e_j)\,e_k\}$

Mixture connection: $\Gamma_{ij,k}^{\rm m}(\theta) = E_{p(x,\theta)}\{(\partial_i e_j)\,e_k\} + E_{p(x,\theta)}\{e_i\,e_j\,e_k\}$

Page 37

Mixture and exponential models

Mixture model (mixture-geodesic space):
$M = \{p(x, \theta) : \theta \in \Theta\}$, $p(x, \theta) = \sum_{i=0}^K \theta_i\,p_i(x)$, $\Theta = \{(\theta_0, \ldots, \theta_K) : \theta_i > 0,\ \sum_{i=0}^K \theta_i = 1\}$

Exponential model (exponential-geodesic space):
$M = \{p(x, \theta) : \theta \in \Theta\}$, $p(x, \theta) = \exp\{\sum_{i=1}^K \theta_i\,t_i(x) - \psi(\theta)\}$, $\Theta = \{\theta : \psi(\theta) < \infty\}$,
where $\psi(\theta) = \log\int\exp\{\sum_{i=1}^K \theta_i\,t_i(x)\}\,dx$.

Page 38

Triangle with KL divergence

Let p, q and r be in M.

Mixture geodesic: $p_t^{\rm m}(x) = t\,p(x) + (1-t)\,q(x)$  $(0 \le t \le 1)$
Exponential geodesic: $r_s^{\rm e}(x) = z_s\,r(x)^s\,q(x)^{1-s}$  $(0 \le s \le 1)$
KL divergence: $D_{\rm KL}(p, q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx$

$D_{\rm KL}(p_t^{\rm m}, r_s^{\rm e}) - D_{\rm KL}(p_t^{\rm m}, q) - D_{\rm KL}(q, r_s^{\rm e})$
$= \int (p_t^{\rm m} - q)(\log q - \log r_s^{\rm e})$
$= (1-t)(1-s)\int (p - q)(\log q - \log r)$
$= (1-t)(1-s)\{D_{\rm KL}(p, r) - D_{\rm KL}(p, q) - D_{\rm KL}(q, r)\}$

Page 39

Pythagorean theorem

$D_{\rm KL}(p_t^{\rm m}, r_s^{\rm e}) = D_{\rm KL}(p_t^{\rm m}, q) + D_{\rm KL}(q, r_s^{\rm e})$

$D_{\rm KL}(p_t^{\rm m}, r_s^{\rm e}) \ge D_{\rm KL}(q, r_s^{\rm e})$
$D_{\rm KL}(p_t^{\rm m}, r_s^{\rm e}) \ge D_{\rm KL}(p_t^{\rm m}, q)$   (em algorithm; Amari, 1995)

[Figure: p(x), q(x), r(x) with the geodesics $p_s^{\rm m}(x)$, $r_t^{\rm e}(x)$ and the divergences $D_{\rm KL}(p, q)$, $D_{\rm KL}(q, r)$, $D_{\rm KL}(p, r)$.]

Page 40

Minimum divergence geometry

Let D : M × M → ℝ be an information divergence on a statistical model M:
(i) D(p, q) ≥ 0, with equality if and only if p = q;
(ii) D is differentiable on M × M.

Then we get a Riemannian metric and dual connections on M (Eguchi, 1983, 1992):

$g^{(D)}(X, Y) = -D(X \mid Y)$
$g^{(D)}(\nabla_X Y, Z) = -D(XY \mid Z) \quad (\forall X, Z \in \mathfrak{X}(M))$
$g^{(D)}(Y, \nabla^*_X Z) = -D(Y \mid XZ) \quad (\forall X, Z \in \mathfrak{X}(M))$

where $D(XY \mid Z)(p) = X_q Y_q Z_{q'}\,D(q, q')\big|_{q = q' = p}$.

Page 41

Remarks

$g^{(D)}$ is a Riemannian metric: $D(X \mid \cdot) = 0$ (differentiate $D(q, q) = 0$), and

$g^{(D)}(X, Y) - g^{(D)}(Y, X) = -D(XY \mid \cdot) + D(YX \mid \cdot) = -D([X, Y] \mid \cdot) = 0.$

$\nabla_X$ and $\nabla^*_X$ are dual connections with respect to $g^{(D)}$:
$X\,g^{(D)}(Y, Z) = g^{(D)}(\nabla_X Y, Z) + g^{(D)}(Y, \nabla^*_X Z),$
where $\nabla^{(0)}_X = \frac12(\nabla_X + \nabla^*_X)$ is metric (the Levi-Civita connection of $g^{(D)}$).

$\nabla$ is affine: $g^{(D)}(\nabla_X(fY), Z) = -D((Xf)Y + f\,XY \mid Z) = g^{(D)}((Xf)Y + f\,\nabla_X Y, Z)$ for all functions f.

Page 42

U cross-entropy

Take a triple (U, u, ξ) such that U is a convex function, $u = U'$, and $\xi = u^{-1}$.

U cross-entropy: $L_U(p, q) = \int\{p\,\xi(q) - U(\xi(q))\}$

U entropy: $H_U(p) = L_U(p, p) = \int U^*(p)$,
where $U^*(s) = s\,\xi(s) - U(\xi(s))$ is the convex conjugate:
$U^*(s) = \max_{-\infty<t<\infty}\{st - U(t)\}$, attained at $t = \xi(s)$; dually $U(t) = \max_{-\infty<s<\infty}\{st - U^*(s)\}$, attained at $s = u(t)$.

U divergence: $D_U(p, q) = H_U(p) - L_U(p, q)$

[Figure: the convexity gap between $U(\xi(p))$ and $U(\xi(q))$ realizing the U divergence.]

Page 43

Examples of U divergence

$D_U(p, q) = \int\{U(\xi(q)) - U(\xi(p)) - (\xi(q) - \xi(p))\,p\}$

KL divergence: $(U(t), u(t), \xi(s)) = (\exp(t), \exp(t), \log s)$ gives
$D_{\rm KL}(p, q) = \int\{p\log p - p\log q - p + q\} = \int p\log\frac{p}{q} - \int p + \int q$

Beta (power) divergence: $(U(t), u(t), \xi(s)) = \Big(\frac{(1+\beta t)^{(\beta+1)/\beta}}{\beta+1},\ (1+\beta t)^{1/\beta},\ \frac{s^\beta - 1}{\beta}\Big)$ gives
$D_\beta(p, q) = \int\Big\{\frac{(p^\beta - q^\beta)\,p}{\beta} - \frac{p^{\beta+1} - q^{\beta+1}}{\beta+1}\Big\}$

Note: $\lim_{\beta\to 0} U_\beta(t) = \exp(t)$ and $\lim_{\beta\to 0} D_\beta(p, q) = D_{\rm KL}(p, q)$.
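A sketch of the β-power divergence and its β → 0 limit, in the discrete case (my own illustration; the extended KL form includes the −p + q terms so the limit is exact):

```python
import numpy as np

def beta_divergence(p, q, beta):
    """D_beta(p, q) = sum{ (p^b - q^b) p / b - (p^(b+1) - q^(b+1)) / (b+1) }."""
    return float(np.sum((p**beta - q**beta) * p / beta
                        - (p**(beta + 1) - q**(beta + 1)) / (beta + 1)))

def kl_extended(p, q):
    return float(np.sum(p * np.log(p / q) - p + q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
for b in [1.0, 0.1, 0.01, 0.001]:
    print(b, beta_divergence(p, q, b))
print('KL', kl_extended(p, q))   # beta -> 0 recovers the KL divergence
```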

Page 44

Geometric formula with $D_U$

$(g^{(U)}, \nabla^{(U)}, \nabla^{*(U)})$ s.t.
$g^{(U)}(X, Y) = -D_U(X \mid Y)$
$g^{(U)}(\nabla^{(U)}_X Y, Z) = -D_U(XY \mid Z) \quad (\forall X, Z \in \mathfrak{X}(M))$
$g^{(U)}(Y, \nabla^{*(U)}_X Z) = -D_U(Y \mid XZ) \quad (\forall X, Z \in \mathfrak{X}(M))$

In coordinates, for a model $q(x, \theta)$:

$g_{ij}^{(U)}(\theta) = \int \frac{\partial q(x, \theta)}{\partial\theta_i}\,\frac{\partial\,\xi(q(x, \theta))}{\partial\theta_j}\,dx$

$\Gamma_{ij,k}^{(U)}(\theta) = \int \frac{\partial^2 q(x, \theta)}{\partial\theta_i\partial\theta_j}\,\frac{\partial\,\xi(q(x, \theta))}{\partial\theta_k}\,dx$

$\Gamma_{ij,k}^{*(U)}(\theta) = \int \frac{\partial q(x, \theta)}{\partial\theta_k}\,\frac{\partial^2\,\xi(q(x, \theta))}{\partial\theta_i\partial\theta_j}\,dx$

$\nabla^{(U)} = \nabla^{(\rm m)}$ $(\forall U)$ (Eguchi, 2005)
$g^{(U)} = g \iff U = \exp$ (KL divergence)

Page 45

Triangle with $D_U$

Mixture geodesic: $p_t^{(\rm m)}(x) = (1-t)\,p(x) + t\,q(x)$
U geodesic: $r_s^{(U)}(x) = u\big((1-s)\,\xi(r(x)) + s\,\xi(q(x)) + \kappa_s\big)$

$D_U(p_t^{(\rm m)}, r_s^{(U)}) - D_U(p_t^{(\rm m)}, q) - D_U(q, r_s^{(U)})$
$= (1-t)(1-s)\{D_U(p, r) - D_U(p, q) - D_U(q, r)\} \quad (\forall (s, t) \in [0,1]\times[0,1])$

Hence
$p_t^{(\rm m)} \perp_q r_s^{(U)} \ \Rightarrow\ D_U(p_t^{(\rm m)}, r_s^{(U)}) = D_U(p_t^{(\rm m)}, q) + D_U(q, r_s^{(U)}).$

Page 46

Light and shadow of MLE

Light: 1. invariance under data transformations; 2. asymptotic efficiency (sufficiency and efficiency).

Shadow: 1. non-robustness; 2. overfitting.

Log-likelihood on the exponential family: log ↔ exp, ξ ↔ u.

Page 47

Learning with Information Divergence Geometry

Shinto Eguchi and Osamu Komori

The Institute of Statistical Mathematics, Japan
Email: [email protected], [email protected]
URL: http://www.ism.ac.jp/~eguchi/

Tutorial Workshop, National Taiwan University (國立臺灣大學)

Page 48

Outline

9:30~10:30  Information divergence class and robust statistical methods I
11:00~12:00 Information divergence class and robust statistical methods II
13:30~14:30 Information geometry on model uncertainty
15:00~16:00 Boosting learning algorithm and U-loss functions I
9:30~10:30  Boosting learning algorithm and U-loss functions II
11:00~12:00 Pattern recognition from genome and omics data

Page 49

Information divergence class and robust statistical methods II

Page 50

Light and shadow of MLE

Light: 1. invariance under data transformations; 2. asymptotic efficiency (sufficiency and efficiency).

Shadow: 1. non-robustness; 2. over-fitting.

Log-likelihood on the exponential family: likelihood method (log, exp) vs. U-method (ξ, u).

Page 51

U-entropy

Take a quadruplet (U, u, ξ, Ξ).

U cross-entropy: $L_U(p, q) = E_p\{\xi(q)\} - \int U(\xi(q))$
U-entropy: $H_U(p) = E_p\{\xi(p)\} - \int U(\xi(p))$

Example 1. Let $U(t) = \exp(t)$. Then $H_U(p) = E_p\{\log p(X)\}$.

Example 2. Let $U(t) = \frac{(1 + \beta t)^{\frac{\beta+1}{\beta}}}{\beta + 1}$. Then $H_U(p) = E_p\Big\{\frac{p(X)^\beta - 1}{\beta}\Big\}$.

Page 52

U-divergence

Information inequality: $H_U(p) \ge L_U(p, q)$

U-divergence:
$D_U(p, q) = H_U(p) - L_U(p, q) = \int\{U(\xi(q)) - U(\xi(p)) - u(\xi(p))(\xi(q) - \xi(p))\} \ge 0$

KL-divergence: $D_{\rm KL}(p, q) = \int\{p\log p - p\log q - p + q\}$

β-power divergence: $D_\beta(p, q) = \int\Big\{\frac{(p^\beta - q^\beta)\,p}{\beta} - \frac{p^{\beta+1} - q^{\beta+1}}{\beta+1}\Big\}$

Page 53

Max U-entropy distribution

Let us fix a statistic t(x). Equal mean space: $\Gamma_\tau = \{p : E_p\{t(X)\} = \tau\}$

$p_\tau^*(x) = \mathop{\rm argmax}_{p\in\Gamma_\tau} H_U(p)$

The Euler-Lagrange first-variation condition
$\frac{\partial}{\partial\varepsilon} H_U\big((1-\varepsilon)p^* + \varepsilon q\big)\Big|_{\varepsilon=0} = 0 \quad (\forall q \in \Gamma_\tau),$
with a Lagrange multiplier for the mean constraint, gives $\xi(p^*(x)) = \theta^{\rm T} t(x) - \kappa(\theta)$. Hence

$p^*(x) = u(\theta^{\rm T} t(x) - \kappa(\theta))$   (the U-model).

Page 54

U-estimate

Let p(x) be a data density function with statistical model $q_\theta(x)$.

U-loss function: $L_U(\theta) = L_U(p, q_\theta) = E_p\{\xi(q_\theta)\} - \int U(\xi(q_\theta))$

U-empirical loss function:
$L_U^{\rm emp}(\theta) = -\frac1n\sum_{i=1}^n \xi(q_\theta(x_i)) + \int U(\xi(q_\theta(x)))\,dx$

U-estimate: $\hat\theta_U = \mathop{\rm argmin}_{\theta\in\Theta} L_U^{\rm emp}(\theta)$

U-estimating function: $s_U(x, \theta) = w(x, \theta)\,s(x, \theta) - E_{q_\theta}\{w(X, \theta)\,s(X, \theta)\}$,
where $w(x, \theta) = \xi'(q_\theta(x))\,q_\theta(x)$ and $s(x, \theta)$ is the score function.
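As a concrete instance (my own sketch, not the authors' code): for the normal location model $q_\mu = N(\mu, 1)$ with the β-power choice of U, the integral term $\int q_\mu^{1+\beta} = (2\pi)^{-\beta/2}(1+\beta)^{-1/2}$ does not depend on μ, so the empirical β-power loss can be minimized by grid search; the resulting estimate resists outliers that drag the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(8.0, 1.0, 5)])  # 5% outliers

def beta_loss(mu, x, beta):
    """Empirical beta-power loss for N(mu, 1); the integral term
    int q^(1+beta) = (2 pi)^(-beta/2) / sqrt(1+beta) is mu-free."""
    q = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)
    const = (2 * np.pi)**(-beta / 2) / np.sqrt(1 + beta) / (1 + beta)
    return -np.mean((q**beta - 1) / beta) + const

grid = np.linspace(-2, 10, 2401)
beta = 0.5
mu_beta = grid[np.argmin([beta_loss(m, x, beta) for m in grid])]
print(x.mean(), mu_beta)   # the mean is dragged toward 8; the beta-estimate stays near 0
```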

Page 55

Γ-minimax

Let us fix a statistic t(x). Equal mean space: $\Gamma_\tau = \{p : E_p\{t(X)\} = \tau\}$

$\max_{p\in\Gamma_\tau}\min_{q\in\Gamma_\tau} L_U(p, q) = L_U(p_\tau^*, p_\tau^*) = \min_{q\in\Gamma_\tau}\max_{p\in\Gamma_\tau} L_U(p, q),$

where $p_\tau^* = \mathop{\rm argmax}_{p\in\Gamma_\tau} H_U(p)$, i.e.
$p_\tau^*(x) = u(\theta_\tau^{\rm T} t(x) - \kappa(\theta_\tau))$ with $\theta_\tau$ determined by $\int t(x)\,u(\theta_\tau^{\rm T} t(x) - \kappa(\theta_\tau))\,dx = \tau$.

[Figure: the U-model $M_U$ crossing the leaves $\Gamma_\tau$ and $\Gamma_{\tau'}$ at $p_\tau^*$ and $p_{\tau'}^*$.]

Page 56

U-model

U-model: $M_U = \{q_\theta(x) = u(\theta^{\rm T} t(x) - \kappa_\theta) : \theta \in \Theta\}$

The U-estimator of the mean parameter $\tau = E_{q_\theta}\{t(x)\}$ is $\hat\tau_U = \frac1n\sum_{i=1}^n t(x_i)$.

We observe that
$L_U^{\rm emp}(\theta) = -\frac1n\sum_{i=1}^n\{\theta^{\rm T} t(x_i) - \kappa_\theta\} + \int U(\theta^{\rm T} t(x) - \kappa_\theta)\,dx,$
which implies
$\frac{\partial}{\partial\theta} L_U^{\rm emp}(\theta) = -\frac1n\sum_{i=1}^n t(x_i) + E_{q_\theta}\{t(X)\} = 0.$

Furthermore,
$L_U^{\rm emp}(\theta) - L_U^{\rm emp}(\hat\theta_U) = D_U(q_{\hat\theta_U}, q_\theta).$

[Figure: $M_U$ crossing the leaves $\Gamma_\tau$, $\Gamma_{\tau'}$ at $p_\tau^*$, $p_{\tau'}^*$.]

Page 57

U-Boost density learning

Dictionary of ξ-densities: $\mathcal{D} = \{\phi^{(j)}(x) : j = 1, \ldots, M\}$

Goal: find $\xi^* = \mathop{\rm argmin}_{\xi\in{\rm cov}(\mathcal{D})} L_U(\xi)$.

Initialize: $\phi_1 = \mathop{\rm argmin}_{\phi\in\mathcal{D}} L_U(\xi_\phi)$, $\xi_1 = \xi_{\phi_1}$.

For $k = 2, \ldots, T$:
$(\phi_k, \pi_k) = \mathop{\rm argmin}_{(\phi,\pi)\in\mathcal{D}\times[0,1]} L_U\big((1-\pi)\,\xi_{k-1} + \pi\,\xi_\phi\big),$
U-boost update: $\xi_k(x) = (1-\pi_k)\,\xi_{k-1}(x) + \pi_k\,\xi_{\phi_k}(x).$

$U(t) = t^2$ (Klemelä, ML, 2006);  $U(t) = \exp(t)$ (Friedman et al., JASA, 1998).
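For U(t) = exp(t), ξ = log, the update combines densities multiplicatively and renormalizes, and the U-loss is the negative log-likelihood. A toy sketch of the greedy loop (my own construction: illustrative dictionary, grid normalization, and step grid):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.7, 300), rng.normal(2, 0.7, 200)])
grid = np.linspace(-6, 6, 1201); dg = grid[1] - grid[0]

def gauss(m, s=1.0):
    return np.exp(-(grid - m)**2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

dictionary = [gauss(m) for m in np.arange(-4.0, 4.5, 0.5)]   # xi-densities via log

def loss(log_f):
    """Empirical U-loss for U = exp: negative log-likelihood of exp(log_f),
    renormalized on the grid."""
    log_f = log_f - np.log(np.exp(log_f).sum() * dg)
    return -np.interp(x, grid, log_f).mean()

log_xi = np.log(dictionary[0])
for k in range(8):   # greedy U-boost: xi_k = (1 - pi) xi_{k-1} + pi xi_phi
    log_xi = min(((1 - pi) * log_xi + pi * np.log(phi)
                  for phi in dictionary for pi in np.linspace(0.05, 0.95, 19)),
                 key=loss)
    print(k, round(loss(log_xi), 4))   # loss decreases monotonically
```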

Page 58

Proposed density estimator

$f(x) = u(\theta^{\rm T} t(x) - \kappa(\theta)),$

with $t(x) = (\xi_{\phi_1}(x), \ldots, \xi_{\phi_T}(x))^{\rm T}$ and $\theta = (\theta_1, \ldots, \theta_T)$, $\theta_k = \pi_k(1-\pi_{k+1})\cdots(1-\pi_T)$:

a U-estimate under the U-model built from the dictionary $\mathcal{D} = \{\phi^{(j)}\}_{j=1}^N$.

[Figure: one boosting step, $\xi_{k-1} \to \xi_k$ toward $\phi_k$ within cov($\mathcal{D}$).]

Page 59

[Figure: successive boosting steps $\xi_{k-1} \to \xi_k$ toward $\phi_k$, then $\phi_{k+1}$, $\xi_{k+1}$.]

Page 60

[Figure: further steps $\xi_k, \xi_{k+1}, \xi_{k+2}, \ldots$, with $\xi_k \to \xi^*$ as $k \to \infty$.]

Page 61

Statistical machine learning

Brain function; learning from data sets; signal processing; pattern recognition.

Vapnik (1995); Hastie, Tibshirani, Friedman (2001); ...

Page 62

Minimum U divergence

Let p(x) be a data density function with statistical model $q_\theta(x)$.

U-loss function: $L_U(\theta) = \int\{p\,\xi(q_\theta) - U(\xi(q_\theta))\}$

The empirical loss:
$L_U^{\rm emp}(\theta) = -\frac1n\sum_{i=1}^n \xi(q_\theta(x_i)) + \int U(\xi(q_\theta(x)))\,dx$

Minimum U-estimator: $\hat\theta_U = \mathop{\rm argmin}_{\theta\in\Theta} L_U^{\rm emp}(\theta)$

U-estimating function: $s_U(x, \theta) = w(x, \theta)\,s(x, \theta) - E_{q_\theta}\{w(X, \theta)\,s(X, \theta)\}$,
where $w(x, \theta) = \xi'(q_\theta(x))\,q_\theta(x)$ and $s(x, \theta)$ is the score function.

Page 63

As M-estimation

Huber's M-estimator: $\min_{\theta\in\Theta}\frac1n\sum_{i=1}^n \rho(x_i, \theta)$

Location case (ψ = ρ′):
$\psi(y, \theta) = \begin{cases} y - \theta & \text{if } |y - \theta| \le k,\\ k\,{\rm sgn}(y - \theta) & \text{otherwise.}\end{cases}$

Relation to the U-loss: $\rho(x, \theta) = -\xi(q_\theta(x)) + \int U(\xi(q_\theta))$.
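A minimal IRLS sketch of Huber's location M-estimator (a standard technique; the data and tuning constant are illustrative):

```python
import numpy as np

def huber_location(y, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares:
    weights w = min(1, k/|y - theta|) clip the influence of outliers."""
    theta = np.median(y)
    for _ in range(max_iter):
        r = y - theta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))
        new = np.sum(w * y) / np.sum(w)
        if abs(new - theta) < tol:
            break
        theta = new
    return theta

y = np.concatenate([np.random.default_rng(2).normal(0, 1, 50), [15.0, 20.0]])
print(y.mean(), huber_location(y))   # the M-estimate resists the two outliers
```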

Page 64

Influence function

Statistical functional:
$T_U(G) = \mathop{\rm argmin}_{\theta\in\Theta}\Big\{-\int\xi(p(x, \theta))\,dG(x) + \int U(\xi(p(x, \theta)))\,dx\Big\}$

Influence function: ${\rm IF}(x, T) \equiv \Big[\frac{\partial}{\partial\varepsilon} T(G_\varepsilon)\Big]_{\varepsilon=0}$, with $G_\varepsilon = (1-\varepsilon)F_\theta + \varepsilon\,\delta_x$.

${\rm IF}(x, T_\Psi) = J^{-1}(\theta)\big[w(x, \theta)\,S(x, \theta) - E\{w(X, \theta)\,S(X, \theta)\}\big]$

Gross error sensitivity: ${\rm GES}(T_\Psi) = \sup_x \|{\rm IF}(x, T_\Psi)\|$

Page 65

Efficiency

Asymptotic variance:
$\sqrt n\,(\hat\theta_U - \theta) \Rightarrow_D N\big(0,\ J_U(\theta)^{-1} H_U(\theta)\,J_U(\theta)^{-1}\big),$
where
$J_U(\theta) = E\{w(X, \theta)\,S(X, \theta)\,S(X, \theta)^{\rm T}\},\qquad H_U(\theta) = {\rm Var}\{w(X, \theta)\,S(X, \theta)\}.$

Information inequality: $I(\theta)^{-1} \le J_U(\theta)^{-1} H_U(\theta)\,J_U(\theta)^{-1}$
(equality holds iff $U(x) = \exp(x)$).

Page 66

Normal mean

[Figure: influence functions for the normal mean. Left: β-power estimates with β = 0, 0.015, 0.15, 0.4, 0.8, 2.5. Right: η-sigmoid estimates with η = 0, 0.01, 0.05, 0.075, 0.1, 0.125. The case β = η = 0 is the MLE.]

Page 67

Gross error sensitivity

β-power estimates                η-sigmoid estimates
β       efficiency   GES         η       efficiency   GES
0       1            ∞           0       1            ∞
0.015   0.972        1.90        0.01    0.97         6.16
0.15    0.873        1.04        0.05    0.861        2.92
0.4     0.802        0.678       0.075   0.799        2.47
0.8     0.753        0.455       0.1     0.742        2.21
2.5     0.694        0.197       0.125   0.689        2.04

Page 68

Multivariate normal

pdf: $f(y, \mu, \Sigma) = (2\pi)^{-p/2}(\det\Sigma)^{-1/2}\exp\Big\{-\frac{(y - \mu)^{\rm T}\Sigma^{-1}(y - \mu)}{2}\Big\}$

Ψ-weighted likelihood equations, $\theta = (\mu, \Sigma)$:

$\frac1n\sum_j \psi_\theta(x_j)\,(x_j - \mu) = 0$

$\frac1n\sum_j \psi_\theta(x_j)\,\{(x_j - \mu)(x_j - \mu)^{\rm T} - \Sigma\} = c_\psi(\theta)\,\Sigma$

Page 69

U-algorithm

Iteratively reweighted mean and covariance: update $\theta_k = (\mu_k, \Sigma_k)$ into $\theta_{k+1} = (\mu_{k+1}, \Sigma_{k+1})$ by

$\mu_{k+1} = \frac{\sum_i w(x_i, \theta_k)\,x_i}{\sum_i w(x_i, \theta_k)},$

$\Sigma_{k+1} = \frac{\sum_i w(x_i, \theta_k)\,(x_i - \mu_k)(x_i - \mu_k)^{\rm T}}{\sum_i w(x_i, \theta_k) - c_U\,\det(\Sigma_k)}.$

Under a mild condition, $L_\psi(\theta_{k+1}) \ge L_\psi(\theta_k)$ $(\forall k = 1, \ldots)$.
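A sketch of one style of iteratively reweighted mean and covariance, with β-power-type weights $w_i \propto f(x_i; \mu, \Sigma)^\beta$ (my own simplified normalization; the slide's exact constant $c_U$ is replaced by the factor (1 + β), which restores Fisher-consistency at the normal model):

```python
import numpy as np

def reweighted_mvn(x, beta=0.2, n_iter=50):
    """Iteratively reweighted mean/covariance: points with low current density
    get small weights, so outliers are down-weighted (simplified sketch)."""
    mu, sigma = x.mean(axis=0), np.cov(x.T)
    for _ in range(n_iter):
        d = x - mu
        m = np.einsum('ij,jk,ik->i', d, np.linalg.inv(sigma), d)  # Mahalanobis^2
        w = np.exp(-0.5 * beta * m)
        w /= w.sum()
        mu = w @ x
        d = x - mu
        sigma = (1 + beta) * (d * w[:, None]).T @ d   # Fisher-consistent rescale
    return mu, sigma

rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(6, 1, (5, 2))])
print(reweighted_mvn(x)[0], x.mean(axis=0))   # robust mean vs. dragged mean
```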

Page 70

Simulation

ε-contamination models of the form
$G_\varepsilon^{(1)},\ G_\varepsilon^{(2)} = (1-\varepsilon)\,N(\mu_0, \Sigma_0) + \varepsilon\,N(\mu_1, \Sigma_1)$ in two settings.

[Table: KL error $D_{\rm KL}(\theta_0, \hat\theta)$ of the MLE $\hat\theta$ under $G_0$ and $G_{0.05}$.]

Page 71

β-power estimates vs. η-sigmoid estimates

[Table: KL errors of the β-power and η-sigmoid estimates over grids of β and η values under the contamination models $G_\varepsilon^{(1)}$ and $G_\varepsilon^{(2)}$.]

Page 72

Plot of MLEs (no outliers)

[Figure: estimates of $\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}$ from 100 replications of samples of size 100 under the normal distribution: MLE, β-MLE (β = 0.1), η-MLE (η = 0.0025); true value (1, 0, 1).]

Page 73

Plot of MLEs with outliers

[Figure: the same estimators from 100 replications of samples of size 100 under the contaminated model $G_\varepsilon^{(2)}$: MLE, β-MLE (β = 0.1), η-MLE (η = 0.0025); true value (1, 0, 1).]

Page 74

Selection of the tuning parameter

Squared loss function: ${\rm Loss}(\hat\theta) = \frac12\int\{f(y, \hat\theta) - g(y)\}^2\,dy$

Cross-validation:
${\rm CV}(\beta) = -\frac1n\sum_{i=1}^n f(x_i, \hat\theta_\beta^{(-i)}) + \frac12\int f(y, \hat\theta_\beta)^2\,dy$

$\hat\beta = \mathop{\rm argmin}_\beta {\rm CV}(\beta)$

Approximation: $\hat\theta_\beta^{(-i)} \approx \hat\theta_\beta - \frac{1}{n-1}\,{\rm IF}(x_i, \hat\theta_\beta)$

Page 75

Selection of the tuning parameter

${\rm CV}(\beta) \cong -\frac1n\sum_{i=1}^n f(x_i, \hat\theta_\beta) + \frac12\int f(y, \hat\theta_\beta)^2\,dy + \frac{1}{n-1}\cdot\frac1n\sum_i {\rm IF}(x_i, \hat\theta_\beta)^{\rm T}\,\frac{\partial f}{\partial\theta}(x_i, \hat\theta_\beta)$

The third term is dominant in the no-outlier case, in which CV(β) has a minimum around β = 0. When there are substantial outliers, the first and second terms are dominant and CV(β) has a minimum around β = 1.

Cf. GIC in Konishi and Kitagawa (1996).

Page 76

[Worked example: data from a normal with mean (0, 0) and variance (.26, −.1; −.1, .26).

MLE on the clean signal: mean (0.054, −0.081), variance (.228, −.126; −.126, .261)
MLE under contamination: mean (0.204, −0.184), variance (1.059, −.263; −.263, .383)
β-MLE: mean (0.086, −0.132), variance (.293, −.134; −.134, .286)

Plot of CV(β) over β ∈ (0.025, 0.175); the minimizer is β̂ = 0.07.]

Page 77

What is ICA?

Cocktail party effect; blind source separation.

[Diagram: sources $s_1, s_2, \ldots, s_m$ mixed into observations $x_1, x_2, \ldots, x_m$.]

Page 78

ICA model

$W \in \mathbb{R}^{m\times m}$, $\mu \in \mathbb{R}^m$, s.t. $x = W^{-1}s + \mu$  (linear mixture of signals).

Independent signals: $p(s_1, \ldots, s_m) = p_1(s_1)\cdots p_m(s_m)$, $E(S_1) = \cdots = E(S_m) = 0$.

$f(x, W, \mu) = |\det(W)|\,p_1(w_1(x - \mu))\cdots p_m(w_m(x - \mu))$

The aim is to learn W from a dataset $(x_1, \ldots, x_n)$, in which $p(s) = p_1(s_1)\cdots p_m(s_m)$ is unknown.

Page 79

ICA likelihood

Log-likelihood function:
$\ell(W, \mu) = \sum_{i=1}^n\sum_{j=1}^m \log p_j(w_j(x_i - \mu)) + n\log|\det(W)|$

Estimating equation, built from
$F(x, W, \mu) = \frac{\partial}{\partial W}\log f(x, W, \mu) = \{I - h(W(x - \mu))\,(W(x - \mu))^{\rm T}\}\,W^{-\rm T},$
where $h(s) = -\Big(\frac{\partial}{\partial s_1}\log p_1(s_1), \ldots, \frac{\partial}{\partial s_m}\log p_m(s_m)\Big)^{\rm T}$.

Natural gradient algorithm: Amari et al. (1996).
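A sketch of the natural-gradient update $W \leftarrow W + \eta\,(I - h(y)y^{\rm T})\,W$ with h(y) = tanh(y), a common surrogate for $-\partial\log p/\partial s$ for super-Gaussian sources (my own illustration, with an arbitrary mixing matrix; not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.laplace(size=(2, 5000))            # independent super-Gaussian sources
A = np.array([[1.0, 1.0], [2.0, 0.5]])     # mixing matrix (W^{-1} in the slides)
x = A @ s
x = x - x.mean(axis=1, keepdims=True)

W = np.eye(2)
eta = 0.02
for epoch in range(200):                   # natural-gradient ICA (Amari et al., 1996)
    y = W @ x
    # tanh(y) plays the role of -d/ds log p(s) for super-Gaussian p
    grad = (np.eye(2) - np.tanh(y) @ y.T / x.shape[1]) @ W
    W = W + eta * grad
print(W @ A)   # ~ a scaled permutation matrix if the sources are separated
```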

Page 80

Beta-ICA

β-power estimating equation: $\frac1n\sum_{i=1}^n f_\beta(x_i, W, \mu) = B_\beta(W, \mu)$.

Decomposability: the β-power estimating equation separates over component pairs $s \ne t$ $(\forall s \ne t)$ into products of componentwise expectations weighted by $\{p_q(w_q(X - \mu))\}^\beta$.

Page 81

Likelihood ICA

[Figure: 150 signals from U(0,1) × U(0,1), the mixed data under the mixing matrix $W^{-1} = \begin{pmatrix}1 & 1\\ 2 & 0.5\end{pmatrix}$, and the maximum likelihood separation.]

Page 82

Non-robustness

[Figure: the same setting with 50 added Gaussian noise points from N(0,1) × N(0,1), and the maximum likelihood separation under contamination.]

Page 83

β-power ICA (β = 0.2)

[Figure: the minimum β-power separation under the same contamination.]

Page 84

U-PCA

η-sigmoid: $\Psi(z) = \log(\eta + \exp(z))$

Residual from the axis γ: $r(y, \gamma) = \|y\|^2 - \frac{(\gamma^{\rm T} y)^2}{\|\gamma\|^2}$

U-loss: $L_U(\gamma, \mu) = \sum_{i=1}^n U(-\log r(x_i - \mu, \gamma))$,
with $\hat\gamma_U = \mathop{\rm argmin}_\gamma \min_\mu L_U(\gamma, \mu)$.

Classical PCA: $\min_\gamma \sum_i r(x_i, \gamma) = {\rm tr}(S) - \max_\gamma \frac{\gamma^{\rm T} S\gamma}{\gamma^{\rm T}\gamma}$.

Page 85

U-PCA algorithm

Update $(\mu, \gamma)$ into $(\mu^*, \gamma^*)$:
$\mu^* = \frac{\sum_{i=1}^n w(x_i - \mu, \gamma)\,x_i}{\sum_{i=1}^n w(x_i - \mu, \gamma)},\qquad \gamma^* = \text{the leading eigenvector of } S(\mu^*, \gamma),$
where $w(x - \mu, \gamma) = \psi(r(x - \mu, \gamma))$ and
$S(\mu, \gamma) = \sum_i w(x_i - \mu, \gamma)\,(x_i - \mu)(x_i - \mu)^{\rm T}.$

Page 86

Non-robustness of classical PCA

[Figures: classical PCA fits on clean and contaminated data.]

MLE for PC vector = (.55, .82, .01, .07, .01, .032, .10)
MLE for PC vector = (.00, .01, .05, .04, .02, .99, .00)

Page 87

Data weights in U-PCA

[Figure: the U-weight of each observation plotted against its index.]

PC vector = (.55, .82, .01, .07, .01, .032, .10)
U-estimator for PC vector = (.64, .75, .01, .09, .01, .03, .09)

Page 88

Tube neighborhood in U-PCA

[Figure: the weight u(r) as a function of the radius r, and the corresponding tube neighborhood around the fitted U-PCA axis $\hat\gamma_\Psi$.]

Page 89

Kernel methods

KMs: mapping the data into a high-dimensional feature space (kernel trick, cf. Aizerman et al. 1964; spline method, cf. Wahba 1979).

Use of kernel functions for vectors, sequence data, text, images.

SVM, Fisher's LDA, PCA, ICA, CCA, SIR, spectral clustering (any kernel can be used with any kernel algorithm).

Kernel-Machines Org (http://agbs.kyb.tuebingen.mpg.de/km/bb/)
SVM Org (http://www.support-vector-machines.org/)

Page 90

RKHS

$\Phi: \mathcal{X} \to \mathcal{Z}$, where $\mathcal{X}$ is the data space in $\mathbb{R}^p$ and $\mathcal{Z}$ a high-dimensional feature space.

Kernel: $K(x, u) = \Phi(x)\cdot\Phi(u)$  (kernel trick, cf. Aizerman et al. 1964)

Feature map: $h_x = K(\cdot, x) \in H$,  $h: \mathcal{X} \to \mathcal{Z}$

Reproducing property: $\langle K(\cdot, x), f\rangle_H = f(x)$

$\|\Phi(x)\|_{\mathcal{Z}}^2 = K(x, x) = \|h_x\|_H^2$
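The kernel trick in the form used by KPCA: every computation goes through the Gram matrix $K_{ij} = K(x_i, x_j) = \langle h_{x_i}, h_{x_j}\rangle$. A minimal (non-robust) KPCA sketch with a Gaussian kernel, in the spirit of Schölkopf et al. (1998); parameters here are illustrative:

```python
import numpy as np

def kpca(X, n_comp=2, gamma=1.0):
    """Kernel PCA: eigendecompose the doubly centered Gram matrix of the
    Gaussian kernel k(x, u) = exp(-gamma ||x - u||^2)."""
    sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    K = np.exp(-gamma * sq)
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                         # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_comp]
    # scores of h_{x_i} on the leading principal axes
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

X = np.random.default_rng(5).normal(size=(30, 2))
print(kpca(X).shape)   # (30, 2)
```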

Page 91

Kernel data

[Figure: a data set $\{x_i : i = 1, \ldots, 30\}$ and its feature images $\{h_{x_i} : i = 1, \ldots, 30\}$.]

Page 92

Ψ-loss function

Let p be a true density function on H and m a proto-model function.

Definition 1. The U-loss function is
$L_U(\theta) = C_U(p, m_\theta) = -E_P\{\xi(m_\theta)\} + \int U(\xi(m_\theta))\,d\tau.$

We assume an exponential model
$m_\theta(x) = \exp(\theta^{\rm T} h_x - \kappa(\theta)),$
where $\kappa(\theta)$ satisfies $\int\exp(\theta^{\rm T} h_x - \kappa(\theta))\,d\tau = 1$.

Writing $\Psi(t) = \exp(-\xi(t))$, the U-loss is rewritten as a Ψ-loss of the margins:
$L_U(\theta) = E_P\{\Psi(-(\theta^{\rm T} h_x - \kappa_\theta))\} + \int U(-\Psi(\theta^{\rm T} h_x - \kappa_\theta))\,d\tau.$

Page 93

KPCA: cf. Schölkopf et al. (1998, NC)

Robust PCA:
Xu, Yuille (IEEE NN 1995)
Wang, Karhunen, Oja (NN 1995)
Higuchi, Eguchi (NC 1998; JMLR 2004)
Hubert, Rousseeuw, Branden (Technometrics 2005)

Robust KPCA

Page 94

Ψ-kernel principal component analysis (U-KPCA), cf. Huang, Yi-Ren, Eguchi (2008)

$(\hat m_\Psi, \hat\Gamma_\Psi) = \mathop{\rm argmin}_{m\in H,\ \Gamma\in O_k}\sum_{i=1}^n \Psi(z(h_{x_i}, m, \Gamma)),$

where Ψ(z) is a strictly increasing function and
$z(h_x, m, \Gamma) = \frac12\{\|h_x - m\|_H^2 - \|\Gamma^*(h_x - m)\|_H^2\}.$

Remark: Ψ₀-KPCA = KPCA if $\Psi_0(z) = z$.

Examples: $\Psi_1(z) = \beta^{-1}\{1 - \exp(-\beta z)\}$ and an η-sigmoid variant $\Psi_2(z)$, with $\lim_{\beta\to 0}\Psi_1(z) = \Psi_0(z)$ and $\lim_{\eta\to\infty}\Psi_2(z) = \Psi_0(z)$.

[Plot: $\Psi_1(z)$ for β = 0, 0.1, 0.5, 1.]

Page 95

Toy example

[Figure]

Page 96

Functional data

5 phonemes (TIMIT database): http://www.lsp.ups-tlse.fr/staph/npfda/
sh ("she"), iy ("she"), aa ("dark"), dcl ("dark"), ao ("water")

$D = \{(f_i, y_i) : i = 1, \ldots, 2000\}$, with 5% contamination:
$\tilde f_{ij} = f_i(t_j) + \delta_{ij}\,u_{ij}$  $(i = 1, \ldots, 2000;\ j = 1, \ldots, 15)$,
$\delta_{ij} \sim {\rm Bernoulli}(0.05)$, $u_{ij} \sim {\rm Uniform}(10, 15)$.

Page 97

FPCA

[Figure]

Page 98

Learning with Information Divergence Geometry

Shinto Eguchi and Osamu Komori

The Institute of Statistical Mathematics, Japan
Email: [email protected], [email protected]
URL: http://www.ism.ac.jp/~eguchi/

Tutorial Workshop, National Taiwan University (國立臺灣大學)

Page 99

Outline

9:30~10:30  Information divergence class and robust statistical methods I
11:00~12:00 Information divergence class and robust statistical methods II
13:30~14:30 Information geometry on model uncertainty
15:00~16:00 Boosting learning algorithm and U-loss functions I
9:30~10:30  Boosting learning algorithm and U-loss functions II
11:00~12:00 Pattern recognition from genome and omics data

Page 100

Information geometry on model uncertainty

Page 101

Observational bias

The theory of statistical inference is formulated under the assumption of a random mechanism, for example random sampling.

However, the assumption is frequently untestable in situations that involve observational studies. In this sense we have to make a sufficiently cautious inference.

Typically missing data come from a variety of missing mechanisms: missing completely at random, missing at random, and missing not at random. In particular, missing not at random brings about a serious bias in the inference.

Page 102

Hidden bias

Publication bias: not all studies are reviewed.
Confounding: causal effect only partly explained.
Measurement error: errors in the measure of exposure.

Page 103

Lung cancer and passive smoking

[Figure: odds ratios with confidence intervals for 30 studies, plotted on a log scale from 0.3 to 10.0.]

Page 104

Passive smoking and lung cancer

Log relative risk estimates $\theta_j$ (j = 1, ..., 30) from 30 2×2 tables:

$\hat\theta = \frac{\sum_j w_j\,\theta_j}{\sum_j w_j}$   ($w_j$ is the inverse-variance weight)

The estimated relative risk is 1.24, with 95% confidence interval (1.13, 1.36).

Page 105

Conventional analysis

[Figure: the 30-study odds-ratio plot with the pooled estimate 1.24.]

Page 106

Incomplete data: y = h(z)

z = (data on all studies, selection indicators); y = (data on selected studies)
z = (response, treatment, potential confounders); y = (response, treatment)
z = (disease status, true exposure, error); y = (disease status, observed exposure)

Page 107

Level sets of h(z)

1. One-to-one  2. Missing  3. Measurement error
4. Interval censoring  5. Competing risks  6. Hidden confounder

Page 108

Tubular neighborhood (Copas, Eguchi, 2001)

Model: $M = \{f_Y(y, \theta) : \theta \in \Theta\}$

Near-model: $M_\varepsilon = \{g_Y(y, \theta, \varepsilon) : \theta \in \Theta\}$, with ${\rm KL}(M, M_\varepsilon) \le \frac{\varepsilon^2}{2}$

$N_\varepsilon = \Big\{g_Y(\cdot, \theta, \varepsilon) : \min_{\theta\in\Theta}{\rm KL}(f_Y(\cdot, \theta), g_Y(\cdot, \theta, \varepsilon)) \le \frac{\varepsilon^2}{2}\Big\}$

Page 109

Mis-specification

$g_Z(z, \theta, \varepsilon) = f_Z(z, \theta)\exp\{\varepsilon\,u_Z(z, \theta)\}$

$E_{f_Z}(u_Z) = 0,\qquad E_{f_Z}(u_Z^2) = 1,\qquad E_{f_Z}(u_Z\,s_Z) = 0$

$\varepsilon = \{2\,{\rm KL}(g_Z, f_Z)\}^{1/2}$

$u_Z$ = "misspecification direction"

Page 110

Near model

$g_Y(y, \theta, \varepsilon) = f_Y(y, \theta)\exp\{\varepsilon\,u_Y(y, \theta)\},$
where $u_Y(y, \theta) = E_\theta[u_Z(z, \theta) \mid y]$.

By h: z ↦ y = h(z):

Model: $f_Z(z, \theta) \xrightarrow{h} f_Y(y, \theta)$
Near-model: $g_Z(z, \theta, \varepsilon) \xrightarrow{h} g_Y(y, \theta, \varepsilon)$

Page 111

Ignorable incompleteness

Let Y = h(Z) be a many-to-one mapping. Z is complete; Y is incomplete.

If Z has density $f_Z(z, \theta)$, then Y has
$f_Y(y, \theta) = \int_{h^{-1}(y)} f_Z(z, \theta)\,dz.$

MLE $\hat\theta_Z$ ← data on Z;  MLE $\hat\theta_Y$ ← data on Y.

If $f_Z$ is true: $E(\hat\theta_Y) = E(\hat\theta_Z)$.
If $f_Z$ is wrong: $E(\hat\theta_Y) \ne E(\hat\theta_Z)$.

Page 112

Virtual MLE: $\hat\theta_Z \leftarrow \{z_1, \ldots, z_n\}$ from $g_Z(z, \theta, \varepsilon)$
Actual MLE: $\hat\theta_Y \leftarrow \{y_1, \ldots, y_n\}$ from $g_Y(y, \theta, \varepsilon)$

Bias:
$E(\hat\theta_Y) - E(\hat\theta_Z) = \varepsilon\,b(u_Z, \theta) + O(|\varepsilon|^2) + O(n^{-1}),$
$b(u_Z, \theta) = b_Z - b_Y,\qquad b_Z = G_Z^{-1}{\rm cov}(u_Z, s_Z),\quad b_Y = G_Y^{-1}{\rm cov}(u_Y, s_Y).$

Limit distribution:
$\sqrt n\begin{pmatrix}\hat\theta_Z - \theta\\ \hat\theta_Y - \theta\end{pmatrix} \xrightarrow{\rm Dist} N\Big(\sqrt n\,\varepsilon\begin{pmatrix}b_Z\\ b_Y\end{pmatrix},\ \begin{pmatrix}G_Z^{-1} & G_Z^{-1}\\ G_Z^{-1} & G_Y^{-1}\end{pmatrix}\Big)$

Page 113

Asymptotic bias

$b = E(\hat\theta_Y) - E(\hat\theta_Z)$

$\max_{u_Z}\|b\|^2 = \varepsilon^2(1 - \lambda_{\min}),$

where $\lambda_{\min}$ is the smallest eigenvalue of $\Lambda = I_Y^{1/2}\,I_Z^{-1}\,I_Y^{1/2}$, measuring the information loss.

The bound is attained if and only if $u_Y(y, \theta) \in {\rm span}\,s_Y(y, \theta)$.

Page 114

Problem in estimation of bias

The nonignorable model
$g_Y(y, \theta, \varepsilon) = f_Y(y, \theta)\exp\{\varepsilon\,u_Y(y, \theta) - \tfrac12\varepsilon^2\rho(\theta)\}$
gives the worst case if $u_Y(y, \theta) = \omega^{\rm T} s_Y(y, \theta)$.

However ω is inestimable and untestable: the profile likelihood
${\rm PL}_Y(\omega, \varepsilon) = \max_{\theta\in\Theta}\sum_{i=1}^n \log g_Y(y_i, \theta, \omega, \varepsilon)$
is flat at ω = 0.

Page 115

Heckman model for MNAR

$y = \beta^{\rm T} x + \sigma\varepsilon,\qquad g_{R\mid Y, X}(r = 1 \mid y, x) = \Phi\big(\psi^{\rm T} x + \omega\,\sigma^{-1}(y - \beta^{\rm T} x)\big),$

with complete data $z = (t, x, r)$ and $h(z) = (t^{(r)}, x, r)$; ω indexes the nonignorable selection.

Page 116

From pure misspecification

Unbiased perturbed → (via h) → biased perturbed

Page 117

The worst case for bias

$\beta^2(u_Z, \theta) = b(u_Z, \theta)^{\rm T}\,G_Y\,b(u_Z, \theta),\qquad b(u_Z, \theta) = G_Z^{-1}{\rm cov}(u_Z, s_Z) - G_Y^{-1}{\rm cov}(u_Y, s_Y).$

By the Cauchy-Schwarz inequality, $\beta^2(u_Z, \theta) \le \beta^2(u^*, \theta)$ for a bias direction d, the bound being expressed through $d^{\rm T}(G_Y^{-1} - G_Z^{-1})\,d$ and attained iff the misspecification direction is of the score form

$u^*(y) = \frac{d^{\rm T}(G_Y^{-1} - G_Z^{-1})\,s_Y(y)}{\{d^{\rm T}(G_Y^{-1} - G_Z^{-1})\,d\}^{1/2}}$

(with the corresponding z-level direction built from $s_Y(h(z))$ and $s_Z(z)$).

Page 118

The worst case

If $u_Y(y, \theta) = \omega^{\rm T} s_Y(y, \theta)$, the perturbation acts as a parameter shift:
$g_Y(y, \theta, \omega, \varepsilon) \approx f_Y(y, \theta + \varepsilon\,\omega)$, and

$\theta^* = \mathop{\rm argmin}_{\theta^*\in\Theta}{\rm KL}\big(g_Y(\cdot, \theta, \omega, \varepsilon),\ f_Y(\cdot, \theta^*)\big) = \theta + \varepsilon\,\omega.$

Page 119

Sensitivity analysis

The most sensitive model:
$g_Y(y, \theta, \omega) = f_Y(y, \theta)\exp\{\omega^{\rm T} s_Y(y, \theta) - \tfrac12\,\omega^{\rm T} I_Y\,\omega\},$

with the estimating function of θ taken at fixed ε, ω.

The family $\{\hat\theta_{Y,\omega} : \omega^{\rm T} I_Y\,\omega = \text{const.}\}$ traces the sensitivity of $\hat\theta_Y$ over the perturbation sphere.

Page 120

Behavior of two MLEs

The two MLEs $\hat\theta_Y$ and $\hat\theta_Z$ are asymptotically normal:
$\sqrt n\begin{pmatrix}\hat\theta_Y - \theta\\ \hat\theta_Z - \theta\end{pmatrix} \to N\Big(\sqrt n\,\varepsilon\begin{pmatrix}b_Y\\ b_Z\end{pmatrix},\ \begin{pmatrix}I_Y^{-1} & I_Z^{-1}\\ I_Z^{-1} & I_Z^{-1}\end{pmatrix}\Big).$

Conditionally,
$(\hat\theta_Y - \theta) \mid (\hat\theta_Y - \hat\theta_Z = u)\ \to_D\ N(u,\ n^{-1} I_Z^{-1}).$

Note: the conditioning cancels the selection bias.
Note: this asymptotic expression is valid only when $\varepsilon = O(n^{-1/2})$.

Page 121

Scenarios A, B, C

Inference from $y_1, \ldots, y_n$ using $f_Y$:
$C(k) = \{\theta : (\hat\theta_Y - \theta)^{\rm T} I_Y (\hat\theta_Y - \theta) \le k\,r_\alpha^2\}$

Scenario A: $\varepsilon = 0 \Rightarrow k_A = 1$
Scenario B: $\varepsilon > 0$, but we would have found the model acceptable had we observed $z_1, \ldots, z_n$
Scenario C: $\varepsilon > 0$ unknown $\Rightarrow k_C > 1$

$k_A < k_B < k_C$

Page 122

Scenarios A and C

$C(k) = \{\theta : (\hat\theta_Y - \theta)^{\rm T} I_Y (\hat\theta_Y - \theta) \le k\,r_\alpha^2\}$

Scenario A: $\varepsilon = 0 \Rightarrow I_Y^{1/2}(\hat\theta_Y - \theta) \sim_f N(0, I)$, so $k_A = 1$.

Scenario C: $\varepsilon > 0$ unknown $\Rightarrow I_Y^{1/2}(\hat\theta_Y - \theta - \varepsilon\,b) \sim_g N(0, I)$, so $k_C = 1 + \varepsilon^2\kappa^2$.

Page 123

Scenario B

If we could observe $z_1, \ldots, z_n$, we could compute the MLE $\hat\theta_Z$ and

$U = (I - \Lambda)^{-1/2}\,I_Y^{1/2}\,(\hat\theta_Y - \hat\theta_Z) \sim_f N(0, I),$

with the standardized conditional statistic $S^* = S(U) \mid g_Y \sim N(0, I)$, where $S(\hat\theta)$ is built from $I_Z^{1/2}\{(\hat\theta_Z - \theta) - \cdots(\hat\theta_Y - \hat\theta_Z)\}$.

Conditional confidence region: $C(u) = \{\theta : \|S^*(u)\|^2 \le r_\alpha^2\}$.

Page 124

Assumption of non-randomness

$\sqrt n\,(\hat\theta_Y - \hat\theta_Z) \sim_{f_Y} N(0,\ I_Y^{-1} - I_Z^{-1}),$
$(\hat\theta_Y - \theta) \mid (\hat\theta_Y - \hat\theta_Z = u) \sim_{g_Y} N(u,\ I_Z^{-1}),$

so the conditional confidence region is
$C_\alpha(u) = \{\theta : (\hat\theta_Y - \theta - u)^{\rm T} I_Z\,(\hat\theta_Y - \theta - u) \le r_\alpha^2\},$

and the acceptance region for the hypothesis H: bias b = 0 is
$B_\alpha = \{u : u^{\rm T}(I_Y^{-1} - I_Z^{-1})^{-1} u \le r_\alpha^2\}.$

The union of the conditional confidence regions over all u for which H: b = 0 is accepted at level α is
$C = \bigcup_{u\in B_\alpha} C_\alpha(u).$

Page 125

Theorem

Let $C(k) = \{\theta : (\hat\theta_Y - \theta)^{\rm T} I_Y (\hat\theta_Y - \theta) \le k\,r_\alpha^2\}$. Then

$C(1) \subset \bigcup_{\|u\|\le r_\alpha} C_\alpha(u) \subseteq C(2).$

[Figures: the nested regions C(1), the union of conditional regions, and C(2).]

Page 126

[Figures: the union region $\bigcup_{u\in B_\alpha} C_\alpha(u)$ for $(\lambda_1, \lambda_2) = (0.001, 0.001),\ (0.1, 0.1),\ (0.5, 0.5),\ (0.1, 0.9)$.]

The upper bound is attainable if there exist (i, j) with $\lambda_i \le \frac12 \le \lambda_j$.

${\rm CR}(r_\alpha) = \{\theta : (\hat\theta - \theta)^{\rm T}\,\widehat{\rm var}(\hat\theta)^{-1}(\hat\theta - \theta) \le r_\alpha^2\},\qquad {\rm CR}(r_\alpha) \subseteq C \subseteq {\rm CR}(\sqrt2\,r_\alpha).$

One-dimensional case: P = 5% → 0.3%.

Page 127

Double-the-variance rule

Statistical model $M = \{f(y; \theta) : \theta \in \Theta\}$; random sample $y_1, \ldots, y_n \sim_{\rm iid} f(y; \theta) \in M$.

α% confidence region: ${\rm CR}(r_\alpha) = \{\theta : (\hat\theta - \theta)^{\rm T}\,\widehat{\rm var}(\hat\theta)^{-1}(\hat\theta - \theta) \le r_\alpha^2\}$

[Figure: ${\rm CR}(r_\alpha)$ and ${\rm CR}(\sqrt2\,r_\alpha)$, with the model M and its neighborhood $\{M_\varepsilon\}$.]

Page 128: Tutorial Workshop on - 國立臺灣大學 · Pythagoras theorem 3. 4 Information divergence class and robust statistical methods I 4. 5 geometry learning statistics ... s s c r q

31

Passive smoke and lung cancer

The estimated relative risk 1.24 with 95% confidence interval (1.13, 1.36)

Square root rule: 95% confidence interval (1.08, 1.41).

Risk from passive smoke


Root-2-rule

[Figure: forest plot of odds ratios (log scale, 0.3–10.0) by study (studies 5–30); overall estimate 1.24.]


Reference

Eguchi & Copas (1998). JRSSB (near-parametric).
Copas & Eguchi (2001). JRSSB (ε-perturbed model).
Copas & Eguchi (2005). JRSSB (discussions) (double-the-variance rule).
Eguchi & Copas (2002). Biometrika (Kullback-Leibler divergence).
Henmi, Copas & Eguchi (2007). Biometrics (meta-analysis).
Henmi & Eguchi (2004). Biometrika (propensity score).
Little, R. J. A. & Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley.
Greenland, S. (2005). JRSSA (multiple-bias modelling).
Copas & Eguchi (2010). JRSSB (statistically equivalent models).

Present and Future

Does all this matter?

Statistics (missing data, response bias, censoring)

Biostatistics (drop-outs, compliance)

Epidemiology (confounding, measurement error)

Econometrics (identifiability, instruments)

Psychometrics (publication bias, SEM)

causality, counter-factuals, ...


Boosting learning algorithm and U-loss functions I


Which chameleon wins?


Pattern recognition…


What is pattern recognition?

□ There are many examples of pattern recognition.

□ In principle, pattern recognition is a prediction problem for a class label that reflects human interest and importance.

□ Originally, the human brain wants to label phenomena with a few words, for example (good, bad), (yes, no), (dead, alive), (success, failure), (effective, no effect), ....

□ The brain intrinsically predicts the class label from empirical evidence.


Practice

● Character recognition ● Voice recognition ● Image recognition
● Face recognition ● Fingerprint recognition ● Speaker recognition

☆ Credit scoring ☆ Medical screening ☆ Default prediction ☆ Weather forecast
☆ Treatment effect ☆ Failure prediction ☆ Infectious disease ☆ Drug response


Feature vector & class label

Feature vector x = (x_1, ..., x_p); class label y.
Feature space X ⊆ R^p; label set {1, ..., G}.

Training data: D_train = {(x_i, y_i) : i = 1, ..., n}.
Test data: D_test = {(x_j^test, y_j^test) : j = 1, ..., m}.


Classification rule

Feature vector x = (x_1, ..., x_p); class label y ∈ {1, ..., G}.
Discriminant function (score): F : (x, y) → z (z ∈ R).
Classifier: h : x → y, given by h_F(x) = argmax_{y∈{1,...,G}} F(x, y).

Example: consider G = 5 for a given x; if F(x, 3) is the largest of F(x, 1), ..., F(x, 5), then h_F(x) = 3.


Binary classification

Feature vector x = (x_1, ..., x_p); class label y ∈ {−1, +1} (binary class, G = 2).
Score: F : x → z, with F(x) = F(x, +1) − F(x, −1).
Classifier: h_F(x) = sgn{F(x)}.

sgn(F(x)) = +1 ⇔ F(x, +1) > F(x, −1), and sgn(F(x)) = −1 ⇔ F(x, +1) < F(x, −1),
so sgn(F(x)) = argmax_{y∈{−1,+1}} F(x, y).


Multi-class

Feature vector x = (x_1, ..., x_p); class label y ∈ {1, ..., G}.
Score function: F : (x, y) → z (z ∈ R).
Classifier by score: h_F(x) = argmax_{y∈{1,...,G}} F(x, y).

Example: G = 5; h_F(x) = 3 when F(x, 3) is the largest of F(x, 1), ..., F(x, 5).


Probability distribution

Let p(x, y) be a pdf of the random vector (x, y):

$$ p(B, C) = \int_B \Big\{\sum_{y\in C} p(x, y)\Big\}\,dx. $$

Marginal densities: p(x) = Σ_{y∈{1,...,G}} p(x, y) and p(y) = ∫_X p(x, y) dx.
Conditional densities: p(y|x) = p(x, y)/p(x) and p(x|y) = p(x, y)/p(y),
so p(x, y) = p(y|x)p(x) = p(x|y)p(y), and

$$ \frac{p(y=+1\,|\,x)}{p(y=-1\,|\,x)} = \frac{p(x,\ y=+1)}{p(x,\ y=-1)}. $$


Error rate

Feature vector x ∈ R^p, class label y ∈ {1, ..., G}. A classifier h_F(x) has error rate

$$ \mathrm{Err}(h_F) = \Pr(h_F(x) \ne y) = \sum_{i\ne j}\Pr(h_F(x)=j,\ y=i) = 1 - \sum_{i=1}^G \Pr(h_F(x)=i,\ y=i). $$

Training error, for D_train = {(x_i, y_i) : i = 1, ..., n}:
$$ \mathrm{Err}_{\rm train}(h_F) = \#\{i : h_F(x_i) \ne y_i\}/n. $$

Test error, for D_test = {(x_j^test, y_j^test) : j = 1, ..., m}:
$$ \mathrm{Err}_{\rm test}(h_F) = \#\{j : h_F(x_j^{\rm test}) \ne y_j^{\rm test}\}/m. $$
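The two error rates above are the same computation applied to two data sets. Here is a minimal sketch (our own helper, not from the slides) in R:

# Empirical error rate #{i : h(x_i) != y_i}/n of a classifier h,
# where h maps one feature vector (a row of x) to a label.
err_rate <- function(h, x, y) {
  yhat <- apply(x, 1, h)   # classify each row of the n x p matrix x
  mean(yhat != y)          # fraction of misclassified examples
}
# usage: err_rate(h, X_train, y_train) and err_rate(h, X_test, y_test)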


False negative/positive

False negative and false positive rates:
FN(h_F) = Pr(h_F(x) = −1 | y = +1), FP(h_F) = Pr(h_F(x) = +1 | y = −1).

              y = +1            y = −1
h_F(x) = +1   True Positive     False Positive
h_F(x) = −1   False Negative    True Negative

Err(h_F) = FN(h_F) pr(y = +1) + FP(h_F) pr(y = −1).


Bayes rule

Let p(y|x) be a conditional probability given x.

Define
$$ F_0(x) = \log\frac{p(y=+1\,|\,x)}{p(y=-1\,|\,x)}. $$
The classifier h_Bayes(x) = sgn(F_0(x)) leads to the Bayes rule.

Theorem 1. For any classifier h, Err(h_Bayes) ≤ Err(h).

Note: The optimal classifier is equivalent to the likelihood ratio. However, in practice p(y|x) is unknown, so we have to learn h_Bayes(x) based on the training data set.


The discriminant spaces {R_B^+, R_B^-} associated with the Bayes classifier are

$$ R_B^+ = \{x \in \mathbf{R}^p : h_{\rm Bayes}(x) = +1\} = \{x : p(y=+1|x) \ge p(y=-1|x)\}, $$
$$ R_B^- = \{x \in \mathbf{R}^p : h_{\rm Bayes}(x) = -1\} = \{x : p(y=+1|x) < p(y=-1|x)\}. $$

Error rate for the Bayes rule: in general, when a classifier h associates with spaces {R_+, R_-},

$$ \mathrm{Err}(h) = \int_{R_-} p(y=+1|x)\,p(x)\,dx + \int_{R_+} p(y=-1|x)\,p(x)\,dx, $$

and subtracting the corresponding expression for h_Bayes,

$$ \mathrm{Err}(h) - \mathrm{Err}(h_{\rm Bayes}) = \int_{R_-\setminus R_B^-} \{p(y=+1|x) - p(y=-1|x)\}\,p(x)\,dx + \int_{R_+\setminus R_B^+} \{p(y=-1|x) - p(y=+1|x)\}\,p(x)\,dx \ \ge\ 0, $$

since R_- \ R_B^- ⊆ R_B^+, where p(y=+1|x) ≥ p(y=-1|x), and R_+ \ R_B^+ ⊆ R_B^-, where the reverse inequality holds. Hence

$$ \mathrm{Err}(h) \ge \mathrm{Err}(h_{\rm Bayes}). $$


Multi-normal distribution

The p-variate normal (Gaussian) distribution is defined by the pdf

$$ \varphi(x, \mu, V) = \frac{1}{(2\pi)^{p/2}\det(V)^{1/2}}\exp\Big\{-\frac12(x-\mu)^{\mathsf T}V^{-1}(x-\mu)\Big\}. $$

Assume that (x, y) has the pdf

$$ p(x, y) = p(y)\,\varphi(x, \mu_y, V) \qquad (x \in \mathbf{R}^p,\ y \in \{1, ..., G\}) $$

(assumption of an equal variance matrix). [Figure: the two class-conditional densities for G = 2.]


Bayes classifier

$$ F_{\rm Bayes}(x) = \log\frac{p(y=+1\,|\,x)}{p(y=-1\,|\,x)} = \alpha_1^{\mathsf T} x + \alpha_0, $$
where
$$ \alpha_1 = V^{-1}(\mu_{+1} - \mu_{-1}), \qquad \alpha_0 = -\tfrac12\,(\mu_{+1}^{\mathsf T} V^{-1}\mu_{+1} - \mu_{-1}^{\mathsf T} V^{-1}\mu_{-1}) + \log\frac{p(y=+1)}{p(y=-1)}. $$
We call this the Fisher linear discriminant function.

Plug-in estimators:
$$ \hat\mu_{+1} = \frac{\sum_{i=1}^n I(y_i=+1)\,x_i}{\sum_{i=1}^n I(y_i=+1)}, \qquad \hat\mu_{-1} = \frac{\sum_{i=1}^n I(y_i=-1)\,x_i}{\sum_{i=1}^n I(y_i=-1)}, \qquad \hat V = \frac1n \sum_{i=1}^n (x_i - \hat\mu_{y_i})(x_i - \hat\mu_{y_i})^{\mathsf T}. $$
Plug these into F_Bayes(x).
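As an illustration of this plug-in rule, here is a minimal R sketch of the Fisher linear discriminant (the function name fisher_lda and all variable names are ours, and the maximum-likelihood pooled covariance is used as on the slide):

fisher_lda <- function(x, y) {
  # x: n x p feature matrix, y: labels in {-1, +1}
  mu1 <- colMeans(x[y == +1, , drop = FALSE])
  mu0 <- colMeans(x[y == -1, , drop = FALSE])
  xc  <- x                                   # center each row at its class mean
  xc[y == +1, ] <- sweep(x[y == +1, , drop = FALSE], 2, mu1)
  xc[y == -1, ] <- sweep(x[y == -1, , drop = FALSE], 2, mu0)
  V   <- crossprod(xc) / nrow(x)             # (1/n) sum_i (x_i - mu_{y_i})(x_i - mu_{y_i})^T
  a1  <- solve(V, mu1 - mu0)                 # alpha_1 = V^{-1}(mu_{+1} - mu_{-1})
  a0  <- -0.5 * (sum(mu1 * solve(V, mu1)) - sum(mu0 * solve(V, mu0))) +
    log(mean(y == +1) / mean(y == -1))       # alpha_0
  function(xnew) sign(sum(a1 * xnew) + a0)   # classifier sgn(F_Bayes(x))
}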


Boost learning

Boost by filter (Schapire, 1990)
Bagging, arcing (bootstrap) (Breiman, Friedman, Hastie)
AdaBoost (Schapire, Freund, Bartlett, Lee)

Can weak learners be combined into a strong learner?

Weak learner (classifier) = error rate slightly less than 0.5.
Strong learner (classifier) = error rate close to that of the Bayes classifier.


Web-page on Boost

http://www.boosting.org/

http://www.fml.tuebingen.mpg.de/boosting.org/tutorials

R. Meir and G. Rätsch. An introduction to boosting and leveraging. http://www.boosting.org/papers/MeiRae03.pdf

R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999. http://www.boosting.org/papers/Sch99e.ps.gz

Robert Schapire's home page: http://www.cs.princeton.edu/~schapire/
Yoav Freund's home page: http://www1.cs.columbia.edu/~freund/


Set of weak learners

Decision stumps:
$$ \mathcal F_{\rm stamp} = \{f(x) = \mathrm{sgn}(a(x_j - b)) : j \in \{1, ..., p\},\ a \in \{-1, +1\},\ b \in \mathbf{R}\}. $$

Linear classifiers:
$$ \mathcal F_{\rm linear} = \{f(x) = \mathrm{sgn}(\beta^{\mathsf T}x + \beta_0) : (\beta, \beta_0) \in \mathbf{R}^{p+1}\}, \qquad \mathcal F_{\rm stamp} \subseteq \mathcal F_{\rm linear}. $$

Also: neural nets, SVMs, k-nearest neighbours.
Note: not strong, but a variety of characters.


Exponential loss function

Let D_train = {(x_i, y_i) : i = 1, ..., n} be a training (example) set. The empirical exponential loss function for a score function F(x) is

$$ L_{\exp}^D(F) = \frac1n\sum_{i=1}^n \exp\{-y_iF(x_i)\}. $$

The expected exponential loss function for a score function F(x) is

$$ L_{\exp}^E(F) = \int_{\mathcal X} \Big\{\sum_{y\in\{-1,+1\}} \exp\{-yF(x)\}\,q(y|x)\Big\}\,q(x)\,dx, $$

where q(y|x) is the conditional distribution given x and q(x) is the pdf of x.
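The empirical version is a one-liner in R; the helper below is ours (F maps one feature vector, i.e. one row of x, to a real score):

# L_exp(F) = (1/n) sum_i exp(-y_i F(x_i))
exp_loss <- function(F, x, y) mean(exp(-y * apply(x, 1, F)))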


Learning algorithm

Training data {(x_1, y_1), ..., (x_n, y_n)}.

Weights w_1(1), ..., w_1(n) → weak learner f^(1)(x) with error ε_1; weights w_2(1), ..., w_2(n) → f^(2)(x); ...; weights w_T(1), ..., w_T(n) → f^(T)(x).

Final learner: F^(T)(x) = Σ_{t=1}^T α_t f^(t)(x).


Learning curve

[Figure: training curve — training error (0.05–0.2) against iteration number (50–250).]


AdaBoost

1. Initial: w_1(i) = 1/n (i = 1, ..., n), F_0(x) = 0.
2. For t = 1, ..., T:
 (a) f^(t) = argmin_{f∈F} ε_t(f), where ε_t(f) = Σ_i I(f(x_i) ≠ y_i) w_t(i) / Σ_{i'} w_t(i').
 (b) α_t = ½ log{(1 − ε_t(f^(t)))/ε_t(f^(t))}.
 (c) w_{t+1}(i) = w_t(i) exp{−α_t y_i f^(t)(x_i)}.
3. Output sign(F_T(x)), where F_T(x) = Σ_{t=1}^T α_t f^(t)(x).
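A minimal R implementation of steps 1–3 with decision stumps as the weak learners (a sketch under our own naming, not the authors' code; a small constant guards against ε = 0 on separable data):

# fit the best weighted decision stump f(x) = sgn(a * (x_j - b))
stump_fit <- function(x, y, w) {
  best <- list(err = Inf)
  for (j in seq_len(ncol(x))) for (b in unique(x[, j])) for (a in c(-1, 1)) {
    pred <- ifelse(a * (x[, j] - b) > 0, 1, -1)
    err  <- sum(w * (pred != y)) / sum(w)          # weighted error epsilon_t(f)
    if (err < best$err) best <- list(j = j, b = b, a = a, err = err)
  }
  best
}
stump_predict <- function(s, x) ifelse(s$a * (x[, s$j] - s$b) > 0, 1, -1)

adaboost <- function(x, y, n_iter = 50) {
  n <- nrow(x); w <- rep(1 / n, n)                 # step 1: w_1(i) = 1/n
  stumps <- vector("list", n_iter); alpha <- numeric(n_iter)
  for (t in seq_len(n_iter)) {                     # step 2
    s        <- stump_fit(x, y, w)                 # (a) minimise the weighted error
    eps      <- min(max(s$err, 1e-10), 1 - 1e-10)  # guard against eps = 0 or 1
    alpha[t] <- 0.5 * log((1 - eps) / eps)         # (b)
    w        <- w * exp(-alpha[t] * y * stump_predict(s, x))  # (c) reweight
    stumps[[t]] <- s
  }
  list(stumps = stumps, alpha = alpha)
}
# F_T(x) = sum_t alpha_t f^(t)(x); classify with sign(adaboost_score(fit, xnew))
adaboost_score <- function(fit, x)
  Reduce(`+`, Map(function(s, a) a * stump_predict(s, x), fit$stumps, fit$alpha))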


Update weight

The worst case: ε_{t+1}(f^(t)) = 1/2.

Update w_t(i) → w_{t+1}(i):
 if f^(t)(x_i) ≠ y_i, multiply by e^{α_t};
 if f^(t)(x_i) = y_i, multiply by e^{−α_t}.

Weighted error rate: ε_t(f^(t)) → ε_{t+1}(f^(t)) → ε_{t+1}(f^(t+1)).


Claim: ε_{t+1}(f^(t)) = 1/2. Indeed,

$$ \varepsilon_{t+1}(f^{(t)}) = \frac{\sum_{i=1}^n I(f^{(t)}(x_i)\ne y_i)\,w_{t+1}(i)}{\sum_{i'} w_{t+1}(i')}, \qquad w_{t+1}(i) = w_t(i)\exp\{-\alpha_t y_i f^{(t)}(x_i)\}. $$

Since exp{−α_t y_i f^(t)(x_i)} equals e^{α_t} when f^(t)(x_i) ≠ y_i and e^{−α_t} otherwise,

$$ \sum_i I(f^{(t)}(x_i)\ne y_i)\,w_{t+1}(i) = e^{\alpha_t}\,\varepsilon_t(f^{(t)})\sum_i w_t(i), $$
$$ \sum_i w_{t+1}(i) = \big\{e^{\alpha_t}\varepsilon_t(f^{(t)}) + e^{-\alpha_t}(1-\varepsilon_t(f^{(t)}))\big\}\sum_i w_t(i). $$

Plugging in e^{α_t} = √{(1 − ε_t(f^(t)))/ε_t(f^(t))} gives numerator √{ε_t(1 − ε_t)} and denominator 2√{ε_t(1 − ε_t)} (each times Σ_i w_t(i)), hence

$$ \varepsilon_{t+1}(f^{(t)}) = \frac{\sqrt{\varepsilon_t(1-\varepsilon_t)}}{2\sqrt{\varepsilon_t(1-\varepsilon_t)}} = \frac12. $$
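The identity is easy to check numerically; this little script (ours, assuming 0 < ε_t < 1) reweights once and recomputes the weighted error:

set.seed(1)
n <- 20
y <- sample(c(-1, 1), n, replace = TRUE)   # labels
f <- sample(c(-1, 1), n, replace = TRUE)   # outputs of a fixed weak classifier
w <- rep(1 / n, n)
eps  <- sum(w * (f != y)) / sum(w)         # epsilon_t(f)
alph <- 0.5 * log((1 - eps) / eps)
w2   <- w * exp(-alph * y * f)             # w_{t+1}(i)
sum(w2 * (f != y)) / sum(w2)               # = 0.5 (up to rounding)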


Update in decision stumps

[Figure: one-dimensional example — the numbers of wrongly classified examples for candidate thresholds on one feature.]

Decision stump on the j-th feature x_{1j}, ..., x_{nj}:

$$ f_j(x) = \begin{cases} +1 & \text{if } x_j > b_j \\ -1 & \text{if } x_j < b_j \end{cases} \quad \text{(or its sign flip } s_j f_j(x)\text{)}, $$

with the threshold chosen as

$$ b_j = \arg\min_b\ \tfrac12 \sum_i |f(x_{ij};\,b) - y_i|. $$


Next step

[Figure: error counts for candidate thresholds after reweighting.]

Update for the weights: the weighted threshold is

$$ b_j = \arg\min_b\ \tfrac12\sum_i w(i)\,|s_j f(x_{ij};\,b) - y_i|, $$

and

$$ \alpha_1 = \tfrac12 \log\Big[\frac{\text{nb. of correct ans.}}{\text{nb. of false ans.}}\Big] = \tfrac12\log\frac{16}{4} = \log 2. $$

Misclassified examples are weighted up by the factor 2; correctly classified examples are weighted down by 0.5.


Update in exponential loss

$$ L_{\exp}(F) = \frac1n\sum_{i=1}^n \exp\{-y_iF(x_i)\}. $$

Consider F(x) → F(x) + αf(x). Then

$$ L_{\exp}(F+\alpha f) = \frac1n\sum_{i=1}^n \exp\{-y_iF(x_i)\}\,\big[e^{\alpha} I(f(x_i)\ne y_i) + e^{-\alpha} I(f(x_i)=y_i)\big] = \big\{e^{\alpha}\varepsilon(f) + e^{-\alpha}(1-\varepsilon(f))\big\}\,L_{\exp}(F), $$

where

$$ \varepsilon(f) = \frac{\sum_{i=1}^n I(f(x_i)\ne y_i)\exp\{-y_iF(x_i)\}}{n\,L_{\exp}(F)}. $$


Sequential optimization

$$ L_{\exp}(F+\alpha f) = \big\{e^{\alpha}\varepsilon(f) + e^{-\alpha}(1-\varepsilon(f))\big\}\,L_{\exp}(F) \ \ge\ 2\sqrt{\varepsilon(f)(1-\varepsilon(f))}\;L_{\exp}(F), $$

since

$$ e^{\alpha}\varepsilon(f) + e^{-\alpha}(1-\varepsilon(f)) - 2\sqrt{\varepsilon(f)(1-\varepsilon(f))} = \Big\{\sqrt{e^{\alpha}\varepsilon(f)} - \sqrt{e^{-\alpha}(1-\varepsilon(f))}\Big\}^2 \ge 0. $$

The inequality holds with equality if and only if

$$ \alpha = \alpha_{\rm opt} = \frac12\log\frac{1-\varepsilon(f)}{\varepsilon(f)}. $$


AdaBoost = sequential minimization of the exponential loss

(a) f^(t) = argmin_{f∈F} ε_t(f).
(b) α_t = argmin_{α∈R} L_exp(F_{t−1} + αf^(t)), that is,
$$ \alpha_t = \frac12\log\frac{1-\varepsilon_t(f^{(t)})}{\varepsilon_t(f^{(t)})}, \qquad \min_{\alpha\in\mathbf{R}} L_{\exp}(F_{t-1}+\alpha f^{(t)}) = 2\sqrt{\varepsilon_t(f^{(t)})\{1-\varepsilon_t(f^{(t)})\}}\;L_{\exp}(F_{t-1}). $$
(c) w_{t+1}(i) ∝ w_t(i) exp{−α_t y_i f^(t)(x_i)}.


Proposal from machine learning

Learnability: can weak learners be boosted?
AdaBoost: Freund & Schapire (1997).

Weak learners (machines): {f_1(x), ..., f_p(x)}.
Forward stagewise: α_1 f^(1)(x) → α_1 f^(1)(x) + ... + α_t f^(t)(x) → a strong machine f(x).


Simulation (completely separable)

Feature space [−1, 1] × [−1, 1]; decision boundary x_2 = sin(2πx_1).
Data {(x_i, y_i) : i = 1, ..., 1000}, x_i ∈ [−1, 1] × [−1, 1], y_i ∈ {−1, +1}.

[Figure: the sinusoidal decision boundary in the feature space.]


Set of linear classifiers

Linear classification machines:
$$ f(x_1, x_2) = \mathrm{sgn}(r_1 + r_2x_1 + r_3x_2) = \begin{cases} +1 & \text{if } r_1 + r_2x_1 + r_3x_2 \ge 0 \\ -1 & \text{if } r_1 + r_2x_1 + r_3x_2 < 0. \end{cases} $$
Random generation: {r_1, r_2, r_3} ~ U(−1, 1)³.

[Figure: randomly generated linear decision boundaries.]
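The whole simulation setup fits in a few lines of R (a sketch under our own naming; the boundary and the uniform r's are as above):

set.seed(123)
n  <- 1000
x1 <- runif(n, -1, 1); x2 <- runif(n, -1, 1)        # feature space [-1,1] x [-1,1]
y  <- ifelse(x2 > sin(2 * pi * x1), 1, -1)          # completely separable labels
random_linear <- function() {                       # one random linear machine
  r <- runif(3, -1, 1)                              # {r1, r2, r3} ~ U(-1,1)^3
  function(x1, x2) ifelse(r[1] + r[2] * x1 + r[3] * x2 >= 0, 1, -1)
}
f <- random_linear()
mean(f(x1, x2) != y)                                # training error of one weak machine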


Learning process

[Figure: six snapshots of the learned decision boundary.
Iter = 1, train err = 0.21; Iter = 13, train err = 0.18; Iter = 17, train err = 0.10;
Iter = 23, train err = 0.10; Iter = 31, train err = 0.095; Iter = 47, train err = 0.08.]


Learning process (II)

[Figure: Iter = 55, train err = 0.061; Iter = 99, train err = 0.032; Iter = 155, train err = 0.016.]


Final decision boundary

[Figure: contour of F(x) (left) and sign(F(x)) (right).]


KL divergence

Feature space X ⊆ R^p; label set Y = {1, ..., G}.

For nonnegative functions m(x, y), μ(x, y), (x, y) ∈ (X, Y):

$$ D_{\rm KL}(m, \mu) = \int_{\mathcal X} \sum_{y=1}^G \Big\{ m(x,y)\log\frac{m(x,y)}{\mu(x,y)} - m(x,y) + \mu(x,y) \Big\}\, dx. $$

Note: for conditional distributions p(y|x), q(y|x) given x with common marginal density p(x), writing m(x, y) = p(y|x)p(x) and μ(x, y) = q(y|x)p(x), we then have

$$ D_{\rm KL}(m, \mu) = \int_{\mathcal X} \sum_{y=1}^G p(y|x)\log\frac{p(y|x)}{q(y|x)}\; p(x)\, dx. $$


Twine of KL loss functions

For the data distribution q(x, y) = q(x)q(y|x) we model

$$ m_1(y|x) = \sum_{g=1}^G \exp\{F(x, g) - F(x, y)\} \quad \text{(exp loss)}, \qquad m_2(y|x) = \frac{\exp\{F(x, y)\}}{\sum_{g=1}^G \exp\{F(x, g)\}} \quad \text{(log loss)}. $$

Then

$$ D_{\rm KL}(q, m) = \int_{\mathcal X} \sum_{y\in\{-1,+1\}} \Big\{ q(y|x)\log\frac{q(y|x)}{m(y|x)} - q(y|x) + m(y|x) \Big\}\, q(x)\,dx. $$


Bound for exponential loss

Expected exp loss: L_exp(F) = E exp{−Y F(X)}.
Empirical exp loss: L_exp^n(F) = (1/n) Σ_{i=1}^n exp(−y_i F(x_i)).

Theorem. Let F be a space of all discriminant functions and F_opt = argmin_{F∈F} L_exp(F). Then

$$ F_{\rm opt}(x) = \frac12 \log\frac{p(y=+1\,|\,x)}{p(y=-1\,|\,x)}. $$


Variational calculus

$$ \frac{\delta}{\delta F} L_{\exp}(F) = \frac{\partial}{\partial F}\,\mathrm{E}\big[\exp(-YF(X))\big] = \mathrm{E}\big[-Y\exp(-YF(X))\big] $$
$$ = -\mathrm{E}\big[\exp(-F(X))\,p(y=+1|X) - \exp(F(X))\,p(y=-1|X)\big] $$
$$ = -\mathrm{E}\big[\exp(-F(X))\,\{p(y=+1|X) - \exp(2F(X))\,p(y=-1|X)\}\big]. $$

Setting this to zero gives exp{2F(x)} = p(y=+1|x)/p(y=−1|x); hence

$$ F_{\rm opt}(x) = \frac12\log\frac{p(y=+1\,|\,x)}{p(y=-1\,|\,x)}. $$


On AdaBoost

FDA or logistic regression — a parametric approach to the Bayes classifier:
$$ F_\alpha(x) = \sum_{j=1}^p \alpha_j x_j + \alpha_0 = \alpha_1^{\mathsf T} x + \alpha_0. $$

AdaBoost: F(x) = Σ_{t=1}^T α_t f_t(x).
Each f_t(x) is itself a classifier (cf. Real AdaBoost).
The stopping time T can be selected according to the state of learning.

This is also a parametric approach to the Bayes classifier, but the dimension and the basis functions are flexible.


On AdaBoost (II)

Problems: 1. unbalanced examples; 2. overlearning.

EtaBoost — robust against mislabelled examples.
GroupBoost — high dimension and small sample.
AsymAdaBoost — balancing the false negatives/positives.
LocalBoost — local learning.


Simulation (completely random)

[Figure: simulated data on [−1, 1] × [−1, 1].]


Overlearning of AdaBoost

[Figure: Iter = 51, train err = 0.21; Iter = 151, train err = 0.06; Iter = 301, train err = 0.0.]


U-Boost

U-empirical loss function:

$$ L_U^{\rm emp}(\theta) = -\frac1n\sum_{i=1}^n \xi(q_\theta(x_i)) + \int U(\xi(q_\theta(x)))\,dx. $$

In the context of classification:

$$ L_U^{\rm emp}(F) = -\frac1n\sum_{i=1}^n F(x_i, y_i) + \frac1n\sum_{i=1}^n\sum_{g=1}^G U(F(x_i, g)). $$

Unnormalized U-loss:

$$ L_U^{(0)}(F) = \frac1n\sum_{i=1}^n\sum_{g=1}^G U(F(x_i,g) - F(x_i,y_i)). $$

Normalized U-loss:

$$ L_U^{(1)}(F) = -\frac1n\sum_{i=1}^n F(x_i,y_i) + \frac1n\sum_{i=1}^n\sum_{g=1}^G U(F(x_i,g)), \quad \text{subject to } \sum_{g=1}^G u(F(x,g)) = 1. $$


U-Boost (binary)

Unnormalized U-loss:

$$ L_U^{(0)}(F) = \frac1n\sum_{i=1}^n\sum_{g=\pm1} U(F(x_i,g) - F(x_i,y_i)). $$

Note: Σ_{g=±1} U(F(x_i,g) − F(x_i,y_i)) = U(−y_iF(x_i)) + U(0), where F(x) = F(x,+1) − F(x,−1), so up to a constant L_U^(0)(F) = (1/n) Σ_i U(−y_iF(x_i)).

Bayes risk consistency:

$$ \frac{\partial}{\partial F(x)} \sum_{y=\pm1} U(-yF(x)) = -\{u(-F(x))\,p(y=+1|x) - u(F(x))\,p(y=-1|x)\}, \qquad u = U'. $$

Setting this to zero,

$$ \frac{u(F^*(x))}{u(-F^*(x))} = \frac{p(y=+1|x)}{p(y=-1|x)}, \qquad F^* = \arg\min_F \mathrm{E}\,U(-yF(x)). $$

F*(x) is Bayes risk consistent because

$$ \frac{\partial}{\partial F}\,\frac{u(F)}{u(-F)} = \frac{u'(F)u(-F) + u(F)u'(-F)}{\{u(-F)\}^2} > 0, $$

so F*(x) > 0 exactly when p(y=+1|x) > p(y=−1|x).


Eta-loss function

Regularized AdaBoost with margin, with generator

$$ U(F) = (1-\eta)\exp(F) + \eta F. $$


EtaBoost for mislabels

Expected eta-loss function:

$$ L_\eta(F) = \mathrm{E}\big[(1-\eta)\exp\{-yF(x)\} - \eta\, yF(x)\big]. $$

Optimal score: F* = argmin_F L_η(F). The variational argument leads to

$$ p(y|x) = \frac{(1-\eta)\,e^{yF^*(x)} + \eta/2}{(1-\eta)\big(e^{F^*(x)} + e^{-F^*(x)}\big) + \eta} = \big(1-\varepsilon(x)\big)\,\frac{e^{yF^*(x)}}{e^{F^*(x)}+e^{-F^*(x)}} + \varepsilon(x)\,\frac{e^{-yF^*(x)}}{e^{F^*(x)}+e^{-F^*(x)}}, $$

where

$$ \varepsilon(x) = \frac{\eta/2}{(1-\eta)\big(e^{F^*(x)}+e^{-F^*(x)}\big) + \eta} $$

— mislabel modelling.

Page 184: Tutorial Workshop on - 國立臺灣大學 · Pythagoras theorem 3. 4 Information divergence class and robust statistical methods I 4. 5 geometry learning statistics ... s s c r q

51

EtaBoost

0)(),1()(: settings Initial.1 011

=== xFniiw n    

,)())((I)(1

iwfyf m

n

iiim ∑

=

≠∝ xε

∑=

=T

tttTT fFF

1)( )()( where,)(sign.3 )( xxx α

Tm ,,1For .2 =

))(exp()()()c( )(*

1 iimmmm yfiwiw xα−∝+

)(min)()a( )( ff mfmm εε =

(b)


A toy example


Examples partly mislabeled


AdaBoost vs. EtaBoost

[Figure: decision boundaries of AdaBoost (left) and EtaBoost (right).]


EtaBoost

[Figure: Iter = 51, train err = 0.25; Iter = 51, train err = 0.15; Iter = 351, train err = 0.18.]


Boosting learning algorithm and U-loss functions II


GOAL

Statistical inference

Microarray

SNPs

Proteome

A variety of functions associated with genes

Statistical Learning

genetics

informatics

medicine

Modeling and prediction for knowledge and discovery


Project for genome polymorphism analysis

mRNA

Protein

Genome

Microarray

SNPs

Proteome

[ Gene expression ]

[ Protein expression ]

[Single Nucleotide Polymorphism ]


Project mission

Microarray

SNPs

Proteome

[ Gene expression ]

[ Protein expression ]

[Polymorphism ]

Drug effect / adverse effect

Disease

[ anticancer drug, aspirin ]

[Cancer, diabetes, cardiopathy … ]

Metabolic syndrome

biomarker

Genomic/proteomic

Phenotype


Expression Arrays and the p ≫ n Problem

T. Hastie, R. Tibshirani 20 November, 2003

Gene expression arrays typically have 50 to 100 samples and 5,000

to 20,000 variables (genes). There have been many attempts to adapt

statistical models for regression and classification to these data, and in

many cases these attempts have challenged the computational resources.......


The Dantzig selector:Statistical estimation when p is much larger than n

E. Candes and T. Tao

Ann. Statist. 35, 6 (2007), 2313-2351.


Recent issue

Fan and Lv (2008): sure independence screening.
Candes, E. and Tao, T. (2007): the Dantzig selector — p is much larger than n.

Dimension: 1 → 100 → 1000 → 30000.


Unified view in machine learning

Dantzig selector (SVM, programming)

LASSO (LAR, Elastic net)

L2Boosting (early stopping, ε-Boosting, stagewise LASSO)

A tale of three cousins (Meinshausen, Rocha, Yu, 2007 )

L1 regularization


What is genomic data?

Genome → mRNA → protein; diversity, metabolism, translation.

Microarray [gene expression]; SNPs [single nucleotide polymorphism]; proteome [protein expression].

[Figure: example proteomic spectra (m/z 1500–2500).]


Target

Microarray

SNPs

Proteome

[ Gene expression ]

[ Protein expression ]

[Polymorphism ]

Drug effect / adverse effect

Disease

[ anticancer drug, aspirin ]

[Cancer, diabetes, cardiopathy … ]

Biomarker x = (x_1, ..., x_p); phenotype y ∈ {−1, +1}.


GeneChip® HT Human Genome U133

Affymetrix's U133 contains more than 54,000 probe sets, manufactured using semiconductor technology.

For each gene a set of 11 probe pairs is designed; all probes consist of 25-mer DNA.


Prediction from microarrays

Feature vector x = (x_1, ..., x_p): dimension p = the number of genes; each component is a gene expression level.
Class label y ∈ {−1, +1}: names of diseases, effects of drugs, adverse effects of drugs.
Training data D_train = {(x_i, y_i) : 1 ≤ i ≤ n}.
Classifier f̂ : x → y; predictor ŷ = f̂(x).


Leukemic diseases Golub,T. et al. (1999) Science.

The first successful result.

Web microarray data

Open access data (p ≫ n):

Dataset    n    y = +1   y = −1   p
ALL/AML    72   37       35       7129
Colon      62   40       22       2000
Estrogen   49   25       24       7129

http://microarray.princeton.edu/oncology/
http://mgm.duke.edu/genome/dna micro/work/

source("http://www.bioconductor.org/biocLite.R")

biocLite("GEOquery")

library(GEOquery)

d <- getGEO(file = "GSE2034_family.soft.gz")

http://www.ncbi.nlm.nih.gov/geo/

Gene Expression OmnibusNational Center for Biotechnology Information


Clustering for successful case


Jones, M. et al. Lancet 2004

Lung cancer

Pathological Classification

Prognosis identification

Not successful case


Prediction from mass-spectrometry

Feature vector x = (x_1, ..., x_p): dimension p = the number of peaks of molecular mass; each component expresses a peak value.
Class label y ∈ {−1, +1}: names of diseases, effects of drugs, adverse effects of drugs.
Training data D_train = {(x_i, y_i) : 1 ≤ i ≤ n}.
Classifier f̂ : x → y; predictor ŷ = f̂(x).


MW (Time of Flight)

[Diagram: laser beam source, lens, mirror, ion detector, high voltage, protein chip, flight tube (high vacuum).]

Proteome method [SELDI-TOF/MS].
Koichi Tanaka, MALDI-TOF MS, 2002 Nobel Prize in Chemistry.


[Figure: proteomic spectra (m/z 1500–2500), lung cancer vs. normal tissue. Data: Japanese Foundation for Cancer Research (財団法人癌研究会).]


Total data

Proteomic data, filtered by common peaks, then AdaBoost learning.
203 subjects (130 ovarian cancer cases, 73 controls).

Fushiki, Fujisawa, Eguchi (2006).


Goal = Prediction score

Machine learning: training data = (clinical data {y_i}, genomic data {x_i}) → prediction score F(x).
Pattern recognition: x → y; sensitivity.
Train and test: practical realization.


Concordance among Gene-Expression–Based Predictors for Breast Cancer

Fan, et al NEJM 355:560-569, 2006

Prognosis prediction for breast cancer

70-gene profile: van 't Veer, L. J., et al. Nature 2002;415(6871):530-6.
Recurrence score: Paik, S., et al. NEJM 2004;351:2817-26.
Mechanism-derived: Chang, H. Y., et al. PNAS 2005;102(10):3738-43.
Proper subtype: Sorlie, T., et al. PNAS 2001;98(19):10869-74.
2-gene ratio: Ma, X. J., et al. Cancer Cell 2004;5(6):607-16.

Five studies suggest different sets of genes related to prognosis for breast cancer.

Four of the five studies show substantial performance on new validation test data (295 samples).


Nature 2002; 415: 530-6.


Single variable study

y|x versus x_j|y (j = 1, ..., p): joint analysis p(x|y) versus single-variable analyses p(x_j|y).

Let x be a feature vector from a genomic monitor and y an outcome value or phenotype.

Note: the multivariate analysis of (x, y) is essentially different from the set of all single-variable analyses. Basically, genome data are correlated within the biological network, which implies that the information amount is not so huge even if the number of observed genes is larger than several tens of thousands.


2 sample test approach

The data set D = {(x_1, y_1), ..., (x_n, y_n)} is decomposed into {(x_j^0, x_j^1) : j = 1, ..., p}, where

x_j^0 = {x_{ij} : y_i = −1} = (x_j^{01}, ..., x_j^{0n_0}),
x_j^1 = {x_{ij} : y_i = +1} = (x_j^{11}, ..., x_j^{1n_1})  (j = 1, ..., p),

with n = n_0 + n_1. For the two samples {x_j^{0k}}_{k=1}^{n_0} and {x_j^{1i}}_{i=1}^{n_1}, let us consider the hypothesis

H: p(x_j | y = +1) = p(x_j | y = −1).


Typical two-sample test statistics

Z score:
$$ \hat Z(X_j) = \frac{\bar x_j^1 - \bar x_j^0}{s_j}. $$

Student t test:
$$ \hat t(X_j) = \frac{\bar x_j^1 - \bar x_j^0}{\sqrt{\{(n_1-1)s_{1j}^2 + (n_0-1)s_{0j}^2\}/(n-2)}}. $$

Wilcoxon test:
$$ \hat C(X_j) = \frac{1}{n_0 n_1} \sum_{k=1}^{n_0}\sum_{i=1}^{n_1} I(x_j^{1i} > x_j^{0k}). $$

Note: the Z score and the t test are invariant under the data transformation x → ax + b (a > 0). The Wilcoxon test is invariant under any monotone transformation F: Ĉ(F(X_j)) = Ĉ(X_j).

However, for different j and k the above test statistics bear no relation to one another: Ĉ(X_j), Ĉ(X_k), Ĉ(X_j, X_k), Ĉ(aX_j + b, X_k), ....

(We reconsider this point in Part II.)
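All three statistics can be computed gene by gene in a few lines of R; the sketch below is our helper (X is an n × p expression matrix, y the labels in {−1, +1}, and s_j is taken as the pooled standard deviation — an assumption the slide leaves implicit):

two_sample_stats <- function(X, y) {
  t(apply(X, 2, function(xj) {
    x1 <- xj[y == +1]; x0 <- xj[y == -1]
    n1 <- length(x1);  n0 <- length(x0)
    z  <- (mean(x1) - mean(x0)) / sd(xj)                               # Z score
    tt <- (mean(x1) - mean(x0)) /
      sqrt(((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2))  # t statistic
    cc <- mean(outer(x1, x0, ">"))                                     # Wilcoxon C-hat
    c(Z = z, t = tt, C = cc)
  }))
}
# ranking for filtering: res <- two_sample_stats(X, y);
# order(abs(res[, "t"]), decreasing = TRUE)[1:d]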


p two-group comparisons

Adopt a test statistic T̂(X_j) for the p two-group comparisons, compute the p statistics T̂(X_j), and order them as

$$ \hat T(X_{(1)}) \ge \cdots \ge \hat T(X_{(d)}) \ge \cdots \ge \hat T(X_{(p)}). $$

Here the multiplicity of testing (FDR) and the accuracy of the ranking are important problems. However, since the present target is phenotype prediction, the ranking is used for the following filtering:

d: 1 → 100 → 1000 → 30000 (p).


Hierarchical clustering

clustering ⊆ unsupervised learning

Hierarchical clustering vs. optimal partition clustering (k-means, self-organizing map).
Minimum distance method.

Eisen et al. (1998) PNAS.
http://derisilab.ucsf.edu/data/microarray/software.html
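In R the minimum distance (single linkage) method is a single call; here is a toy sketch (ours, with random data standing in for expression values, in the spirit of Eisen et al.):

X  <- matrix(rnorm(20 * 50), nrow = 20)   # 20 samples x 50 "genes"
hc <- hclust(dist(X), method = "single")  # minimum distance (single linkage) method
plot(hc)                                  # dendrogram
cutree(hc, k = 3)                         # cut the tree into 3 clusters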


ArrayMaker Version 2

Gal File Maker v1.2

Cluster:

Tree View:

J-Express:

http://derisilab.ucsf.edu/data/microarray/software.html


LASSO

Linear model: y_i = β^T x_i + α + ε_i (i = 1, ..., n), where x_i and β are p-dimensional vectors.

Lasso estimator (least absolute shrinkage and selection operator):

$$ (\hat\alpha, \hat\beta) = \arg\min \sum_{i=1}^n (y_i - \alpha - \beta^{\mathsf T}x_i)^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t. $$

Tibshirani (1996), JRSSB.


Sparse learning

The Lagrange method for optimization with constraints:

$$ \frac{\partial}{\partial\beta}\Big\{ \tfrac12\sum_{i=1}^n (y_i - \alpha - \beta^{\mathsf T}x_i)^2 - \lambda\Big(t - \sum_{j=1}^p |\beta_j|\Big) \Big\} = -\sum_{i=1}^n x_i(y_i - \alpha - \beta^{\mathsf T}x_i) + \lambda\,\mathrm{sgn}(\beta) = 0. $$

The lasso estimator and the OLS estimator have the following connection:

$$ \hat\beta_{\rm OLS} = \hat\beta + \lambda\Big(\sum_{i=1}^n x_i x_i^{\mathsf T}\Big)^{-1} \mathrm{sgn}(\hat\beta). $$

Assume that Σ_{i=1}^n x_i x_i^T = I (identity matrix). Then

$$ \hat\beta_j^{\rm OLS} = \hat\beta_j + \lambda\,\mathrm{sgn}(\hat\beta_j) \qquad (j = 1, ..., p). $$

Page 222: Tutorial Workshop on - 國立臺灣大學 · Pythagoras theorem 3. 4 Information divergence class and robust statistical methods I 4. 5 geometry learning statistics ... s s c r q

34

Sparseness representation

),...,1()ˆsgn(ˆˆOLS pjβββ jjj =+= λ

+−= )|ˆ)(|ˆsgn(ˆOLSOLS λjjj βββ

Othonormal design matrix leads to

Inverting the above provides

jβ̂

jβOLSˆ

jβ̂

jβOLSˆ

λ

λ−

λ−

λ

)(T 単位行列Iii =∑ xx

⎩⎨⎧ >

=+

otherwise00if

whereAA

A

Note: λ is uniquely determined by the constraint tj =∑ |ˆ| β
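The soft-thresholding rule is a one-line R function; the check below (our illustration) also confirms the inverse relation β̂_j^OLS = β̂_j + λ sgn(β̂_j) wherever β̂_j ≠ 0:

soft <- function(b_ols, lambda) sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
b_ols <- c(-2.0, -0.4, 0.1, 0.8, 1.5)
b_hat <- soft(b_ols, lambda = 0.5)   # entries inside (-0.5, 0.5) are set to 0
b_hat + 0.5 * sign(b_hat)            # recovers b_ols wherever b_hat != 0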


Control of sparseness

The constraint Σ_j |β̂_j| = t: control of sparseness is reduced to the choice of t.


Microarray data

p signals x = (x_1, ..., x_p); class labels y ∈ {−1, +1}.
Learning machines y = f(x) from a pool {f_1(x), ..., f_K(x)}.

AdaBoost on {(x_1, y_1), ..., (x_n, y_n)}: weights w_1(1), ..., w_1(n) → f^(1)(x) with error ε_1; weights w_2(1), ..., w_2(n) → f^(2)(x); ...; weights w_T(1), ..., w_T(n) → f^(T)(x); output Σ_{t=1}^T α_t f^(t)(x).

Difficult problem: p ≫ n.


GroupBoost

Data and weights w_1, ..., w_T as in AdaBoost, but at each round t a group of G machines is combined:

$$ f^{(t)}(x) = \alpha_{(t,1)} f_{(t,1)}(x) + \cdots + \alpha_{(t,G)} f_{(t,G)}(x), $$

selected from the pool {f_1(x), ..., f_K(x)}; the final output is Σ_{t=1}^T α_t f^(t)(x).


Results — [Figure: test error against data size.]


Lung cancer analysis — [Figure: test error against gene number, for true genes and for false genes.]


MW (Time of Flight)

[Diagram repeated from above: laser beam source, lens, mirror, ion detector, high voltage, protein chip, flight tube (high vacuum). Proteome, SELDI-TOF/MS.]


[Figure repeated: proteomic spectra (m/z 1500–2500), lung cancer vs. normal tissue.]


Averaged curve — 203 subjects (130 ovarian cancer cases, 73 controls).


Common peaks


SpecAlign, Wong et al. (2005)


Peak pattern recognition

p peaks x = (x_1, ..., x_p); class labels y ∈ {−1, +1}.
Learning machines y = f(x): AdaBoost with weights w_1, ..., w_T, weak learners f^(1)(x), ..., f^(T)(x), and output Σ_{t=1}^T α_t f^(t)(x).

Example: test data (32 cases + 18 controls).


Association with drug effect

A joint work with the Japanese Foundation for Cancer Research.

Breast cancer patients and drug-effect association.
Supervised detection of common peaks.

[Figure: error rate (training, CV, test) against the number of peaks (0–35).]


Inference for prediction

Microarray

SNPs

Proteome

Genome Function

Expression pattern (GroupBoost)

Peak pattern (common peak)

SNP haploblock

Statistical machine learning
