information geometry...outline information geometry historical remarks treasure example information...

144
Information Geometry Shinto Eguchi The Institute of Statistical Mathematics, Japan Email: [email protected], [email protected] Url: http//www.ism.ac.jp/~eguchi/ 2013年統計数理研究所夏期大学院 統計数理研究所 セミナー室 1 9261440分~1750ISM Summer School 2013

Upload: others

Post on 30-Jan-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Information Geometry

Shinto Eguchi

The Institute of Statistical Mathematics, Japan Email: [email protected], [email protected] Url: http//www.ism.ac.jp/~eguchi/

2013年統計数理研究所夏期大学院統計数理研究所 セミナー室 1 9月26日 14時40分~17時50分

ISM Summer School 2013

Page 2: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Outline

Information geometry

historical remarks

treasure example

Information divergence class

robust statistical methods

robust ICA

robust kernel PCA

2

Page 3: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

geometry

learning

statistics

quantum

1982 paperS. Amari

1975 paperB. Efron

P. Dawidcomment

Historical comment

1984Workshop

RSS150

C.R.Rao1945

R.A.Fisher1922

3

Page 4: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Dual Riemannian Geometry

Dual Riemannian geometry gives reformulation for Fisher’s foundation

Cf. Einstein field equations (1916)

Information Geometery aims to geometrizeA dualistic structure between modeling and estimation

Estimation is projection of data onto model.

“Estimation is an action by projection and model is an object to be projected”

The interplay of action and object is elucidated by a geometric structure.

Cf. Erlangen program (Klein, 1872)http://en.wikipedia.org/wiki/Erlangen_program

4

Page 5: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

2 x 2 table

The space of all 2×2 tables associates with a regular tetrahedron

We know the ruled surface is exponential geodesic, butdoes anyone know the minimality of the ruled surface?

The space of all independent 2×2 tables associates with a ruled surfaceIn the regular tetrahedron

5

Page 6: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

regular tetrahedron

P

Q

S

R

0 10 0

0 00 1 0 0

1 0

1 00 0

1/4 1/4

1/4 1/4

0 p0 1p

1p 0 p 0

6

Page 7: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

regular tetrahedron

P

Q

S

R

0 1/30 2/3

1/3 0 2/3 0

1/3 2/3 0 0

0 01/3 2/3

+

=

32

32

31

32

32

31

31

31

31

32

1/3 0 2/3 0

0 1/30 2/3

+31

32=1/3 2/3

0 0

0 01/3 2/3

7

Page 8: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

P

Q

S

R

P

Q

S

R

Ruled surface

8

Page 9: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

1 00 0

p q p (1-q)(1-p)q (1-p) (1-q)

0 00 1

0 10 0

hgfeyzxwn

22 )(

0)1()1(

})1)(1()1)(1({ 2

qqppqpqpqppqn

0 01 0

A B

C x y e

D z w f

g h n wzyxnwyhzxgwzfyxe

,,,

A B

C pq p(1q) P

D (1p)q (1p)(1q) 1p

q 1q 1

Page 10: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

e-geodesical

-10

0

10 -5

0

5

-5

0

5

-10

0

10

-5

0

5

-10

0

10-10

-5

0

5

10

-5

0

5

-10

0

10

}){( 10,10,1

log,1

log,)1)(1(

log

qpq

qp

pqp

qp }{ RI,RI),,,( yxyxyx

2112112222

21

22

12

22

11211211 1wherelog,log,log),,( )(

3S 3RI

10

Page 11: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Two parametrization

-10

0

10 -5

0

5

-5

0

5

-10

0

10

-5

0

5

)( )1(),1(, qpqpqp

)( ,qp

)(1

log,1

log,)1)(1(

logq

qp

pqp

qp

11

Page 12: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Kullback-Leibler Divergence

p

q

r

Let p(x) and q(x) be probability density functions.

Then Kullback-Leibler Divergence is defined by

12

Page 13: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Two one-parameter familiesLet be the space of all pdfs on a data space.

ここで

p

q

r

m-geodesic

e-geodesic

13

Page 14: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Pythagoras theorem Amari-Nagaoka (1982)

p

q

r

14

Page 15: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Proof

15

Page 16: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

ABC in differential geometry

Linear connection defines parallelism along a vector field.

Geodesic = a minimal arc between x and y

Riemannian metric defines an inner product on any tangent space of a manifold.

RI)()(:),( MMggM XX

1

0))(),((argmin

10})1(,)0(:)({

dttxtxgtyxxxtxc

)()()(: MMM XXX

))(,),(()()()2(

,)1(

MYXMfYfXYfYf

YfY

XX

XXf

XF

d

kk

kjijX XX

i1

ComponetwiseCf. slide 41

16

Page 17: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Geodesic

A one-parameter family (curve) is called geodesic }:)({ ttC θ

),...,1(0)()())(()(1 1

pitttt kkp

k

p

j

ikj

i

θ

with respect to a linear connection },,1:)({ pkjiikj θ

Remark 2:

If , any geodesic is a line.),,1(0)( pkjiikj θ

The property is not invariant with parametrization.

The gedesic C is expressed by a transform rule from parameter to

),...,1(0)()())((~)(1 1

patttt cbp

c

p

b

abc

a

ω

,)()(~11 1 1

p

ac

iba

i

p

k

p

j

p

i

ikj

ai

jb

kc

abc

BBBBB

θω )(for, θaaa

iiai

aai tBB

speed of acceleration

Newton's law of inertia

17

Page 18: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Change rule on connections

Let us consider a transform from parameter to . Then a geodesic C is

written by . Hence we have the following:

),...,1(0)()())((~)(1 1

patttt cbp

c

p

b

abc

a

ω

,)()(~11 1 1

p

ac

iba

i

p

k

p

j

p

i

ikj

ai

jb

kc

abc

BBBBB

θω )(for)( 1θaa

a

iiaB

}:))(()({ ttt θω

),...,1()())(()(1

pattBtp

i

iai

a

θ

),...,1()()())(()())(()(1 11

patttBttBtp

j

p

i

jij

ai

p

i

iai

a

θθ

i

aaiB

18

Page 19: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

What are reference books?

Foundations of Differential Geometry (Wiley Classics Library) Shoshichi Kobayashi, Katsumi Nomizu

Methods of Information Geometry Shun-Ichi Amari , Hiroshi NagaokaAmer Mathematical Society (2001)

Differential Geometrical Methods in StatisticsShun-Ichi Amari 1985 年 Springer

19

Page 20: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Statistical model and information

Statistical model }:),()({ θθxxθ ppM

Score vector ),(log),( θxθxsθ

p

Space of score vector }RI:),({ T psT αθαθ

Fisher information metric ),(}{),( θθθ TvuuvEvug

Fisher information matrix }),(),({ Tθxθxθθ ssEI

)RI( p

20

Page 21: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Score vector space

Note 1:

0),(),(

),(),(),(),()(T

T

xθxθxsα

xθxθxsαxθxθxθθ

dp

dpdpuuETu

βαβxθxθxsθxsα

xθxθxθx

θ

θθ

Idp

dpvuvugTvuTTT }),(),(),({

),(),(),(),(,

})),((,0)),((:),({ θxθxθx θθθ tVtEtS

The space of all random variables with mean 0 and finite variance

Note 2: is an infinite dimensional vector space that includesθS θT

),(),(For θxsθx Tu

,....2,1},{ ),(...

),(...

},),({),(11

kuEuuEuiki

k

iki

kkk θxθxθxθx θθ

belong to θS Cf. Tiatis (2006) 21

Page 22: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Linear connections

),,1()( }{ '1'

'e

pkjissEg ijk

p

i

iiikj

θθ

),,1()( }]{}{[ ''1'

'm

pkjisssEssEg ijkijk

p

i

iiikj

θθθ

e-connection

m-connection

m-geodesic

e-geodesic

Score vector piis 1)),((),( θxθxs

22

Page 23: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Semiparamtric thoery

Semiparametric Theory and Missing Data (Springer Series in Statistics) Anastasios Tsiatis

1 Introduction to Semiparametric Models

2 Hilbert Space for Random Vectors

3 The Geometry of Influence Functions

4 Semiparametric Models

5 Other Examples of Semiparametric Models

6 Models and Methods for Missing Data

7 Missing and Coarsening at Random for Semiparametric Models

8 The Nuisance Tangent Space and Its Orthogonal Complement

9 …14.

23

Page 24: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Geodesical models

Definition

))1,0(()()()(),( 1 tMqpcMqp ttt xxxx

(i) Statistical model M is said to be e-geodesical if

(ii) Statistical model M is said to be m-geodesic if

))1,0(()()()1()(),( tMqtptMqp xxxx

Note : Let P be a space of all probability density functions.

xxx dqpc ttt )()(/1 1where

By definition P is e-geodesical and m-geodesical.

However the theoretical framework for P is not perfectly complete.

Cf. Pistone and Sempi (1995, AS)24

Page 25: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Two type of modeling

})};()(exp{)(),({ 0)e( θθxtθxθx TppM

})}({)}({:)({0

)m( xtxtx pp EEpM

Exponential model

Matched mean model

})(:RI{ θθ p

Let p0(x) be a pdf and t (x) a p-dimensional statistic.

})};()()(logexp{)(),({

1 00

)e(

θθxxxθx

I

i

ii p

pppM

}:)()()1(),({1

01

)m(

I

iii

I

ii pppM xxθxMixture model

})(:RI{ θθ p

Let be a set of pdfs.},...,1,0:)({ Iipi x

)},...,1(0,10:),...,({1

1 Iii

I

iiI

θ

Exponential model

25

Page 26: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Statistical functional

A functional f (p) is called a statistical functional if p(x) is a pdf. (Hampel, 1974)

A statistical functional f (p) is said to be Fisher-consistent for a model

,)()1( pf

)()()2( θθf θp

if f (p) satisfies that}:)({ θxθpM

For a normal model }RIRI),(},2

)(exp{2

1)({ 22

2

2

θxθ

xpM

Tdxxxpdxxdxxxpp )))((,)(()( 22 f

is Fisher-consistent for = (, 2 ).

Example.

26

Page 27: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Transversality

Let f (p) be Fisher consistent functional.

which is called a leaf transverse to M.

,})(:{)( θffθ ppL

}{)()1( θθ f pML

)()2( fθL

is a local neighborhood.

)( fθL

)(* fθL

θp

*θp

M

27

Page 28: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Foliation structure

)( fP θL

))((),(),( fP θθθθLTvMTuTt ppp

),(),(),(such that θxθxθx vut

))(()()( fP θθθθLTMTT ppp

)( fθL

M

θp

),( θxt ),( θxu

),( θxv

Foliation

Decomposition of tanget spaces

Statistical model }:),()({ θθxxθ ppM )RI( p

28

Page 29: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Transversality for MLE

})};()(exp{)(),({ 0)e( θθxtθxθx TppMExponential model

Hence the foliation associated with fML is a matched mean model.

})(:RI{ θθ p

For the exponential model the MLE functional)e(M

)},({logmaxarg:)(ML θxfθ

pEp p

}{ )}({)}({arg)( ),(ML xtxtf θθ

pp EEp

is written by

})}({)}({:{)( ),(ML xtxtf θθ pp EEpL

Let p0(x) be a pdf and t (x) a p-dimensional statistic.

29

Page 30: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Maximum likelihood foliation

)( MLfθL

θp

)e(M

1p 2p

3p

2r

1r

30

Page 31: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Estimating function

),( θxu

Statistical model }:),()({ θθxxθ ppM )RI( p

p-variable function is unbiased

)(0)}),({(det,}),({ ),(),(

θ

θθxuθxu θθ pp EE 0

def

The statistical functional }),({solvearg)( 0

θxufθ

pEp

is Fisher-consitent, and the leaf transverse to M

}),(:{)( 0 θxufθ pEpL

is m-geodesical.)( fθL

31

Page 32: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Minimum divergence geometry

D : M×M → R is an information divergence on a statistical model M

(i) D(p,q) ≧ 0 with equality if and only if p = q

(ii) D is a differentialble on M×M

Then we get a Riemannian metric and dual connections on M (Eguchi, 1983,1992)

Let

32

Page 33: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

is a Riemann metric

Remarks

Cf. Slide 21 33

Page 34: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U cross-entropy

U is a convex function,

U entropy

U divergence

U divergence

34

Page 35: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

35

Example of U divergence

KL divergence

Beta (power) divergence

Note

35

Page 36: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

36

Geometric formula with DU

36

Page 37: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

37

Triangle with DU

p

q

r

mixture geodesic

U geodesic

37

Page 38: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Information divergence class

and

robust statistical methods

38

Page 39: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Light and shadow of MLE

1.Invariance under data-transformations

2.Asymptotic efficiency

1. Non-robust

Sufficiency and efficiency

log exp

u

2. Over-fitting

Log-likelihood on exponential family

Likelihood method

U-method

39

Page 40: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Max U-entropy distribution

Equal mean space

Let us fix a statistics t (x).

Euler-Lagrange’s 1st variation

U-model

40

Page 41: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U-estimate

U-loss function

U-estimate

U-estimating function

U-empirical loss function

Let p(x) be a data density function with statistical model

41

Page 42: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

-minimax

Equal mean space

Let us fix a statistics t (x).

42

Page 43: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U-estimator for U-model

U-model

mean parameterU-estimator of

We observe that

which implies

Furthermore,

43

Page 44: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Influence function

Statistical functional

})),((()()),(({minarg)(

dxxpUxdGxpGTU

0

)(),(IF

GTxT

xFG )1(

)}],(),({),(),()[(),(IF 1 xSxwExSxwJxT UU

Influence function

||),(IF||sup)(GES xTT Ux

U 44

Page 45: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Efficiency

Asymptotic variance

))()()(,0()ˆ( 11 UUUD

U JHJNn

)),(),(()(

),),(),(),(()(

XSXwVarH

XSXSXwEJ

U

TU

111 )()()()( UUU JHJI

Information inequality

where

( equality holds iff U(x) = exp(x) )

45

Page 46: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Normal mean

-6 -4 -2 2 4 6

-6

-4

-2

2

4

6

0.125

0.1

0.075

0.05

0.01

0

-6 -4 -2 2 4 6

-6

-4

-2

2

4

6

2.5

0.8

0.4

0.15

0.015

0

β-power estimates η-sigmoid estimates

Influence functionβ= 0

= 0.015= 0.15= 0.4= 0.8= 2.5

η = 0= 0.01= 0.05= 0.075= 0.1= 0.125

tttU )exp()( 46

Page 47: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Gross Error Sensitivity

β-power estimates η-sigmoid estimates

β 効率 GES η 効率 GES0 1 ∞ 0 1 ∞

0.01 0.97 6.16 0.015 0.972 1.90.05 0.861 2.92 0.15 0.873 1.040.075 0.799 2.47 0.4 0.802 0.6780.1 0.742 2.21 0.8 0.753 0.455

0.125 0.689 2.04 2.5 0.694 0.197

47

Page 48: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Multivariate normal

pdf is

2

)()(exp)det)2((),,(1

21 yyyf

Tp

equation Estimating

),(

)(}))(({),(1

0)(),(1

μθ

μxμxθx

μxθx

UT

ii

ii

cwn

wn

48

Page 49: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

111 , , kkkkkk μθμθ

),(

),(1

ki

jkik xw

xxw

,

det ),(

)()(),(1

kUki

Tkikiki

k

cxw

xxxw

繰り返し重み付け平均と分散

Under a mild condition

),...1 ( )(L )(L 1 kkUkU

U-algorithm

49

Page 50: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Simulation

ε-conmination

)ˆ( 0KL θ,θD

, N , N-εG

, -

N .

. , N-εG

ε

ε

9999

00

1001

00

)1(

1004

55

250501

00

)1(

(2)

(1)

(2)

(1)

under 6516702under2439033

05.00

G . . G . .

εε

KL error

with MLE θ̂

50

Page 51: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

β-power estimates v.s. η-sigmoid estimates

0 39.240.01 35.300.05 20.930.10 8.910.20 12.400.30 31.64

0 39.240.0001 23.04

0.00025 6.700.0005 4.46

0.00075 4.640.001 6.04

βηKL error KL error

0 16.50.01 13.910.05 6.650.10 5.140.20 12.200.30 29.68

0 16.50.0005 3.36

0.00075 3.190.001 3.90.002 3.040.003 3.07

(1)G

)2(G

51

Page 52: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Plot of MLEs (no outliers)

1112

22

100 replications with100 size of sampleunder Normal dis.

η- Est (η=0.0025)β- Est (β=0.1)MLEtrue (1,0,1)

2212

1211

52

Page 53: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Plot of MLEs with ouliers

η- Est (η=0.0025)β- Est (β=0.1)MLEtrue (1,0,1)

)2(G

100 replications with100 size of sampleunder

11.5

2

2.5

0

0.5

1

1.5

0.5

1

1.5

2

11.5

2

2.5

0

0.5

1

1.5

53

Page 54: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Selection for tuning parameter

Squared loss function

dyygyf 221 )}()ˆ,({)ˆ(Loss

dyyfxfn

ii

n

i

2)(

1

)ˆ,(21)ˆ,(1)(CV

)(CVminargˆ

)ˆ,(IF1

1ˆˆ )( i

i xn

Approximate

54

Page 55: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

.26.1-

.1-.26 varianceand

00

mean with Normal

0.0250.050.0750.10.1250.150.175

2.82

2.84

2.86

2.88

2.9

2.92

2.94

.261.126-.126-.228

0.081-

0.054

signalMLE

.383.263-.263-1.059

0.184-

0.204

ContamiMLE

.286.134-.134-.293

0.132-

0.086

contamiEst-

β

)(C V

07.0ˆ

55

Page 56: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

What is ICA?

Cocktail party effect

Blind source separation

s1 s2 ……… sm

x1 x2 ……… xm

56

Page 57: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

ICA model

sWxW mmm 1 s.t., RR

(Independent signals)

(Linear mixture of signals)

0)(,,0)()()()(),...,(

1

111

m

mmm

SESEspspspsss

),,( 1 nxx

))(())((|)det(|),,( 111 mmmppWpWf μxwμxwx

Aim is to learn W from a dataset

in which unknown. is)()()( 11 mm spspp s

57

Page 58: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

ICA likelihood

Log-likelihood function

|)det(|log))((log)(11

WpW jijj

m

j

n

i

μxw

TTm WWxWxhI

WpWxpWxF

))()((),,(),,(

)( )(log,,)(log)(1

11

m

mm

ssp

sspsh

Estimating equation

Natural gradient algorithm, Amari et al. (1996)58

Page 59: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Beta-ICA

power equation ),(),,(),,(11

WBWxFWxf

n

n

iii

decomposability:

0]))}({E[

)]()}({E[

])}({E[,

XwXwp

XwhXwp

Xwp

tttt

ssssss

tqsqqq

ts

59

Page 60: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Likelihood ICA

0.5 1 1.5 2 2.5

0.2

0.4

0.6

0.8

1

1.2

1.4

-1.5 -1 -0.5 0.5 1 1.5

-1.5

-1

-0.5

0.5

1

1.5

5.01211W

150 signals U(0,1)× U(0,1)

Mixture matrix

60

Page 61: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Non-robustness

-2 -1 1 2 3

-2

-1

1

2

-2 -1 1

-2

-1

1

2

50 Gaussian noise N(0, 1)× N(0, 1)

61

Page 62: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

power-ICA (β=0.2)

-2 -1 1 2 3

-2

-1

1

2

-10 -7.5 -5 -2.5 2.5 5 7.5

-6

-4

-2

2

4

Minimum -power

62

Page 63: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Tsallisべきエントロピーに射影不変性を課したエントロピ-、ダイバージェンスを考察した。

平均と分散を一定にする分布族で射影べきエントロピー最大分布モデルが正規分布,

t-分布,ウエグナー分布を含む形で導出された.

そのモデル上での平均と分散の素直な推定法を提案し,推定量が標本平均と標本分散

になることが示された.

63

Page 64: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

-cross entropy

-diagonal entropy

Cf. Tsallis (1988), Basu et al (1998), Fujisawa-Eguchi (2008), Eguchi-Kato (2010)

-divergence

Remark= 0 reduces to Boltzmann-Shannon entropy and Kullback-Leibler divergence

Projective -power divergence

64

Page 65: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Max -entropy

-entropy

Equal moment space

Max -entropy

-model

65

Page 66: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

-estimation

Parametric model

-Loss function

-estimator

Gaussian

-estimator

Remark Cf. Fujisawa-Eguchi (2008)66

Page 67: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

-estimation on -model

-loss function

Theorem.

67

Page 68: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

model

loss function

-loss

normal-model -model

(-estimator ’-model)

0-loss

( log-likelihood) MLE

Moment estimatorRobust -estimator

Robust model

MLE

68

Page 69: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Mixture of independent signals

Whitening

Preprocessed form

Statistical model

-ICA

Gamma-ICA

69

Page 70: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

70

Page 71: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

71

Page 72: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Boosting leaning algorithms

for supervised/unsupervized learning

U-loss functions

Outline

72

Page 73: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Which chameleon wins?

73

Page 74: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Pattern recognition…

74

Page 75: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

What is pattern recognition?

□ There are a lot of examples for pattern recognition.

□ In principle pattern recognition is a prediction problem for class label

which would classify a human interestingness and importance.

□ Originally human brain wants to label phenomena a few words , for

example (good, bad), (yes, no), (dead, alive), (success, failure), (effective,

no effect)….

□ Brain intrinsically predicts the class label from empirical evidence.

75

Page 76: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Practice

● Character recognitionvoice recognition

Image recognition

☆ Credit scoring

☆ Medical screening

☆ Default prediction

☆ Weather forcast

● face recognition

finger print recognition●

speaker recognition●

☆ Treatment effect

☆ Failure prediction ☆ Infectious desease

☆ Drug response

76

Page 77: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Binary classification

classifier

Feature vector Class label

score

In a binary class y = 1, 1 (G=2)

77

Page 78: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Probability distribution

).,( vector random of pdf a be),(Let yyp xx

B Cy

ypCBp }d),({),( xx

Xxx d),()( ypypMarginal density

Conditional density

},...,1{

),()(Gy

ypp xx

)(),()|(

xxx

pypyp

)(),()|(

ypypyp xx

)()|()()|(),( xxxx pypypypyp

)|1()|1(

)1,()1,(

xx

xx

ypyp

ypyp

78

Page 79: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Error rate

Classifier

Feature vector ,class label

has error rate

))(Pr()(Err yhh FF x

G

iF

jiFF iyihjyihh

1

),)(Pr(1),)(Pr()(Err xx

Training error

Test error

79

Page 80: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

False negative/positive

FNR

True Negative

True Positive False Positive

False Negative

FPR

80

Page 81: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Bayes rule

Let p(y|x) be a conditional probability of y given x.

The classifier leads to Bayes rule. ))(sgn()( 0Bayes xx Fh

For any classifier h

)(Err)(Err Bayes hh

Note: The optimal classifier is equivalent to the likelihood ratio.However, in practice p(y|x) is unknown, so we have tolearn hBayes(x) based on the training data set.

Theorem 1

81

Page 82: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Discriminant space associated with Bayes classifier is given by

Error rate for Bayes rule

In general, when a classifier h associates with spaces

)(Err)(Err Bayes hh Cf. Mclachlan, 2002

82

Page 83: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Boost learning

Boost by filter (Schapire, 1990)

Bagging, Arching (bootstrap)(Breiman, Friedman, Hasite)

AdaBoost (Schapire, Freund, Batrlett, Lee)

Weak learners can be combined into a strong leanrner?

Weak learner (classifier) = error rate is slightly less than .5

Strong learner (classifier) = error rate is slightly less than that of Bayes classifier

83

Page 84: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Set of weak learners

Decision stumps

Linear classifiers

Neural net SVM k-nearest neighbur

Note: not strong but a variety of characters

84

Page 85: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Exponential loss function

Empirical exponential loss function for a score function F(x)

)}(exp{1)(1

exp ii

n

i

D Fyn

FL x

where q(y|x) is the conditional distribution given x, q(x) is the pdf of x.

Let be a training (example) set.

xxxxX

d)()}|()}(exp{{)(}1,1{

expqyqyFFL

y

E

Expected exponential loss function for a score function F(x)

85

Page 86: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

0)(),1()(:Intial.1 011

xFniiw n    

,)'(

)())((I)(

iwiwfyf

t

tiit x

)(

)(121

)(

)(log)b(tt

ttt

f

f

T

tttTT fFF

1)( )()( where,)(sign.3 )( xxx

Tt ,,1For .2

))(exp()()()c( )(1 iitttt yfiwiw x

)(min)()a( )( ff tftt F

Adaboost

86

Page 87: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Learning algorithm

)},( ),...,,{( 11 nn yy xx

)(,),1( 11 nww

)(,),1( 22 nww

1

2

1

2

T

)()1( xf

T

1)( )(

ttt f x

)(,),1( nww TT

)()2( xf

)()( xTf

Final learner

T

tttT fF

1)( )()( xx

87

Page 88: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Update weight

21)( )(1 tt f The worst case

)()( 1 iwiw tt

       

      

t

t

eyf

eyf

iit

iit

Multiply )(

Multiply )(

)(

)(

x

x

update

Weighted error rate )()()( )1(1)(1)( tttttt fff

88

Page 89: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

21)( )(1 tt f

)'()())(()(

1

1

1)(1 iw

iwyfIft

tn

iiittt x

n

ititit

n

itititiit

iwfy

iwfyyfI

1)(

1)()(

)()}(exp{

)()}(exp{))((

x

xx

n

itiitt

n

itiitt

n

itiitt

iwyfIiwyfI

iwyfI

1)(

1)(

1)(

)())((}exp{)())((}exp{

)())((}exp{

xx

x

21

)}(1{)(1

)()(

)()(1

)()(

)(1

)()(

)()(

)(

)(

)()(

)(

tttt

tttt

tt

tt

tttt

tt

ff

ff

ff

ff

f

89

Page 90: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Update in exponential loss

)}(exp{1)(1

exp ii

n

i

Fyn

FL x

)()()( xxx fFF

}))(1()(){(exp fefeFL

Consider

n

iiiiiii yfeyfeFy

n 1

))(I())(I()}(exp{1 xxx

n

iiiii fyFy

nfFL

1exp )}(exp{)}(exp{1)( xx

)(

)}(exp{))(()(

exp

1

FL

FyyfIf

n

iiiij

xxwhere

90

Page 91: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Sequential optimization

}))(1()(){()( expexp efefFLfFL

)()(1log

21

opt ff

)}(1){(2 ff

)}(1){(2)()(12

ffefe

f

Equality holds if and only if

efef ))(1()(

91

Page 92: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

AdaBooost = Seq-min exponential loss

)(minarg)a( )( ff tFf

t

)}(exp{)()((c) 1 itittt xfyiwiw

)}(1){()()(min )()(1exp)(1exp ttttt ffFLfFL

R

)()(1

log21

)(

)(opt

t

t

ff

)(minarg)b( )(1exp ttt fFL

R

92

Page 93: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Simulation (complete separable)

-1 -0.5 0.5 1

-1

-0.5

0.5

1

[-1,1]×[-1,1]

Feature space

Decision boundary

93

Page 94: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

-1 -0.5 0.5 1

-1

-0.5

0.5

1

Set of linear classifiers

Linear classification machines

Random generation

94

Page 95: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Learning process

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

Iter = 1, train err = 0.21 Iter = 13, train err = 0.18 Iter = 17, train err = 0.10

Iter = 23, train err = 0.10 Iter = 31, train err = 0.095 Iter = 47, train err = 0.0895

Page 96: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Final decision boundary

-1 -0.5 0 0.5 1-1

-0.5

0

0.5

1

-1 -0.5 0 0.5 1

-1

-0.5

0

0.5

1

Contour of F(x) Sign(F(x)) 96

Page 97: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Learning curve

50 100 150 200 250

0.05

0.1

0.15

0.2

Iteration number

Training curve

97

Page 98: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

KL divergence

For conditinal distribution p(y|x), q(y|x) given x with commonmarginal density p(x) we can write

Nonnegative function

Feature space

Note

Label set

Then,

98

Page 99: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Twin of KL loss functionFor data distribution q(x, y) = q(x) q(y|x) we model as

G

gyFgFym

11 )},(),(exp{)|( xxx

xxxxxxx

Xd)()}|()|(

)|()|(log)|({),(

}1,1{KL qymyq

ymyqyqmqD

y

Then

G

g

gF

yFym

1

2

)},(exp{

)},(exp{)|(x

xx

Log lossExp loss 99

Page 100: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Bound for exponential loss

Empirical exp loss

Expected exp loss

Theorem.

100

Page 101: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Variational calculas

101

Page 102: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

On AdaBoostFDA, or Logistic regression

Parametric approach to Bayes classifier

AdaBoost

The stopping time T can be selected according to the state of learning.

Parametric approach to Bayes classifier, but dimension and basis function are flexible

102

Page 103: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U-Boost

U-empirical loss function

In a context of classification

Unnormalized U-loss

Normalized U-loss

103

Page 104: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U-Boost (binary)

Unnormalized U-loss

Note

Bayes risk consistency

F*(x) is Bayes risk consistent because104

Page 105: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Eta-loss function

regularized

AdaBoost with margin

105

Page 106: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

EtaBoost for Mislabels Expected Eta-loss function

Optimal score

The variational argument leads to

Mislabel modeling

106

Page 107: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

EtaBoost

0)(),1()(: settings Initial.1 011

xFniiw n    

,)())((I)(1

iwfyf m

n

iiim

x

T

tttTT fFF

1)( )()( where,)(sign.3 )( xxx

Tm ,,1For .2

))(exp()()()c( )(*

1 iimmmm yfiwiw x

)(min)()a( )( ff mfmm

(b)

107

Page 108: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

A toy example

108

Page 109: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Examples partly mislabeled

109

Page 110: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

AdaBoost vs. EtaBoost

AdaBoost EtaBoost110

Page 111: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

ROC curve, AUC

cF 閾値:判別関数 ),(: X

)1Y|)(()TPR(

0)Y| )(()FPR(

cFPc

cFPc

X

X

}RI|))(TPR),(FPR{()ROC( cccF

111

0)(Y )(1)(Y )(

陰性

陽性

cFcF

XX

1} {0, ,RI , ),( YY pXX

)'(FPR)'(TPR),AUC( cdcF

TPR

FPR

1

10

c

AUC

偽陽性確率

真の陽性確率

受信者動作特性(ROC) 曲線

下側面積(AUC )

Cf. Bamber (1975), Pepe et al. (2003)

Page 112: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

112

Page 113: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

113

Page 114: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

114

Page 115: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

115

Page 116: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

116

Page 117: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

117

Page 118: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

118

λ5-fold CVの結果(AUCBoost)

118

Page 119: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

119

Page 120: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

120

Page 121: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

121

Page 122: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

2標本問題とAUCブーストの関係

AUCブーストの学習アルゴリズムは逐次的に各 t-ステップで

得られた判別関数 Ft(x) のウィルコクソン検定統計量

を(t+1)-ステップ

で最大方向に増大するよう設計されている.

このように

となる中で全ての単点解析が統合されている.

この意味で全ての単点解析はブースティングに

組むことは可能である. Eguchi-Komori (2009)

))()((1)(ˆ0

1 11

10

1 0

it

n

k

n

iktt FFI

nnFC xx

)()()( 111 xxx tttt fFF

)(ˆ)(ˆ1tt FCFC

122

Page 123: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

123

Page 124: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

124

Page 125: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Partial AUC

))( ),X()X((P

)(FPR)(TPRpAUC

010 XFcFF

cdcc

)(xF

健常者 病人

cTPR

FPR

1

10

)(TPR c

)(FPR c )(xF cTPR

FPR

1

10

pAUC

)(FPR c

)(TPR c

健常者 病人

cc

pAUC

125

Page 126: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

126

Page 127: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Robust for noninformative genes

127

Page 128: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Robustness for noninformative genes

128

Page 129: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

生成関数

真のU 陽性確率

129

Page 130: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Boost learning algorithm

130

Page 131: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Bayes risk consistency

131

Page 132: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

ベンチマークデータ

132

Page 133: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

T r u e d e n s i t y

- 4

- 2

0

2

4

- 4

- 2

0

2

4

0

0 . 0 0 5

0 . 0 1

0 . 0 1 5

0 . 0 2

- 4

- 2

0

2

4

W dictionary133

Page 134: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

U-Boost learning

)( argmin *

* fLf Uf UW

pfUf

nfL

n

iiU

RI1d)))((())((1)( xxx U-loss function

)( ))((co 1* WW U

Dictionary of density functions

},1d)(,0)(:)({ xxxx gggW

Learning space = U-model

)}( ))(({ 1

xg

Goal : find

)())0,1,,0(,(Then.))((),(Let )(

1 )( xxxπx

gfgf

WW * U

134

Page 135: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Inner step in the convex hull

))((co1* WW U}:),({ xW

*UW

W

)(xf

1g

2g3g

4g

5g

6g

7g

)(xf

)( argmin *

* fLf Uf UW

Goal : )))(ˆ(ˆ))(ˆ(ˆ()( ˆˆ111* xxx kk fff

135

Page 136: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Convergence of Algorithm

)()()(

)()()1()(

*11

111

fgg

gff

KK

KKKKK

*UW

-mixture

)()()()()( *111 fffff kkk

),()()( 11 kkUkUkU ffDfLfL

k

jjjUkUU ffDfLfL

1111 ),()()(

136

Page 137: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Simplified boosting

initial with the

))),()()1(((minarg

such that))()()1((

,1,....,0For

1

11

1111

1

ckc

gfLg

gfff

Kk

k

kUk

kkkkkk

g

W

K

kkkK gf

1

1 )( )(ˆ

)(minarg1 gLf Ug W

137

Page 138: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Then we have

Non-asymptotic bound

UbU 2

))(co(),,()}()(){(sup)A(

WWW

),(inf),(FA fgDg Uf

UU=W

=W

|}| )(EI))((supEI2),(EE1

1{ ffg pi

n

inf

g

xW

W

constant)length-step:(1

),(IE22

ccK

bcK U

W

(Functional approximation)

(Estimation error)

(Iteration effect)

),(EEand),(FAbetween Trade Remark. WW pp *U

where

),,IE( ),EE(),FA()ˆ,(EI * WWW KggfgD UKUg

Theorem. Assume that a data distribution has a density g(x) and that

138

Page 139: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Lemmas

Lemma 1 Then).))(((andestimator density abe~Let 11* N

j jjpff x

).,,~()}()~(){1()())))(())(~()1((( ***1

1 fffLfLfLfLp UUUUNj jUj

xx

Lemma 2 Then.))(co(in be~Let Wf

])1,0[(),,( 22* UU bff

Lemma 3 (A)assumptionUnder

))((where1

)(inf)ˆ( 122

jjU

UKU fcK

bcfLfL

Lemma 4 then,)(inf)ˆ( If

fLfL UKU

||)(EI))((1||supEI2),(inf)ˆ,(EI

ffn

fgDfgD gif

gUKUg xW

139

Page 140: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

KDERSDE

-Boost -Boost

C (skewed-unimodal) L (quadrimodal) H (bimodal)

140

Page 141: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

geometry

learning

statistics

quantum

Future of information geometry

Deepening

Optimal transform

Free probability

Semiparametrics

Compressed sensing

Poincare conjecture

Dynamical systems

Mathematical evolutionary theory

Dempster-Shafer theory141

Page 142: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Poincaré conjecture 1904

Every simply connected, closed 3-manifold is

homeomorphic to the 3-sphere

Ricci flow by Richard Hamilton in 1981 Perelmann 2002, 2003

Optimal control theory due to Pontryagin and Bellman

Optimal transport

Wasserstein space }~,~:||||E{min),( 2W gYfXYXgfd

142

Page 143: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Optimal transport

Theorem (Brenier, 1991)

Talagrand inequality

Log Sovolev inequality

Optimal transport theory is extending on a general manifold

Geometers consider not a space, but a distribution family on the space.

C. Villani (2010) Optimal transport theory as Fields medal

143

Page 144: Information Geometry...Outline Information geometry historical remarks treasure example Information divergence class robust statistical methods robust ICA robust kernel PCA 2 geometry

Thank you

144