1
November 24 (Wed) - 26 (Fri), 2004
2004 Open Lecture at ISM: Recent Topics in Machine Learning: Boosting
Open Lecture, Essentials of Statistical Mathematics: "Recent Topics in Machine Learning"
Boost Learning
Shinto Eguchi
(The Institute of Statistical Mathematics; Department of Statistical Science, The Graduate University for Advanced Studies)
2
Course outline
Boost learning:
We survey AdaBoost, a method of statistical pattern recognition, and discuss its strengths and weaknesses. Applications to gene expression and remote-sensing data are introduced.
3
Boost Learning (I)
10:00-12:30 November 25 Thu
Boost learning algorithms
AdaBoost
EtaBoost: robust learning
GroupBoost: group learning
AsymAdaBoost: asymmetric learning
4
Boost Learning (II)
13:30-16:00 November 26 Fri
Statistical discussion
Optimal classifier by AdaBoost
Probabilistic framework: Bayes rule, Fisher's LDF, logistic regression
BridgeBoost: meta learning
LocalBoost: local learning
5
Acknowledgements
Much of the material presented in this course includes results of joint work with the following collaborators, to whom I am grateful:
Noboru Murata (Science and Engineering, Waseda University)
Ryuei Nishii (Mathematics, Kyushu University)
Takafumi Kanamori (Mathematical and Computing Sciences, Tokyo Institute of Technology)
Takashi Takenouchi (The Institute of Statistical Mathematics)
Masanori Kawakita (Statistical Science, The Graduate University for Advanced Studies)
John B. Copas (Dept Stats, Univ of Warwick)
6
The strength of weak learnability
Strong learnability: given access to a source of examples of the unknown concept, the learner can, with high probability, output a hypothesis that is correct on all but an arbitrarily small fraction of the instances.
Weak learnability: the concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.
Schapire, R. (1990)
7
Web-page on Boost
Boosting Research Site: http://www.boosting.org/
Robert Schapire's home page: http://www.cs.princeton.edu/~schapire/
Yoav Freund's home page: http://www1.cs.columbia.edu/~freund/
John Lafferty's home page: http://www-2.cs.cmu.edu/~lafferty/
8
Statistical pattern recognition
Recognition of: character, image, speaker, signal, face, language, ...
Prediction of: weather, earthquake, disaster, finance, interest rates, company bankruptcy, credit, default, infection, disease, adverse effect
Classification of: species, parentage, genomic type, gene expression, protein expression, system failure, machine trouble
9
Multi-class classification
Feature vector $x \in \mathbb{R}^p$; class label $y \in \{1, \dots, g\}$
Discriminant function $F_k(x)$ $(k = 1, \dots, g)$
Classification rule: $\hat{y} = \operatorname{argmax}_k F_k(x)$
10
Binary classification
Feature vector $x \in \mathbb{R}^p$; class label $y \in \{-1, +1\}$
Classification rule: $h(x) = \operatorname{sign}(F(x))$
Learn from a training dataset $\{(x_i, y_i) : i = 1, \dots, n\}$
Make a classification by the predicted label $h(x)$
0-normalization: $F(x) = 0$ gives the decision boundary
11
Statistical learning theory
Boost learning
  Boost by filter (Schapire, 1990)
  Bagging, arcing (bootstrap) (Breiman; Friedman; Hastie)
  AdaBoost (Schapire, Freund, Bartlett, Lee)
Support vector machines
  Maximize margin, kernel space
  (Vapnik, Schölkopf)
12
Class of weak machines
Stump class
Linear class
ANN class, SVM class, kNN class
Point: weak machines with colorful (distinctive) character rather than universal character
13
AdaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $t = 1, \dots, T$:
   (a) $f_{(t)} = \operatorname{argmin}_{f} \epsilon_t(f)$, where $\epsilon_t(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_t(i) \big/ \sum_{i'=1}^n w_t(i')$
   (b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
   (c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
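As an illustration, here is a minimal Python sketch of the algorithm above, using single-coordinate decision stumps as the class of weak machines (the stump class of slide 12); all function names are ours, not from the lecture.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the coordinate j, threshold b, and sign s minimizing the weighted error."""
    n, p = X.shape
    best = (np.inf, 0, 0.0, 1)                  # (error, j, b, s)
    for j in range(p):
        for b in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = s * np.sign(X[:, j] - b + 1e-12)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[0]:
                    best = (err, j, b, s)
    return best

def adaboost(X, y, T):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # step 1: w_1(i) = 1/n
    machines = []
    for t in range(T):                          # step 2
        err, j, b, s = fit_stump(X, y, w)       # (a) best weak machine
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # (b) coefficient
        pred = s * np.sign(X[:, j] - b + 1e-12)
        w = w * np.exp(-alpha * y * pred)       # (c) reweighting
        machines.append((alpha, j, b, s))
    return machines

def predict(machines, X):
    F = sum(a * s * np.sign(X[:, j] - b + 1e-12) for a, j, b, s in machines)
    return np.sign(F)                           # step 3: sign(F_T(x))
```

The exhaustive threshold scan in fit_stump is deliberately naive; it keeps the sketch faithful to steps (a)-(c) rather than efficient.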
14
Learning algorithm
[Diagram: the data enter with weights $w_1(1), \dots, w_1(n)$; machine $f^{(1)}(x)$ is trained and the weights are updated to $w_2(1), \dots, w_2(n)$; machine $f^{(2)}(x)$ follows, and so on through $f^{(T)}(x)$ with weights $w_T(1), \dots, w_T(n)$.]
Final machine: $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
15
Simulation (complete separation)
[Figure: feature space $[-1,1] \times [-1,1]$ with the decision boundary; the two classes are completely separated.]
16
[Figure: randomly generated linear classifiers on $[-1,1] \times [-1,1]$.]
Set of weak machines
Linear classification machines
Random generation
17
Learning process (I)
[Figures: six snapshots of the decision regions during learning.]
Iter = 1, train err = 0.21; Iter = 13, train err = 0.18; Iter = 17, train err = 0.10
Iter = 23, train err = 0.10; Iter = 31, train err = 0.095; Iter = 47, train err = 0.08
18
Learning process (II)
[Figures: three further snapshots of the decision regions.]
Iter = 55, train err = 0.061; Iter = 99, train err = 0.032; Iter = 155, train err = 0.016
19
Final stage
[Figures: contour of $F(x)$ and $\operatorname{sign}(F(x))$ at the final stage.]
20
Learning curve
[Figure: training error against iteration number, Iter = 1, ..., 277; the error decreases from about 0.2 towards 0.05.]
21
Characteristics
$\epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$ (least favorable): under the updated weights, the machine just selected performs no better than random guessing.
Update: $w_{t+1}(i) = w_t(i) \times e^{-\alpha_t}$ (multiplying factor) if $f_{(t)}(x_i) = y_i$, and $w_{t+1}(i) = w_t(i) \times e^{\alpha_t}$ if $f_{(t)}(x_i) \neq y_i$.
Weighted error rates: $\epsilon_{t+1}(f_{(t+1)}) \le \epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$.
22
Proof that $\epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$. By definition,
$$\epsilon_{t+1}(f_{(t)}) = \frac{\sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_{t+1}(i)}{\sum_{i'=1}^n w_{t+1}(i')}, \qquad w_{t+1}(i) = w_t(i)\exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}.$$
Since $y_i f_{(t)}(x_i) = -1$ exactly when $I(y_i \neq f_{(t)}(x_i)) = 1$,
$$\sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_{t+1}(i) = e^{\alpha_t} \sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_t(i),$$
$$\sum_{i=1}^n w_{t+1}(i) = e^{\alpha_t} \sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_t(i) + e^{-\alpha_t} \sum_{i=1}^n I(y_i = f_{(t)}(x_i))\, w_t(i).$$
Dividing both by $\sum_i w_t(i)$ and writing $\epsilon_t = \epsilon_t(f_{(t)})$,
$$\epsilon_{t+1}(f_{(t)}) = \frac{e^{\alpha_t}\epsilon_t}{e^{\alpha_t}\epsilon_t + e^{-\alpha_t}(1 - \epsilon_t)} = \frac{\sqrt{\epsilon_t(1-\epsilon_t)}}{\sqrt{\epsilon_t(1-\epsilon_t)} + \sqrt{\epsilon_t(1-\epsilon_t)}} = \frac{1}{2},$$
using $e^{\alpha_t} = \sqrt{(1-\epsilon_t)/\epsilon_t}$.
23
Exponential loss
$$L_{\exp}(F) = \frac{1}{n} \sum_{i=1}^n \exp\{-y_i F(x_i)\}$$
Update by $F(x) \mapsto F(x) + \alpha f(x)$:
$$L_{\exp}(F + \alpha f) = \frac{1}{n} \sum_{i=1}^n \exp\{-y_i F(x_i)\} \exp\{-\alpha\, y_i f(x_i)\} = L_{\exp}(F)\, \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\},$$
where $\epsilon(f) = \dfrac{\sum_{i=1}^n I(f(x_i) \neq y_i) \exp\{-y_i F(x_i)\}}{n\, L_{\exp}(F)}$.
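The empirical exponential loss transcribes directly into code; a minimal sketch, where `F` is any array of scores $F(x_i)$ and `y` the labels in $\{-1,+1\}$:

```python
import numpy as np

def exp_loss(F, y):
    """Empirical exponential loss L_exp(F) = (1/n) * sum_i exp(-y_i F(x_i))."""
    return np.mean(np.exp(-y * F))
```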
24
Sequential minimization
$$L_{\exp}(F + \alpha f) = L_{\exp}(F)\, \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\}$$
$$\alpha_{\mathrm{opt}} = \frac{1}{2} \log \frac{1 - \epsilon(f)}{\epsilon(f)}, \qquad \text{where } \epsilon(f) = \frac{\sum_{i=1}^n I(f(x_i) \neq y_i) \exp\{-y_i F(x_i)\}}{n\, L_{\exp}(F)}.$$
$$\min_{\alpha} \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\} = 2\sqrt{\epsilon(f)\{1 - \epsilon(f)\}}$$
Equality holds iff $\epsilon(f)\, e^{\alpha} = (1 - \epsilon(f))\, e^{-\alpha}$.
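Filling in the one-line calculus behind the optimal coefficient (a standard step, made explicit here):
$$\frac{\partial}{\partial \alpha}\left\{\epsilon(f)\, e^{\alpha} + (1-\epsilon(f))\, e^{-\alpha}\right\} = \epsilon(f)\, e^{\alpha} - (1-\epsilon(f))\, e^{-\alpha} = 0 \iff e^{2\alpha} = \frac{1-\epsilon(f)}{\epsilon(f)},$$
which gives $\alpha_{\mathrm{opt}} = \frac{1}{2}\log\frac{1-\epsilon(f)}{\epsilon(f)}$; substituting back yields the minimum value $2\sqrt{\epsilon(f)\{1-\epsilon(f)\}}$.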
25
AdaBoost = minimum exp-loss
(a) $f_{(t)} = \operatorname{argmin}_{f}\, \epsilon_t(f)$
(b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
(c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
$$\min_{\alpha \in \mathbb{R}} L_{\exp}(F_{t-1} + \alpha f_{(t)}) = L_{\exp}(F_{t-1})\, 2\sqrt{\epsilon_t(f_{(t)})\{1 - \epsilon_t(f_{(t)})\}}$$
$$\alpha_{\mathrm{opt}} = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$$
26
Simulation (complete random)
[Figure: data on $[-1,1] \times [-1,1]$ with completely random labels.]
27
Over-learning of AdaBoost
[Figures: decision regions at three stages of learning on the random-label data.]
Iter = 51, train err = 0.21; Iter = 151, train err = 0.06; Iter = 301, train err = 0.0
28
Drawbacks of AdaBoost
1. Unbalanced learning
2. Over-learning even for noisy datasets
Remedies:
EtaBoost: robustify against mislabelled examples
GroupBoost: relax the p >> n problem
BridgeBoost: combine different datasets
AsymAdaBoost: balance the false negatives/positives
LocalBoost: extract spatial information
29
AsymBoost
The small modification of AdaBoost: replace step (b) by
(b)' $\alpha_t^{*} = \operatorname{argmin}_{\alpha \in \mathbb{R}}\, L_{\mathrm{asym}}(F_{t-1} + \alpha f_t^{*})$
The selection of $k$: the default choice is
$$k = \frac{\sum_{i=1}^n I(y_i = -1)}{\sum_{i=1}^n I(y_i = +1)}$$
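In code, the default choice of $k$ is just the empirical class ratio; a sketch (the direction of the ratio follows the reconstruction above and is an assumption):

```python
import numpy as np

def default_k(y):
    """Default asymmetry constant: ratio of negative to positive examples
    (assumed direction of the ratio)."""
    return np.sum(y == -1) / np.sum(y == +1)
```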
30
Weighted errors by k
31-33
[Figures: weighted error rates as k varies.]
34
Result of AsymBoost
35
Eta-loss function
A regularized version of the exponential loss.
36
EtaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $m = 1, \dots, T$:
   (a) $f_{(m)} = \operatorname{argmin}_{f}\, \epsilon_m(f)$, where $\epsilon_m(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_m(i)$
   (b) choose $\alpha_m$ by minimizing the eta-loss
   (c) $w_{m+1}(i) = w_m(i) \exp\{-\alpha_m\, y_i f^{*}_{(m)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
37
A toy example
38
AdaBoost vs EtaBoost
39
Simulation (complete random)
[Figure: data on $[-1,1] \times [-1,1]$ with completely random labels.]
Over-learning of AdaBoost:
Iter = 51, train err = 0.21; Iter = 301, train err = 0.0
40
EtaBoost
[Figures: decision regions of EtaBoost at three stages on the same data.]
Iter = 51, train err = 0.25; Iter = 51, train err = 0.15; Iter = 351, train err = 0.18
41
Mis-labeled examples
Mis-labeled
42
Comparison
AdaBoost EtaBoost
43
GroupBoost
Relax the over-learning of AdaBoost by group learning.
Idea: in AdaBoost step 2(a), only the single best machine is selected; other good machines are cast off.
Is there a wise way of grouping the G best machines?
44
Grouping machines
$\mathcal{F} = \{\operatorname{sgn}(a x_j + b) : b \in \mathbb{R},\ a \in \{-1, +1\}\}$
Let $f_j(\cdot\,; b_j) = \operatorname{argmin}_{f \in \mathcal{F}}\, \epsilon_t(f)$ over the thresholds of the $j$-th component.
Select a set $f_{(1)}(\cdot\,; b_{(1)}), \dots, f_{(G)}(\cdot\,; b_{(G)})$
such that $\epsilon_t(f_{(1)}(\cdot\,; b_{(1)})) \le \dots \le \epsilon_t(f_{(G)}(\cdot\,; b_{(G)}))$.
The grouped machine is $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
45
GroupBoost
(a) $\epsilon_t(f) = \sum_{i=1}^N I(y_i \neq f(x_i))\, w_t(i)$; select the $G$ best machines $f_{(1)}(\cdot\,; b_{(1)}), \dots, f_{(G)}(\cdot\,; b_{(G)})$.
(b) $\alpha_{t,(g)} = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(g)}(\cdot\,; b_{(g)}))}{\epsilon_t(f_{(g)}(\cdot\,; b_{(g)}))}$ $(g = 1, \dots, G)$; set $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
(c) $w_{t+1}(i) = \frac{w_t(i) \exp\{-y_i \bar{f}_t(x_i)\}}{Z_{t+1}}$
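A minimal sketch of one GroupBoost round under the reconstruction above: take the G best machines instead of the single best, average their weighted votes, and reweight once (names and bookkeeping are ours).

```python
import numpy as np

def group_round(X, y, w, G, candidates):
    """One GroupBoost round. `candidates` is a list of weak machines f(X) -> {-1,+1}."""
    errs = []
    for f in candidates:
        pred = f(X)
        errs.append((np.sum(w * (pred != y)) / np.sum(w), f))
    errs.sort(key=lambda e: e[0])            # epsilon_(1) <= ... <= epsilon_(G)
    group = errs[:G]                         # the G best machines
    fbar = np.zeros(len(y))
    for eps, f in group:
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        fbar += alpha * f(X) / G             # combined machine (1/G) sum_g alpha_g f_g
    w_new = w * np.exp(-y * fbar)
    return fbar, w_new / np.sum(w_new)       # normalized weights (Z_{t+1})
```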
46
[Diagram: from the current $F_0$, the $G$ best machines $f_1, \dots, f_4$ with coefficients $\alpha_1 f_1, \dots, \alpha_4 f_4$ are grouped; grouping jumps to the next stage.]
47
Learning architecture
[Diagram: at each stage $t$ the data enter with weights $w_t(1), \dots, w_t(n)$; the $G$ machines $f^{(t,1)}(x), \dots, f^{(t,G)}(x)$ are grouped into $\bar{f}_t(x)$; the final machine is $\operatorname{sign}(F_T(x))$ with $F_T(x) = \sum_{t=1}^T \bar{f}_t(x)$.]
48
AdaBoost and GroupBoost
AdaBoost: $L_{\exp}(F_{t-1} + \alpha_t f_t) \le L_{\exp}(F_{t-1})$
GroupAdaBoost: $L_{\exp}(F_{t-1} + \bar{f}_t) \le L_{\exp}(F_{t-1} + \alpha_t f_t) \le L_{\exp}(F_{t-1})$
Update the weights with $\exp(-\alpha_t)$ in AdaBoost, and with $\exp\!\big(-\frac{1}{G}\sum_{g=1}^G \alpha_{t,(g)}\big)$ in GroupAdaBoost.
49
From microarray
Contest program from bioinformatics (BIP2003)
http://contest.genome.ad.jp/
Microarray data:
Number of genes: p = 1,000 to 100,000
Number of individuals: n = 10 to 100
50
Output
http://genome-www.stanford.edu/cellcycle/
51
Goal
Disease and gene expressions
$y$ is a label for clinical information
$x = (x_1, \dots, x_p)$ = feature vector, with $x_j = \log(\text{gene expression})$; the gene expression vector for the $i$-th individual is observed.
(BIP2003) Problem 2: $n = 27 + 11 = 38 \ll p = 7109$
52
Microarray data
cDNA microarray
53
Prediction from gene expressions
Feature vector $x = (x_1, \dots, x_p)$: dimension = number of genes $p$; components = quantities of gene expression.
Class label $y \in \{-1, +1\}$: disease, adverse effect.
Classification machine $f: x \mapsto y$,
based on the training dataset $\{(x_i, y_i) : 1 \le i \le n\}$.
54
Leukemic diseases, Golub et al. http://www.broad.mit.edu/cgi-bin/cancer/publications/
55
AdaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $t = 1, \dots, T$:
   (a) $f_{(t)} = \operatorname{argmin}_{f}\, \epsilon_t(f)$, where $\epsilon_t(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_t(i)$
   (b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
   (c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
56
One-gene classifier
Let $x_{1j}, \dots, x_{nj}$ be the expressions of the $j$-th gene.
One-gene classifier: $f_j(x) = +1$ if $x_j > b_j$, and $f_j(x) = -1$ if $x_j \le b_j$,
with $b_j = \operatorname{argmin}_{b} \sum_i I(y_i \neq \operatorname{sgn}(x_{ij} - b))$.
[Figure: expressions of one gene with candidate cut-offs; error numbers 5 5 5 6 4 6 6 5 5 6 5 along the axis.]
jx
57
The second training
Errror number 45.5 7 9
87.56
798.5
Update the weight:
jx
4.5
jb
Weight up to 2
})(sgn()({minarg i
ijij bxyIiwbb
2log4
16log5.0
ans. false of nb.
ans.correct of nb.log5.01
Weight down to 0.5
jb
58
Web microarray data
Dataset   p     n   y = +1   y = -1
ALLAML    7129  72  37       35
Colon     2000  62  40       22
Estrogen  7129  49  25       24
p >> n
http://microarray.princeton.edu/oncology/
http://mgm.duke.edu/genome/dna micro/work/
59
10-fold validation
$D = \{(x_i, y_i)\}_{i=1}^N$ is split into 10 blocks: 1, 2, 3, 4, ..., 9, 10.
Round $k$: block $k$ is held out for validation and the remaining blocks are used for training, giving error $\epsilon^{(k)}$.
Averaging: $\frac{1}{10} \sum_{k=1}^{10} \epsilon^{(k)}$.
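A plain 10-fold cross-validation loop as sketched above (our own minimal version; `train` and `error` stand for any learner and error measure):

```python
import numpy as np

def ten_fold_cv(X, y, train, error, K=10):
    """Split D into K blocks; use block k for validation, the rest for training;
    report the average validation error (1/K) * sum_k eps^(k)."""
    idx = np.arange(len(y))
    rng = np.random.default_rng(0)
    rng.shuffle(idx)
    blocks = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = blocks[k]
        tr = np.concatenate([blocks[j] for j in range(K) if j != k])
        model = train(X[tr], y[tr])
        errs.append(error(model, X[val], y[val]))
    return np.mean(errs)
```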
60
ALLAML
Training error 10-fold CV error
61
Colon
Training error 10-fold CV error
62
Estrogen
Training error 10-fold CV error
63
Gene score
$F_T(x) = \sum_{t=1}^T \bar{f}_t(x)$, where $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
Confidence factor for the $j$-th gene:
$$S(j) = \frac{1}{T} \sum_{t=1}^T \sum_{g=1}^G I(j(g, t) = j)$$
$S(j)$ suggests the degree of association with the class label.
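A sketch computing the score S(j) as the selection frequency of gene j across all chosen machines, assuming each selected machine records the index of the gene it uses (our bookkeeping, not the lecture's):

```python
import numpy as np

def gene_scores(selected_genes, p, T):
    """S(j) = (1/T) * sum over T rounds and G machines of I(gene(g,t) == j).
    `selected_genes` is a T x G array of gene indices."""
    S = np.zeros(p)
    for row in np.asarray(selected_genes):
        for j in row:
            S[j] += 1.0
    return S / T
```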
64
Genes associated with disease
Compare our result with the set of genes suggested by Golub et al. and West et al.
AdaBoost detects only 15 genes in the case of ALLAML.
GroupAdaBoost detects 30 genes compatible with their results.
65
Boost Learning (II)
13:30-16:00 November 26 Fri
Statistical discussion
Optimal classifier by AdaBoost
Probabilistic framework: Bayes rule, Fisher's LDF, logistic regression
BridgeBoost: the p >> n problem
LocalBoost: contextual information
66
Problem: p >> n
A fundamental issue in bioinformatics:
p is the dimension of the biomarker (SNPs, proteome, microarray, ...)
n is the number of individuals (informed consent, institutional protocol, ... bioethics)
67
An approach by combining
Rapid expansion of genomic data.
Let $I_1, \dots, I_K$ be $K$ experimental facilities, and let $B$ be a biomarker space.
$D_k = \{z_i^{(k)} \in B : i = 1, \dots, n_k\}$, with $n_k < p$ $(k = 1, \dots, K)$.
The combined size $\sum_{k=1}^K n_k$ is larger.
68
BridgeBoost
[Diagram: datasets $D_1, D_2, \dots, D_K$ (e.g. from CAMDA, the Critical Assessment of Microarray Data Analysis, or DDBJ, the DNA Data Bank of Japan, NIG); separate learning gives $f(D_1), \dots, f(D_K)$, while bridged learning gives $f(D_1 \mid D), \dots, f(D_K \mid D)$ and a combined result.]
69
CAMDA 2003
4 datasets for lung cancer:
Harvard    PNAS, 2001        Affymetrix
Michigan   Nature Med, 2002  Affymetrix
Stanford   PNAS, 2001        cDNA
Ontario    Cancer Res, 2001  cDNA
http://www.camda.duke.edu/camda03/datasets/
70
Some problems
1. Heterogeneity in feature space: cDNA vs. Affymetrix; differences in covariates and medical diagnosis; uncertainty in microarray experiments
2. Heterogeneous class-labeling
3. Heterogeneous generalization powers
4. Publication bias: a vast number of unpublished studies
71
Machine learning
Learnability: boosting weak learners?
AdaBoost: Freund & Schapire (1997)
Weak classifiers $\{f_1(x), \dots, f_p(x)\}$
A strong classifier $F(x) = \alpha_1 f^{(1)}(x) + \dots + \alpha_t f^{(t)}(x)$, built stagewise.
72
Learning algorithm
[Diagram: the data $D$ enter with weights $w_1(1), \dots, w_1(n)$; machines $f^{(1)}(x), f^{(2)}(x), \dots, f^{(T)}(x)$ are trained in turn with updated weights $w_t(1), \dots, w_t(n)$.]
Final machine: $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
73
Different datasets
$D = \bigcup_{k=1}^K D_k$, where $D_k = \{(x_i^{(k)}, y_i^{(k)}) : 1 \le i \le n_k\}$.
$x_i^{(k)} \in \mathbb{R}^p$: expression vector of the same genes; $y_i^{(k)}$: label of the same clinical item.
Normalization to $[0, 1]^p$:
$$x_{ji}^{(k)} \leftarrow \frac{x_{ji}^{(k)} - \min_{i'} x_{ji'}^{(k)}}{\max_{i'} x_{ji'}^{(k)} - \min_{i'} x_{ji'}^{(k)}} \qquad (j = 1, \dots, p)$$
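The per-gene min-max normalization above, as a sketch; applying it within each dataset $D_k$ separately is our reading of the slide.

```python
import numpy as np

def minmax_normalize(Xk):
    """Map each column (gene) of one dataset to [0, 1]:
    x <- (x - min) / (max - min), computed within the dataset."""
    lo = Xk.min(axis=0)
    hi = Xk.max(axis=0)
    return (Xk - lo) / np.where(hi > lo, hi - lo, 1.0)
```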
74
Weighted Errors
The k-th weighted error:
$$\epsilon_t^{(k)}(f) = \sum_{i=1}^{n_k} I(y_i^{(k)} \neq f(x_i^{(k)}))\, \frac{w_t^{(k)}(i)}{\sum_{i'=1}^{n_k} w_t^{(k)}(i')}$$
The combined weighted error:
$$\epsilon_t(f) = \sum_{k=1}^K \pi_t^{(k)}\, \epsilon_t^{(k)}(f), \qquad \text{where } \pi_t^{(k)} = \frac{\sum_{i=1}^{n_k} w_t^{(k)}(i)}{\sum_{h=1}^K \sum_{i=1}^{n_h} w_t^{(h)}(i)}.$$
75
BridgeBoost
With per-dataset weights $\{w_t^{(k)}(i) : 1 \le i \le n_k\}_{k=1}^K$:
(a) $f_t^{(k)} = \operatorname{argmin}_{f}\, \epsilon_t^{(k)}(f)$
(b) $\alpha_t^{(k)} = \frac{1}{2} \log \frac{1 - \epsilon_t^{(k)}(f_t^{(k)})}{\epsilon_t^{(k)}(f_t^{(k)})}$
(c) $\bar{f}_t(x) = \frac{1}{K} \sum_{k=1}^K \alpha_t^{(k)} f_t^{(k)}(x)$
(d) $w_{t+1}^{(k)}(i) = w_t^{(k)}(i) \exp\{-y_i^{(k)} \bar{f}_t(x_i^{(k)})\}$
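A hedged sketch of one BridgeBoost round under the reconstruction above: fit one machine per dataset, then share the averaged machine across all K weight updates (interfaces are ours).

```python
import numpy as np

def bridge_round(datasets, weights, fit):
    """One BridgeBoost round. `datasets` = [(X_1, y_1), ..., (X_K, y_K)];
    `weights` = per-dataset weight vectors; `fit(X, y, w)` returns (f, eps)."""
    K = len(datasets)
    machines = []
    for (X, y), w in zip(datasets, weights):
        f, eps = fit(X, y, w)                     # (a) per-dataset weak machine
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)     # (b) its coefficient
        machines.append((alpha, f))

    def fbar(X):                                  # (c) combined machine
        return sum(a * f(X) for a, f in machines) / K

    new_weights = []                              # (d) shared reweighting
    for (X, y), w in zip(datasets, weights):
        w2 = w * np.exp(-y * fbar(X))
        new_weights.append(w2 / np.sum(w2))
    return fbar, new_weights
```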
76
Learning flow
[Diagram, stage $t$: each dataset $D_k$ with weights $\{w_t^{(k)}(i)\}$ yields a machine $f_t^{(k)}(x)$ and coefficient $\alpha_t^{(k)}$; these are combined into $\bar{f}_t(x) = \frac{1}{K}(\alpha_t^{(1)} f_t^{(1)}(x) + \dots + \alpha_t^{(K)} f_t^{(K)}(x))$, and the updated weights $\{w_{t+1}^{(k)}(i)\}$ enter stage $t+1$.]
77
Mean exponential loss
Exponential loss on the k-th dataset:
$$L^{(k)}(F) = \frac{1}{n_k} \sum_{i=1}^{n_k} \exp\{-y_i^{(k)} F(x_i^{(k)})\}$$
$f_t^{(k)} = \operatorname{argmin}_{f}\, L^{(k)}(F_{t-1} + \alpha f)$, with $\alpha_t^{(k)} = \frac{1}{2} \log \frac{1 - \epsilon_t^{(k)}}{\epsilon_t^{(k)}}$.
Mean exponential loss:
$$\bar{L}(F) = \frac{1}{K} \sum_{k=1}^K L^{(k)}(F)$$
$$\bar{L}(F_{t-1} + \bar{f}_t) \le \frac{1}{K} \sum_{k=1}^K \bar{L}(F_{t-1} + \alpha_t^{(k)} f_t^{(k)})$$
Note: convexity of the exponential loss.
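The convexity note can be made explicit: since $e^{-u}$ is convex, Jensen's inequality applied to the average of the $K$ candidate updates gives
$$\bar{L}\Bigl(F_{t-1} + \frac{1}{K}\sum_{k=1}^{K}\alpha_t^{(k)} f_t^{(k)}\Bigr) \;\le\; \frac{1}{K}\sum_{k=1}^{K}\bar{L}\bigl(F_{t-1} + \alpha_t^{(k)} f_t^{(k)}\bigr),$$
so the bridged update can do no worse, in mean exponential loss, than the average of the separate updates.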
78
Meta-learning
$L^{(k)}(f_t^{(h)} + F_{t-1})$ is cross-validatory for $h \neq k$:
$f_t^{(h)}$ is learned on $D_h$, while $L^{(k)}$ is evaluated on $D_k$.
Separate learning uses only $L^{(k)}(f_t^{(k)} + F_{t-1})$; meta-learning uses $\frac{1}{K} \sum_{h=1}^K L^{(k)}(f_t^{(h)} + F_{t-1})$.
79
Simulation
3 datasets $\{D_1, D_2, D_3\}$, with $p = 100$ and $n_1 = n_2 = n_3 = 50$.
data 1, data 2 ($D_1$, $D_2$): ideal test error 0.
data 3 ($D_3$): ideal test error 0.5.
[Figures: training and test errors on the collapsed dataset.]
80
Comparison
[Figures: training and test errors of Separate AdaBoost and BridgeBoost.]
81
Test errors
[Figure: minimum test errors of Collapsed AdaBoost (15%), Separate AdaBoost (4%, 43%, 3%), and BridgeBoost (4%).]
82
Conclusion
[Diagram: separate learning maps each $D_k$ to $f(D_k)$; meta-learning maps each $D_k$ to $f(D_k \mid D_1, \dots, D_K)$ and combines them into the result.]
83
Unsolved problems in BridgeBoost
1. Which datasets should be joined or deleted in BridgeBoost?
2. Prediction of the class label for a given new x?
3. How to use the information on unmatched genes when combining datasets?
4. Heterogeneity is OK, but publication bias?
84
Markov random field
Markov random field (MRF)
ICM algorithm (Besag, 1986)
85
Neighbor sites
86
Space AdaBoost
1. Estimate the posterior $p(k \mid x)$ using only non-contextual information.
2. Extract the contextual information based on the estimated posterior $p(k \mid x)$.
3. Make a hierarchical set of weak machines $\mathcal{F} = \{\hat{p}(k \mid x_r) : r \in \text{neighborhood}\}$, and start the algorithm from the non-contextual result.
87
Analysis comparison
[Figures: the true image and true labels, and classified images for increasing neighborhood sizes r.]
88
LocalBoost
Let $S$ be the set of possible sites, with a training dataset $\{(s_i, x_i, y_i) : i = 1, \dots, n\}$ over $S$.
Local exponential loss:
$$L_{\mathrm{local}}(F, s) = \frac{1}{n} \sum_{i=1}^n K_h(s, s_i) \exp\{-y_i F(x_i, s_i)\}$$
89
Algorithm
(a) $f_t = \operatorname{argmin}_{f}\, \epsilon_t(f, s_t)$, where $s_t$ is uniformly sampled from $\{s(i) : i = 1, \dots, n\}$ and
$$\epsilon_t(f, s) = \sum_{i=1}^n K_h(s, s_i)\, I(y_i \neq f(x_i))\, w_t(i)$$
(b) $\alpha_t(s) = \frac{1}{2} \log \frac{1 - \epsilon_t(f_t, s)}{\epsilon_t(f_t, s)}$
(c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t(s_t)\, y_i f_t(x_i)\}$
Note: the selected machine $f_t$ has local information around $s_t$.
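A sketch of the locally weighted error in step (a), with a Gaussian kernel over sites as one concrete choice (the kernel form is our assumption):

```python
import numpy as np

def local_error(f, X, y, sites, w, s, h=1.0):
    """Weighted error around site s: sum_i K_h(s, s_i) I(y_i != f(x_i)) w(i)."""
    d2 = np.sum((sites - s) ** 2, axis=1)
    Kh = np.exp(-d2 / (2.0 * h ** 2))      # Gaussian kernel (assumed form)
    miss = (f(X) != y).astype(float)
    return np.sum(Kh * miss * w) / np.sum(Kh * w)
```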
90
Prediction rule: $h(x, s) = \operatorname{sgn}(F(x, s))$, where
$$F(x, s) = \sum_{t=1}^T K_h(s, s_t)\, \alpha_t(s_t)\, f_t(x)$$
is a locally weighted combination. The second weighting $K_h(s, s_t)$ strongly combines only the classification machines $f_t$ whose sites $s_t$ are near $s$.
91
LocalBoost vs. AdaBoost
92
Statistical discussion
Bayes rule
Neyman-Pearson Lemma
Model-misspecification
ROC curve
93
Error rate
Two types of errors (misclassification probabilities):
False negative: a positive case classified as negative.
False positive: a negative case classified as positive.
$u$ is a cut-off point; comparing the score with $u$ partitions the cases into true negatives, true positives, false positives, and false negatives.
94
Neyman-Pearson Lemma (1)
Null hypothesis vs. alternative
Log-likelihood ratio
Neyman-Pearson lemma
95
Neyman-Pearson Lemma (2)
Bayes rule
96
Loss functions by NP lemma
where
97
A class of loss functions
Exponential
Log loss
98
Bayes rule equivalence
Theorem 1
Equality holds if and only if
99
Error rate
Error rate
where H(s) is the Heaviside function,
is of the class
In general
100
Important examples
Credit scoring
Medical screening
101
Minimum Exponential Loss
Empirical exp loss
Expected exp loss
Theorem.
102
Variational discussion
103
ROC curve
Gini index
A: Area Under the Curve (AUC)
104
ROC analysis
[Figure: ROC curve, TP (true positive rate) against FP (false positive rate).]
105
Logistic type discriminant
For a given training dataset $T$
For a given function $F$,
$\operatorname{sgn}(F(x))$ suggests the decision rule;
the empirical loss is
106
Log loss
Conditional expected likelihood
where
107
Estimation equation
IRLS (Iteratively reweighted least squares)
where
empirical
expected
logistic
glm(formula, family = binomial, data, weights = W, ...)
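A compact IRLS loop for the logistic model, as a sketch of what glm(..., family = binomial) iterates internally (standard textbook form, not taken from the slides):

```python
import numpy as np

def irls_logistic(X, y01, n_iter=25):
    """Iteratively reweighted least squares for logistic regression.
    `y01` takes values in {0, 1}; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))               # fitted probabilities
        W = p * (1.0 - p)                                 # IRLS weights
        z = X @ beta + (y01 - p) / np.maximum(W, 1e-10)   # working response
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))   # weighted least squares
    return beta
```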
108
Fisher consistency
Theorem 3. If the distribution of $(x, y)$ follows the assumed model,
then the estimator is asymptotically consistent for the target parameter.
109
Asymptotic efficiency
Asymptotic variance:
$$\operatorname{var}_{\mathrm{A}}(\hat{\beta}) = \frac{1}{n} J(\beta)^{-1} V(\beta) J(\beta)^{-1}$$
A Cramér-Rao type inequality gives
$$\operatorname{var}_{\mathrm{A}}(\hat{\beta}) \ge \frac{1}{n} \left[ \mathrm{E}\{p(x)(1 - p(x))\, f(x) f(x)^{\mathsf{T}}\} \right]^{-1}$$
Equality holds if and only if the loss is the log loss, or equivalently for logistic regression.
110
Expected loss under parametrics
$$\operatorname{Risk}(\hat{\beta}_U, U) = \mathrm{E}\{L_U(\hat{\beta}_U)\}$$
Under the parametric assumption, the expected loss expands as
$$\operatorname{Risk}(\hat{\beta}_U, U) = \operatorname{Risk}(\hat{\beta}_{\log}, U) + \frac{1}{n}\, \Delta(U, U^*) + o\!\left(\frac{1}{n}\right)$$
111
Expected loss under misspecified model
Under a near-parametric setting
$$F(x) = \beta^{\mathsf{T}} x + n^{-1/2} f(x) + O(n^{-1}),$$
with $\beta_U^{*} = \operatorname{argmin}_{\beta'} L_U(\beta')$, then
$$\operatorname{Risk}(\hat{\beta}_U, U) = L_U(\beta_U^{*}) + \frac{1}{2n} \operatorname{tr}\{\operatorname{var}_{\mathrm{A}}(\hat{\beta}_U)\, \operatorname{Hesse}(L_U)\} + o\!\left(\frac{1}{n}\right)$$
112
Sampling scheme
Mixture sampling
Conditional sampling (cohort study, prospective study)
Separate sampling (case-control study, retrospective study)
113
Adams, N.M. and Hand, D.J. (1999). Comparing classifiers when the misclassification costs are uncertain. Pattern Recognition 32, 1139-1147.
Adams, N.M. and Hand, D.J. (2000). Improving the practice of classifier performance assessment. Neural Computation 12, 305-311.
Begg, C. B., Satogopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
Bishop, C. (1995). Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
114
Domingo, C. and Watanabe, O. (2000). MadaBoost: A modification of AdaBoost. In Proc. of the 13th Conference on Computational Learning Theory.
Efron, B. (1975). The efficiency of logistic regression compared to normal discriminant analysis. J. Amer. Statist. Assoc. 70, 892-898.
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Ann. Statist. 28, 337-407.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
Nishii, R. and Eguchi, S. (2004). Supervised image classification by contextual AdaBoost based on posteriors in neighborhoods. To be submitted.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
115
Lebanon, G. and Lafferty, J. (2001). Boosting and maximum likelihood for exponential models. NIPS 14.
McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
Pepe, M.S. and Thompson, M.L. (2000). Combining diagnostic test results to increase accuracy. Biostatistics 1, 123-140.
Rätsch, G., Onoda, T. and Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning 42(3), 287-320.
Schapire, R. (1990). The strength of weak learnability. Machine Learning 5, 197-227.
Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer, New York.
116
Eguchi, S. and Copas, J. (2001). Recent developments in discriminant analysis from an information geometric point of view. J. Korean Statist. Soc. 30, 247-264. (Special issue for the 30th anniversary of the Korean Statistical Society.)
Eguchi, S. and Copas, J. (2002). A class of logistic-type discriminant functions. Biometrika 89, 1-22.
Eguchi, S. (2002). U-boosting method for classification and information geometry. Invited talk at the International Statistical Workshop, Statistical Research Center for Complex Systems, Seoul National University.
Kanamori, T., Takenouchi, T., Eguchi, S. and Murata, N. (2004). The most robust loss function for boosting. Lecture Notes in Computer Science 3316, 496-501. Springer.
Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. (2004). Information geometry of U-Boost and Bregman divergence. Neural Computation 16, 1437-1481.
117
Takenouchi, T. and Eguchi, S. (2004). Robustifying AdaBoost by adding the naive error rate. Neural Computation 16, 767-787.
Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. Geometry of U-Boost algorithms. Hayashibara Forum "Chance and Necessity: Mathematics, Information and Economics", session "Physics of Information".
Eguchi, S. (2004). Information geometry and statistical pattern recognition. Sugaku expository article (in Japanese).
Eguchi, S. (2004). Information geometry of statistical pattern identification: the U-Boost learning algorithm. Suri Kagaku (Mathematical Sciences) No. 489, 53-59 (in Japanese).
Eguchi, S. (2002). On methods of statistical discrimination: from logistic discrimination to AdaBoost. Special lecture, 24th Symposium of the Japanese Society of Applied Statistics: New Developments in Multivariate Analysis (in Japanese).
118
Future paradigm
The concept of learning will continue to offer highly productive ideas for computational algorithms.
Is it an imitation of the biological brain?
Meta learning?
Human life?
And Statistics?