1
November 24 (Wed) - 26 (Fri), 2004
2004 Open Lecture at ISM: Recent Topics in Machine Learning: Boosting
Open Lecture, Essentials of Statistical Mathematics: "Recent Topics in Machine Learning"
Boost Learning
Shinto Eguchi
(The Institute of Statistical Mathematics; Department of Statistical Science, The Graduate University for Advanced Studies)
2
Course outline
Boost learning:
We survey AdaBoost, a method of statistical pattern recognition, and discuss its strengths and weaknesses. Applications to gene expression and remote-sensing data are introduced.
3
Boost Learning (I)
10:00-12:30 November 25 Thu
Boost learning algorithms
AdaBoost
EtaBoost: robust learning
GroupBoost: group learning
AsymAdaBoost: asymmetric learning
4
Boost Learning (II)
13:30-16:00 November 26 Fri
Statistical discussion
Optimal classifier by AdaBoost
Probabilistic framework: Bayes rule, Fisher's LDF, logistic regression
BridgeBoost: meta learning
LocalBoost: local learning
5
Acknowledgements
Much of the material presented in this course includes results of joint work with the following collaborators, to whom I am grateful:
Noboru Murata (Science and Engineering, Waseda University)
Ryuei Nishii (Mathematics, Kyushu University)
Takafumi Kanamori (Mathematical and Computing Sciences, Tokyo Institute of Technology)
Takashi Takenouchi (The Institute of Statistical Mathematics)
Masanori Kawakita (Statistical Science, The Graduate University for Advanced Studies)
John B. Copas (Dept Stats, Univ of Warwick)
6
The strength of weak learnability
Strong learnability: given access to a source of examples of the unknown concept, the learner can, with high probability, output a hypothesis that is correct on all but an arbitrarily small fraction of the instances.
Weak learnability: the concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.
Schapire, R. (1990)
7
Web-page on Boost
Boosting Research Site: http://www.boosting.org/
Robert Schapire's home page: http://www.cs.princeton.edu/~schapire/
Yoav Freund's home page: http://www1.cs.columbia.edu/~freund/
John Lafferty's home page: http://www-2.cs.cmu.edu/~lafferty/
8
Statistical pattern recognition
Recognition of: character, image, speaker, signal, face, language, ...
Prediction of: weather, earthquake, disaster, finance, interest rates, company bankruptcy, credit, default, infection, disease, adverse effect
Classification of: species, parentage, genomic type, gene expression, protein expression, system failure, machine trouble
9
Multi-class classification
Feature vector $x \in \mathbb{R}^p$; class label $y \in \{1, \dots, g\}$
Discriminant function $F_k(x)$ $(k = 1, \dots, g)$
Classification rule: $\hat{y} = \operatorname{argmax}_k F_k(x)$
10
Binary classification
Feature vector $x \in \mathbb{R}^p$; class label $y \in \{-1, +1\}$
Classification rule: $h(x) = \operatorname{sign}(F(x))$
Learn from a training dataset $\{(x_i, y_i) : i = 1, \dots, n\}$
Make a classification by the predicted label $h(x)$
0-normalization: $F(x) = 0$ gives the decision boundary
11
Statistical learning theory
Boost learning
  Boost by filter (Schapire, 1990)
  Bagging, arcing (bootstrap) (Breiman; Friedman; Hastie)
  AdaBoost (Schapire, Freund, Bartlett, Lee)
Support vector machines
  Maximize margin, kernel space
  (Vapnik, Schölkopf)
12
Class of weak machines
Stump class
Linear class
ANN class, SVM class, kNN class
Point: weak machines with colorful (distinctive) character rather than universal character
13
AdaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $t = 1, \dots, T$:
   (a) $f_{(t)} = \operatorname{argmin}_{f} \epsilon_t(f)$, where $\epsilon_t(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_t(i) \big/ \sum_{i'=1}^n w_t(i')$
   (b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
   (c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
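As an illustration, here is a minimal Python sketch of the algorithm above, using single-coordinate decision stumps as the class of weak machines (the stump class of slide 12); all function names are ours, not from the lecture.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the coordinate j, threshold b, and sign s minimizing the weighted error."""
    n, p = X.shape
    best = (np.inf, 0, 0.0, 1)                  # (error, j, b, s)
    for j in range(p):
        for b in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = s * np.sign(X[:, j] - b + 1e-12)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[0]:
                    best = (err, j, b, s)
    return best

def adaboost(X, y, T):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # step 1: w_1(i) = 1/n
    machines = []
    for t in range(T):                          # step 2
        err, j, b, s = fit_stump(X, y, w)       # (a) best weak machine
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # (b) coefficient
        pred = s * np.sign(X[:, j] - b + 1e-12)
        w = w * np.exp(-alpha * y * pred)       # (c) reweighting
        machines.append((alpha, j, b, s))
    return machines

def predict(machines, X):
    F = sum(a * s * np.sign(X[:, j] - b + 1e-12) for a, j, b, s in machines)
    return np.sign(F)                           # step 3: sign(F_T(x))
```

The exhaustive threshold scan in fit_stump is deliberately naive; it keeps the sketch faithful to steps (a)-(c) rather than efficient.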
14
Learning algorithm
[Diagram: the data enter with weights $w_1(1), \dots, w_1(n)$; machine $f^{(1)}(x)$ is trained and the weights are updated to $w_2(1), \dots, w_2(n)$; machine $f^{(2)}(x)$ follows, and so on through $f^{(T)}(x)$ with weights $w_T(1), \dots, w_T(n)$.]
Final machine: $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
15
Simulation (complete separation)
[Figure: feature space $[-1,1] \times [-1,1]$ with the decision boundary; the two classes are completely separated.]
16
[Figure: randomly generated linear classifiers on $[-1,1] \times [-1,1]$.]
Set of weak machines
Linear classification machines
Random generation
17
Learning process (I)
[Figures: six snapshots of the decision regions during learning.]
Iter = 1, train err = 0.21; Iter = 13, train err = 0.18; Iter = 17, train err = 0.10
Iter = 23, train err = 0.10; Iter = 31, train err = 0.095; Iter = 47, train err = 0.08
18
Learning process (II)
[Figures: three further snapshots of the decision regions.]
Iter = 55, train err = 0.061; Iter = 99, train err = 0.032; Iter = 155, train err = 0.016
19
Final stage
[Figures: contour of $F(x)$ and $\operatorname{sign}(F(x))$ at the final stage.]
20
Learning curve
[Figure: training error against iteration number, Iter = 1, ..., 277; the error decreases from about 0.2 towards 0.05.]
21
Characteristics
$\epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$ (least favorable): under the updated weights, the machine just selected performs no better than random guessing.
Update: $w_{t+1}(i) = w_t(i) \times e^{-\alpha_t}$ (multiplying factor) if $f_{(t)}(x_i) = y_i$, and $w_{t+1}(i) = w_t(i) \times e^{\alpha_t}$ if $f_{(t)}(x_i) \neq y_i$.
Weighted error rates: $\epsilon_{t+1}(f_{(t+1)}) \le \epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$.
22
Proof that $\epsilon_{t+1}(f_{(t)}) = \frac{1}{2}$. By definition,
$$\epsilon_{t+1}(f_{(t)}) = \frac{\sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_{t+1}(i)}{\sum_{i'=1}^n w_{t+1}(i')}, \qquad w_{t+1}(i) = w_t(i)\exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}.$$
Since $y_i f_{(t)}(x_i) = -1$ exactly when $I(y_i \neq f_{(t)}(x_i)) = 1$,
$$\sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_{t+1}(i) = e^{\alpha_t} \sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_t(i),$$
$$\sum_{i=1}^n w_{t+1}(i) = e^{\alpha_t} \sum_{i=1}^n I(y_i \neq f_{(t)}(x_i))\, w_t(i) + e^{-\alpha_t} \sum_{i=1}^n I(y_i = f_{(t)}(x_i))\, w_t(i).$$
Dividing both by $\sum_i w_t(i)$ and writing $\epsilon_t = \epsilon_t(f_{(t)})$,
$$\epsilon_{t+1}(f_{(t)}) = \frac{e^{\alpha_t}\epsilon_t}{e^{\alpha_t}\epsilon_t + e^{-\alpha_t}(1 - \epsilon_t)} = \frac{\sqrt{\epsilon_t(1-\epsilon_t)}}{\sqrt{\epsilon_t(1-\epsilon_t)} + \sqrt{\epsilon_t(1-\epsilon_t)}} = \frac{1}{2},$$
using $e^{\alpha_t} = \sqrt{(1-\epsilon_t)/\epsilon_t}$.
23
Exponential loss
$$L_{\exp}(F) = \frac{1}{n} \sum_{i=1}^n \exp\{-y_i F(x_i)\}$$
Update by $F(x) \mapsto F(x) + \alpha f(x)$:
$$L_{\exp}(F + \alpha f) = \frac{1}{n} \sum_{i=1}^n \exp\{-y_i F(x_i)\} \exp\{-\alpha\, y_i f(x_i)\} = L_{\exp}(F)\, \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\},$$
where $\epsilon(f) = \dfrac{\sum_{i=1}^n I(f(x_i) \neq y_i) \exp\{-y_i F(x_i)\}}{n\, L_{\exp}(F)}$.
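The empirical exponential loss transcribes directly into code; a minimal sketch, where `F` is any array of scores $F(x_i)$ and `y` the labels in $\{-1,+1\}$:

```python
import numpy as np

def exp_loss(F, y):
    """Empirical exponential loss L_exp(F) = (1/n) * sum_i exp(-y_i F(x_i))."""
    return np.mean(np.exp(-y * F))
```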
24
Sequential minimization
$$L_{\exp}(F + \alpha f) = L_{\exp}(F)\, \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\}$$
$$\alpha_{\mathrm{opt}} = \frac{1}{2} \log \frac{1 - \epsilon(f)}{\epsilon(f)}, \qquad \text{where } \epsilon(f) = \frac{\sum_{i=1}^n I(f(x_i) \neq y_i) \exp\{-y_i F(x_i)\}}{n\, L_{\exp}(F)}.$$
$$\min_{\alpha} \{\epsilon(f)\, e^{\alpha} + (1 - \epsilon(f))\, e^{-\alpha}\} = 2\sqrt{\epsilon(f)\{1 - \epsilon(f)\}}$$
Equality holds iff $\epsilon(f)\, e^{\alpha} = (1 - \epsilon(f))\, e^{-\alpha}$.
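Filling in the one-line calculus behind the optimal coefficient (a standard step, made explicit here):
$$\frac{\partial}{\partial \alpha}\left\{\epsilon(f)\, e^{\alpha} + (1-\epsilon(f))\, e^{-\alpha}\right\} = \epsilon(f)\, e^{\alpha} - (1-\epsilon(f))\, e^{-\alpha} = 0 \iff e^{2\alpha} = \frac{1-\epsilon(f)}{\epsilon(f)},$$
which gives $\alpha_{\mathrm{opt}} = \frac{1}{2}\log\frac{1-\epsilon(f)}{\epsilon(f)}$; substituting back yields the minimum value $2\sqrt{\epsilon(f)\{1-\epsilon(f)\}}$.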
25
AdaBoost = minimum exp-loss
(a) $f_{(t)} = \operatorname{argmin}_{f}\, \epsilon_t(f)$
(b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
(c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
$$\min_{\alpha \in \mathbb{R}} L_{\exp}(F_{t-1} + \alpha f_{(t)}) = L_{\exp}(F_{t-1})\, 2\sqrt{\epsilon_t(f_{(t)})\{1 - \epsilon_t(f_{(t)})\}}$$
$$\alpha_{\mathrm{opt}} = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$$
26
Simulation (complete random)
[Figure: data on $[-1,1] \times [-1,1]$ with completely random labels.]
27
Over-learning of AdaBoost
[Figures: decision regions at three stages of learning on the random-label data.]
Iter = 51, train err = 0.21; Iter = 151, train err = 0.06; Iter = 301, train err = 0.0
28
Drawbacks of AdaBoost
1. Unbalanced learning
2. Over-learning even for noisy datasets
Remedies:
EtaBoost: robustify against mislabelled examples
GroupBoost: relax the p >> n problem
BridgeBoost: combine different datasets
AsymAdaBoost: balance the false negatives/positives
LocalBoost: extract spatial information
29
AsymBoost
The small modification of AdaBoost: replace step (b) by
(b)' $\alpha_t^{*} = \operatorname{argmin}_{\alpha \in \mathbb{R}}\, L_{\mathrm{asym}}(F_{t-1} + \alpha f_t^{*})$
The selection of $k$: the default choice is
$$k = \frac{\sum_{i=1}^n I(y_i = -1)}{\sum_{i=1}^n I(y_i = +1)}$$
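In code, the default choice of $k$ is just the empirical class ratio; a sketch (the direction of the ratio follows the reconstruction above and is an assumption):

```python
import numpy as np

def default_k(y):
    """Default asymmetry constant: ratio of negative to positive examples
    (assumed direction of the ratio)."""
    return np.sum(y == -1) / np.sum(y == +1)
```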
30
Weighted errors by k
31-33
[Figures: weighted error rates as k varies.]
34
Result of AsymBoost
35
Eta-loss function
A regularized version of the exponential loss.
36
EtaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $m = 1, \dots, T$:
   (a) $f_{(m)} = \operatorname{argmin}_{f}\, \epsilon_m(f)$, where $\epsilon_m(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_m(i)$
   (b) choose $\alpha_m$ by minimizing the eta-loss
   (c) $w_{m+1}(i) = w_m(i) \exp\{-\alpha_m\, y_i f^{*}_{(m)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
37
A toy example
38
AdaBoost vs EtaBoost
39
Simulation (complete random)
[Figure: data on $[-1,1] \times [-1,1]$ with completely random labels.]
Over-learning of AdaBoost:
Iter = 51, train err = 0.21; Iter = 301, train err = 0.0
40
EtaBoost
[Figures: decision regions of EtaBoost at three stages on the same data.]
Iter = 51, train err = 0.25; Iter = 51, train err = 0.15; Iter = 351, train err = 0.18
41
Mis-labeled examples
Mis-labeled
42
Comparison
AdaBoost EtaBoost
43
GroupBoost
Relax the over-learning of AdaBoost by group learning.
Idea: in AdaBoost step 2(a), only the single best machine is selected; other good machines are cast off.
Is there a wise way of grouping the G best machines?
44
Grouping machines
$\mathcal{F} = \{\operatorname{sgn}(a x_j + b) : b \in \mathbb{R},\ a \in \{-1, +1\}\}$
Let $f_j(\cdot\,; b_j) = \operatorname{argmin}_{f \in \mathcal{F}}\, \epsilon_t(f)$ over the thresholds of the $j$-th component.
Select a set $f_{(1)}(\cdot\,; b_{(1)}), \dots, f_{(G)}(\cdot\,; b_{(G)})$
such that $\epsilon_t(f_{(1)}(\cdot\,; b_{(1)})) \le \dots \le \epsilon_t(f_{(G)}(\cdot\,; b_{(G)}))$.
The grouped machine is $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
45
GroupBoost
(a) $\epsilon_t(f) = \sum_{i=1}^N I(y_i \neq f(x_i))\, w_t(i)$; select the $G$ best machines $f_{(1)}(\cdot\,; b_{(1)}), \dots, f_{(G)}(\cdot\,; b_{(G)})$.
(b) $\alpha_{t,(g)} = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(g)}(\cdot\,; b_{(g)}))}{\epsilon_t(f_{(g)}(\cdot\,; b_{(g)}))}$ $(g = 1, \dots, G)$; set $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
(c) $w_{t+1}(i) = \frac{w_t(i) \exp\{-y_i \bar{f}_t(x_i)\}}{Z_{t+1}}$
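A minimal sketch of one GroupBoost round under the reconstruction above: take the G best machines instead of the single best, average their weighted votes, and reweight once (names and bookkeeping are ours).

```python
import numpy as np

def group_round(X, y, w, G, candidates):
    """One GroupBoost round. `candidates` is a list of weak machines f(X) -> {-1,+1}."""
    errs = []
    for f in candidates:
        pred = f(X)
        errs.append((np.sum(w * (pred != y)) / np.sum(w), f))
    errs.sort(key=lambda e: e[0])            # epsilon_(1) <= ... <= epsilon_(G)
    group = errs[:G]                         # the G best machines
    fbar = np.zeros(len(y))
    for eps, f in group:
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        fbar += alpha * f(X) / G             # combined machine (1/G) sum_g alpha_g f_g
    w_new = w * np.exp(-y * fbar)
    return fbar, w_new / np.sum(w_new)       # normalized weights (Z_{t+1})
```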
46
[Diagram: from the current $F_0$, the $G$ best machines $f_1, \dots, f_4$ with coefficients $\alpha_1 f_1, \dots, \alpha_4 f_4$ are grouped; grouping jumps to the next stage.]
47
Learning architecture
[Diagram: at each stage $t$ the data enter with weights $w_t(1), \dots, w_t(n)$; the $G$ machines $f^{(t,1)}(x), \dots, f^{(t,G)}(x)$ are grouped into $\bar{f}_t(x)$; the final machine is $\operatorname{sign}(F_T(x))$ with $F_T(x) = \sum_{t=1}^T \bar{f}_t(x)$.]
48
AdaBoost and GroupBoost
AdaBoost: $L_{\exp}(F_{t-1} + \alpha_t f_t) \le L_{\exp}(F_{t-1})$
GroupAdaBoost: $L_{\exp}(F_{t-1} + \bar{f}_t) \le L_{\exp}(F_{t-1} + \alpha_t f_t) \le L_{\exp}(F_{t-1})$
Update the weights with $\exp(-\alpha_t)$ in AdaBoost, and with $\exp\!\big(-\frac{1}{G}\sum_{g=1}^G \alpha_{t,(g)}\big)$ in GroupAdaBoost.
49
From microarray
Contest program from bioinformatics (BIP2003)
http://contest.genome.ad.jp/
Microarray data:
Number of genes: p = 1,000 to 100,000
Number of individuals: n = 10 to 100
50
Output
http://genome-www.stanford.edu/cellcycle/
51
Goal
Disease and gene expressions
$y$ is a label for clinical information
$x = (x_1, \dots, x_p)$ = feature vector, with $x_j = \log(\text{gene expression})$; the gene expression vector for the $i$-th individual is observed.
(BIP2003) Problem 2: $n = 27 + 11 = 38 \ll p = 7109$
52
Microarray data
cDNA microarray
53
Prediction from gene expressions
Feature vector $x = (x_1, \dots, x_p)$: dimension = number of genes $p$; components = quantities of gene expression.
Class label $y \in \{-1, +1\}$: disease, adverse effect.
Classification machine $f: x \mapsto y$,
based on the training dataset $\{(x_i, y_i) : 1 \le i \le n\}$.
54
Leukemic diseases, Golub et al. http://www.broad.mit.edu/cgi-bin/cancer/publications/
55
AdaBoost
1. Initial settings: $w_1(i) = 1/n \ (i = 1, \dots, n)$, $F_0(x) = 0$.
2. For $t = 1, \dots, T$:
   (a) $f_{(t)} = \operatorname{argmin}_{f}\, \epsilon_t(f)$, where $\epsilon_t(f) = \sum_{i=1}^n I(y_i \neq f(x_i))\, w_t(i)$
   (b) $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t(f_{(t)})}{\epsilon_t(f_{(t)})}$
   (c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t\, y_i f_{(t)}(x_i)\}$
3. Output $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
56
One-gene classifier
Let $x_{1j}, \dots, x_{nj}$ be the expressions of the $j$-th gene.
One-gene classifier: $f_j(x) = +1$ if $x_j > b_j$, and $f_j(x) = -1$ if $x_j \le b_j$,
with $b_j = \operatorname{argmin}_{b} \sum_i I(y_i \neq \operatorname{sgn}(x_{ij} - b))$.
[Figure: expressions of one gene with candidate cut-offs; error numbers 5 5 5 6 4 6 6 5 5 6 5 along the axis.]
jx
57
The second training
Errror number 45.5 7 9
87.56
798.5
Update the weight:
jx
4.5
jb
Weight up to 2
})(sgn()({minarg i
ijij bxyIiwbb
2log4
16log5.0
ans. false of nb.
ans.correct of nb.log5.01
Weight down to 0.5
jb
58
Web microarray data
Dataset   p     n   y = +1   y = -1
ALLAML    7129  72  37       35
Colon     2000  62  40       22
Estrogen  7129  49  25       24
p >> n
http://microarray.princeton.edu/oncology/
http://mgm.duke.edu/genome/dna micro/work/
59
10-fold validation
$D = \{(x_i, y_i)\}_{i=1}^N$ is split into 10 blocks: 1, 2, 3, 4, ..., 9, 10.
Round $k$: block $k$ is held out for validation and the remaining blocks are used for training, giving error $\epsilon^{(k)}$.
Averaging: $\frac{1}{10} \sum_{k=1}^{10} \epsilon^{(k)}$.
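A plain 10-fold cross-validation loop as sketched above (our own minimal version; `train` and `error` stand for any learner and error measure):

```python
import numpy as np

def ten_fold_cv(X, y, train, error, K=10):
    """Split D into K blocks; use block k for validation, the rest for training;
    report the average validation error (1/K) * sum_k eps^(k)."""
    idx = np.arange(len(y))
    rng = np.random.default_rng(0)
    rng.shuffle(idx)
    blocks = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = blocks[k]
        tr = np.concatenate([blocks[j] for j in range(K) if j != k])
        model = train(X[tr], y[tr])
        errs.append(error(model, X[val], y[val]))
    return np.mean(errs)
```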
60
ALLAML
Training error 10-fold CV error
61
Colon
Training error 10-fold CV error
62
Estrogen
Training error 10-fold CV error
63
Gene score
$F_T(x) = \sum_{t=1}^T \bar{f}_t(x)$, where $\bar{f}_t(x) = \frac{1}{G} \sum_{g=1}^G \alpha_{t,(g)}\, f_{(g)}(x; b_{(g)})$.
Confidence factor for the $j$-th gene:
$$S(j) = \frac{1}{T} \sum_{t=1}^T \sum_{g=1}^G I(j(g, t) = j)$$
$S(j)$ suggests the degree of association with the class label.
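A sketch computing the score S(j) as the selection frequency of gene j across all chosen machines, assuming each selected machine records the index of the gene it uses (our bookkeeping, not the lecture's):

```python
import numpy as np

def gene_scores(selected_genes, p, T):
    """S(j) = (1/T) * sum over T rounds and G machines of I(gene(g,t) == j).
    `selected_genes` is a T x G array of gene indices."""
    S = np.zeros(p)
    for row in np.asarray(selected_genes):
        for j in row:
            S[j] += 1.0
    return S / T
```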
64
Genes associated with disease
Compare our result with the set of genes suggested by Golub et al. and West et al.
AdaBoost detects only 15 genes in the case of ALLAML.
GroupAdaBoost detects 30 genes compatible with their results.
65
Boost Learning (II)
13:30-16:00 November 26 Fri
Statistical discussion
Optimal classifier by AdaBoost
Probabilistic framework: Bayes rule, Fisher's LDF, logistic regression
BridgeBoost: the p >> n problem
LocalBoost: contextual information
66
Problem: p >> n
A fundamental issue in bioinformatics:
p is the dimension of the biomarker (SNPs, proteome, microarray, ...)
n is the number of individuals (informed consent, institutional protocol, ... bioethics)
67
An approach by combining
Rapid expansion of genomic data.
Let $I_1, \dots, I_K$ be $K$ experimental facilities, and let $B$ be a biomarker space.
$D_k = \{z_i^{(k)} \in B : i = 1, \dots, n_k\}$, with $n_k < p$ $(k = 1, \dots, K)$.
The combined size $\sum_{k=1}^K n_k$ is larger.
68
BridgeBoost
[Diagram: datasets $D_1, D_2, \dots, D_K$ (e.g. from CAMDA, the Critical Assessment of Microarray Data Analysis, or DDBJ, the DNA Data Bank of Japan, NIG); separate learning gives $f(D_1), \dots, f(D_K)$, while bridged learning gives $f(D_1 \mid D), \dots, f(D_K \mid D)$ and a combined result.]
69
CAMDA 2003
4 datasets for lung cancer:
Harvard    PNAS, 2001        Affymetrix
Michigan   Nature Med, 2002  Affymetrix
Stanford   PNAS, 2001        cDNA
Ontario    Cancer Res, 2001  cDNA
http://www.camda.duke.edu/camda03/datasets/
70
Some problems
1. Heterogeneity in feature space: cDNA vs. Affymetrix; differences in covariates and medical diagnosis; uncertainty in microarray experiments
2. Heterogeneous class-labeling
3. Heterogeneous generalization powers
4. Publication bias: a vast number of unpublished studies
71
Machine learning
Learnability: boosting weak learners?
AdaBoost: Freund & Schapire (1997)
Weak classifiers $\{f_1(x), \dots, f_p(x)\}$
A strong classifier $F(x) = \alpha_1 f^{(1)}(x) + \dots + \alpha_t f^{(t)}(x)$, built stagewise.
72
Learning algorithm
[Diagram: the data $D$ enter with weights $w_1(1), \dots, w_1(n)$; machines $f^{(1)}(x), f^{(2)}(x), \dots, f^{(T)}(x)$ are trained in turn with updated weights $w_t(1), \dots, w_t(n)$.]
Final machine: $\operatorname{sign}(F_T(x))$, where $F_T(x) = \sum_{t=1}^T \alpha_t f_{(t)}(x)$.
73
Different datasets
$D = \bigcup_{k=1}^K D_k$, where $D_k = \{(x_i^{(k)}, y_i^{(k)}) : 1 \le i \le n_k\}$.
$x_i^{(k)} \in \mathbb{R}^p$: expression vector of the same genes; $y_i^{(k)}$: label of the same clinical item.
Normalization to $[0, 1]^p$:
$$x_{ji}^{(k)} \leftarrow \frac{x_{ji}^{(k)} - \min_{i'} x_{ji'}^{(k)}}{\max_{i'} x_{ji'}^{(k)} - \min_{i'} x_{ji'}^{(k)}} \qquad (j = 1, \dots, p)$$
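The per-gene min-max normalization above, as a sketch; applying it within each dataset $D_k$ separately is our reading of the slide.

```python
import numpy as np

def minmax_normalize(Xk):
    """Map each column (gene) of one dataset to [0, 1]:
    x <- (x - min) / (max - min), computed within the dataset."""
    lo = Xk.min(axis=0)
    hi = Xk.max(axis=0)
    return (Xk - lo) / np.where(hi > lo, hi - lo, 1.0)
```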
74
Weighted Errors
The k-th weighted error:
$$\epsilon_t^{(k)}(f) = \sum_{i=1}^{n_k} I(y_i^{(k)} \neq f(x_i^{(k)}))\, \frac{w_t^{(k)}(i)}{\sum_{i'=1}^{n_k} w_t^{(k)}(i')}$$
The combined weighted error:
$$\epsilon_t(f) = \sum_{k=1}^K \pi_t^{(k)}\, \epsilon_t^{(k)}(f), \qquad \text{where } \pi_t^{(k)} = \frac{\sum_{i=1}^{n_k} w_t^{(k)}(i)}{\sum_{h=1}^K \sum_{i=1}^{n_h} w_t^{(h)}(i)}.$$
75
BridgeBoost
With per-dataset weights $\{w_t^{(k)}(i) : 1 \le i \le n_k\}_{k=1}^K$:
(a) $f_t^{(k)} = \operatorname{argmin}_{f}\, \epsilon_t^{(k)}(f)$
(b) $\alpha_t^{(k)} = \frac{1}{2} \log \frac{1 - \epsilon_t^{(k)}(f_t^{(k)})}{\epsilon_t^{(k)}(f_t^{(k)})}$
(c) $\bar{f}_t(x) = \frac{1}{K} \sum_{k=1}^K \alpha_t^{(k)} f_t^{(k)}(x)$
(d) $w_{t+1}^{(k)}(i) = w_t^{(k)}(i) \exp\{-y_i^{(k)} \bar{f}_t(x_i^{(k)})\}$
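A hedged sketch of one BridgeBoost round under the reconstruction above: fit one machine per dataset, then share the averaged machine across all K weight updates (interfaces are ours).

```python
import numpy as np

def bridge_round(datasets, weights, fit):
    """One BridgeBoost round. `datasets` = [(X_1, y_1), ..., (X_K, y_K)];
    `weights` = per-dataset weight vectors; `fit(X, y, w)` returns (f, eps)."""
    K = len(datasets)
    machines = []
    for (X, y), w in zip(datasets, weights):
        f, eps = fit(X, y, w)                     # (a) per-dataset weak machine
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)     # (b) its coefficient
        machines.append((alpha, f))

    def fbar(X):                                  # (c) combined machine
        return sum(a * f(X) for a, f in machines) / K

    new_weights = []                              # (d) shared reweighting
    for (X, y), w in zip(datasets, weights):
        w2 = w * np.exp(-y * fbar(X))
        new_weights.append(w2 / np.sum(w2))
    return fbar, new_weights
```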
76
Learning flow
[Diagram, stage $t$: each dataset $D_k$ with weights $\{w_t^{(k)}(i)\}$ yields a machine $f_t^{(k)}(x)$ and coefficient $\alpha_t^{(k)}$; these are combined into $\bar{f}_t(x) = \frac{1}{K}(\alpha_t^{(1)} f_t^{(1)}(x) + \dots + \alpha_t^{(K)} f_t^{(K)}(x))$, and the updated weights $\{w_{t+1}^{(k)}(i)\}$ enter stage $t+1$.]
77
Mean exponential loss
Exponential loss on the k-th dataset:
$$L^{(k)}(F) = \frac{1}{n_k} \sum_{i=1}^{n_k} \exp\{-y_i^{(k)} F(x_i^{(k)})\}$$
$f_t^{(k)} = \operatorname{argmin}_{f}\, L^{(k)}(F_{t-1} + \alpha f)$, with $\alpha_t^{(k)} = \frac{1}{2} \log \frac{1 - \epsilon_t^{(k)}}{\epsilon_t^{(k)}}$.
Mean exponential loss:
$$\bar{L}(F) = \frac{1}{K} \sum_{k=1}^K L^{(k)}(F)$$
$$\bar{L}(F_{t-1} + \bar{f}_t) \le \frac{1}{K} \sum_{k=1}^K \bar{L}(F_{t-1} + \alpha_t^{(k)} f_t^{(k)})$$
Note: convexity of the exponential loss.
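The convexity note can be made explicit: since $e^{-u}$ is convex, Jensen's inequality applied to the average of the $K$ candidate updates gives
$$\bar{L}\Bigl(F_{t-1} + \frac{1}{K}\sum_{k=1}^{K}\alpha_t^{(k)} f_t^{(k)}\Bigr) \;\le\; \frac{1}{K}\sum_{k=1}^{K}\bar{L}\bigl(F_{t-1} + \alpha_t^{(k)} f_t^{(k)}\bigr),$$
so the bridged update can do no worse, in mean exponential loss, than the average of the separate updates.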
78
Meta-learning
$L^{(k)}(f_t^{(h)} + F_{t-1})$ is cross-validatory for $h \neq k$:
$f_t^{(h)}$ is learned on $D_h$, while $L^{(k)}$ is evaluated on $D_k$.
Separate learning uses only $L^{(k)}(f_t^{(k)} + F_{t-1})$; meta-learning uses $\frac{1}{K} \sum_{h=1}^K L^{(k)}(f_t^{(h)} + F_{t-1})$.
79
Simulation
3 datasets $\{D_1, D_2, D_3\}$, with $p = 100$ and $n_1 = n_2 = n_3 = 50$.
data 1, data 2 ($D_1$, $D_2$): ideal test error 0.
data 3 ($D_3$): ideal test error 0.5.
[Figures: training and test errors on the collapsed dataset.]
80
Comparison
[Figures: training and test errors of Separate AdaBoost and BridgeBoost.]
81
Test errors
[Figure: minimum test errors of Collapsed AdaBoost (15%), Separate AdaBoost (4%, 43%, 3%), and BridgeBoost (4%).]
82
Conclusion
[Diagram: separate learning maps each $D_k$ to $f(D_k)$; meta-learning maps each $D_k$ to $f(D_k \mid D_1, \dots, D_K)$ and combines them into the result.]
83
Unsolved problems in BridgeBoost
1. Which datasets should be joined or deleted in BridgeBoost?
2. Prediction of the class label for a given new x?
3. How to use the information on unmatched genes when combining datasets?
4. Heterogeneity is OK, but publication bias?
84
Markov random field
Markov random field (MRF)
ICM algorithm (Besag, 1986)
85
Neighbor sites
86
Space AdaBoost
1. Estimate the posterior $p(k \mid x)$ using only non-contextual information.
2. Extract the contextual information based on the estimated posterior $p(k \mid x)$.
3. Make a hierarchical set of weak machines $\mathcal{F} = \{\hat{p}(k \mid x_r) : r \in \text{neighborhood}\}$, and start the algorithm from the non-contextual result.
87
Analysis comparison
[Figures: the true image and true labels, and classified images for increasing neighborhood sizes r.]
88
LocalBoost
Let $S$ be the set of possible sites, with a training dataset $\{(s_i, x_i, y_i) : i = 1, \dots, n\}$ over $S$.
Local exponential loss:
$$L_{\mathrm{local}}(F, s) = \frac{1}{n} \sum_{i=1}^n K_h(s, s_i) \exp\{-y_i F(x_i, s_i)\}$$
89
Algorithm
(a) $f_t = \operatorname{argmin}_{f}\, \epsilon_t(f, s_t)$, where $s_t$ is uniformly sampled from $\{s(i) : i = 1, \dots, n\}$ and
$$\epsilon_t(f, s) = \sum_{i=1}^n K_h(s, s_i)\, I(y_i \neq f(x_i))\, w_t(i)$$
(b) $\alpha_t(s) = \frac{1}{2} \log \frac{1 - \epsilon_t(f_t, s)}{\epsilon_t(f_t, s)}$
(c) $w_{t+1}(i) = w_t(i) \exp\{-\alpha_t(s_t)\, y_i f_t(x_i)\}$
Note: the selected machine $f_t$ has local information around $s_t$.
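A sketch of the locally weighted error in step (a), with a Gaussian kernel over sites as one concrete choice (the kernel form is our assumption):

```python
import numpy as np

def local_error(f, X, y, sites, w, s, h=1.0):
    """Weighted error around site s: sum_i K_h(s, s_i) I(y_i != f(x_i)) w(i)."""
    d2 = np.sum((sites - s) ** 2, axis=1)
    Kh = np.exp(-d2 / (2.0 * h ** 2))      # Gaussian kernel (assumed form)
    miss = (f(X) != y).astype(float)
    return np.sum(Kh * miss * w) / np.sum(Kh * w)
```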
90
Prediction rule: $h(x, s) = \operatorname{sgn}(F(x, s))$, where
$$F(x, s) = \sum_{t=1}^T K_h(s, s_t)\, \alpha_t(s_t)\, f_t(x)$$
is a locally weighted combination. The second weighting $K_h(s, s_t)$ strongly combines only the classification machines $f_t$ whose sites $s_t$ are near $s$.
91
LocalBoost vs. AdaBoost
92
Statistical discussion
Bayes rule
Neyman-Pearson Lemma
Model-misspecification
ROC curve
93
Error rate
Two types of errors (misclassification probabilities):
False negative: a positive case classified as negative.
False positive: a negative case classified as positive.
$u$ is a cut-off point; comparing the score with $u$ partitions the cases into true negatives, true positives, false positives, and false negatives.
94
Neyman-Pearson Lemma (1)
Null hypothesis vs. alternative
Log-likelihood ratio
Neyman-Pearson lemma
95
Neyman-Pearson Lemma (2)
Bayes rule
96
Loss functions by NP lemma
where
97
A class of loss functions
Exponential
Log loss
98
Bayes rule equivalence
Theorem 1
Equality holds if and only if
99
Error rate
Error rate
where H(s) is the Heaviside function,
is of the class
In general
100
Important examples
Credit scoring
Medical screening
101
Minimum Exponential Loss
Empirical exp loss
Expected exp loss
Theorem.
102
Variational discussion
103
ROC curve
Gini index
A: Area Under the Curve (AUC)
104
ROC analysis
[Figure: ROC curve, TP (true positive rate) against FP (false positive rate).]
105
Logistic type discriminant
For a given training dataset $T$
For a given function $F$,
$\operatorname{sgn}(F(x))$ suggests the decision rule;
the empirical loss is
106
Log loss
Conditional expected likelihood
where
107
Estimation equation
IRLS (Iteratively reweighted least squares)
where
empirical
expected
logistic
glm(formula, family = binomial, data, weights = W, ...)
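A compact IRLS loop for the logistic model, as a sketch of what glm(..., family = binomial) iterates internally (standard textbook form, not taken from the slides):

```python
import numpy as np

def irls_logistic(X, y01, n_iter=25):
    """Iteratively reweighted least squares for logistic regression.
    `y01` takes values in {0, 1}; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))               # fitted probabilities
        W = p * (1.0 - p)                                 # IRLS weights
        z = X @ beta + (y01 - p) / np.maximum(W, 1e-10)   # working response
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))   # weighted least squares
    return beta
```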
108
Fisher consistency
Theorem 3. If the distribution of $(x, y)$ follows the assumed model,
then the estimator is asymptotically consistent for the target parameter.
109
Asymptotic efficiency
Asymptotic variance:
$$\operatorname{var}_{\mathrm{A}}(\hat{\beta}) = \frac{1}{n} J(\beta)^{-1} V(\beta) J(\beta)^{-1}$$
A Cramér-Rao type inequality gives
$$\operatorname{var}_{\mathrm{A}}(\hat{\beta}) \ge \frac{1}{n} \left[ \mathrm{E}\{p(x)(1 - p(x))\, f(x) f(x)^{\mathsf{T}}\} \right]^{-1}$$
Equality holds if and only if the loss is the log loss, or equivalently for logistic regression.
110
Expected loss under parametrics
$$\operatorname{Risk}(\hat{\beta}_U, U) = \mathrm{E}\{L_U(\hat{\beta}_U)\}$$
Under the parametric assumption, the expected loss expands as
$$\operatorname{Risk}(\hat{\beta}_U, U) = \operatorname{Risk}(\hat{\beta}_{\log}, U) + \frac{1}{n}\, \Delta(U, U^*) + o\!\left(\frac{1}{n}\right)$$
111
Expected loss under misspecified model
Under a near-parametric setting
$$F(x) = \beta^{\mathsf{T}} x + n^{-1/2} f(x) + O(n^{-1}),$$
with $\beta_U^{*} = \operatorname{argmin}_{\beta'} L_U(\beta')$, then
$$\operatorname{Risk}(\hat{\beta}_U, U) = L_U(\beta_U^{*}) + \frac{1}{2n} \operatorname{tr}\{\operatorname{var}_{\mathrm{A}}(\hat{\beta}_U)\, \operatorname{Hesse}(L_U)\} + o\!\left(\frac{1}{n}\right)$$
112
Sampling scheme
Mixture sampling
Conditional sampling (cohort study, prospective study)
Separate sampling (case-control study, retrospective study)
113
Adams, N.M. and Hand, D.J. (1999). Comparing classifiers when the misclassification costs are uncertain. Pattern Recognition 32, 1139-1147.
Adams, N.M. and Hand, D.J. (2000). Improving the practice of classifier performance assessment. Neural Computation 12, 305-311.
Begg, C. B., Satogopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
Bishop, C. (1995). Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
114
Domingo, C. and Watanabe, O. (2000). MadaBoost: A modification of AdaBoost. In Proc. of the 13th Conference on Computational Learning Theory.
Efron, B. (1975). The efficiency of logistic regression compared to normal discriminant analysis. J. Amer. Statist. Assoc. 70, 892-898.
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Ann. Statist. 28, 337-407.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
Nishii, R. and Eguchi, S. (2004). Supervised image classification by contextual AdaBoost based on posteriors in neighborhoods. To be submitted.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
115
Lebanon, G. and Lafferty, J. (2001). Boosting and maximum likelihood for exponential models. NIPS 14.
McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
Pepe, M.S. and Thompson, M.L. (2000). Combining diagnostic test results to increase accuracy. Biostatistics 1, 123-140.
Rätsch, G., Onoda, T. and Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning 42(3), 287-320.
Schapire, R. (1990). The strength of weak learnability. Machine Learning 5, 197-227.
Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer, New York.
116
Eguchi, S. and Copas, J. (2001). Recent developments in discriminant analysis from an information geometric point of view. J. Korean Statist. Soc. 30, 247-264. (Special issue for the 30th anniversary of the Korean Statistical Society.)
Eguchi, S. and Copas, J. (2002). A class of logistic-type discriminant functions. Biometrika 89, 1-22.
Eguchi, S. (2002). U-boosting method for classification and information geometry. Invited talk at the International Statistical Workshop, Statistical Research Center for Complex Systems, Seoul National University.
Kanamori, T., Takenouchi, T., Eguchi, S. and Murata, N. (2004). The most robust loss function for boosting. Lecture Notes in Computer Science 3316, 496-501. Springer.
Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. (2004). Information geometry of U-Boost and Bregman divergence. Neural Computation 16, 1437-1481.
117
Takenouchi, T. and Eguchi, S. (2004). Robustifying AdaBoost by adding the naive error rate. Neural Computation 16, 767-787.
Murata, N., Takenouchi, T., Kanamori, T. and Eguchi, S. Geometry of U-Boost algorithms. Hayashibara Forum "Chance and Necessity: Mathematics, Information and Economics", session "Physics of Information".
Eguchi, S. (2004). Information geometry and statistical pattern recognition. Sugaku expository article (in Japanese).
Eguchi, S. (2004). Information geometry of statistical pattern identification: the U-Boost learning algorithm. Suri Kagaku (Mathematical Sciences) No. 489, 53-59 (in Japanese).
Eguchi, S. (2002). On methods of statistical discrimination: from logistic discrimination to AdaBoost. Special lecture, 24th Symposium of the Japanese Society of Applied Statistics: New Developments in Multivariate Analysis (in Japanese).
118
Future paradigm
The concept of learning will continue to offer highly productive ideas for computational algorithms.
Is it an imitation of the biological brain?
Meta learning?
Human life?
And Statistics?