Data Mining Classification: Alternative Techniques (Chapter 5: Classification Techniques)
TRANSCRIPT
Data Mining Classification: Alternative Techniques
Chapter 5: Classification Techniques
Bayes Classifier

1. Bayes theorem
2. Classification with Bayes theorem
3. The Naïve Bayes classifier
4. Bayesian belief networks (BBN)
Conditional Probability & the Multiplication Law

The probability that an event occurs is often affected by whether other related events occur. This is called conditional probability, written P(A | B).

Conditional probability:

P(A | B) = P(A ∩ B) / P(B)   or   P(B | A) = P(A ∩ B) / P(A)

The multiplication law is used to compute the probability of the intersection of two events:

P(A ∩ B) = P(B) P(A | B)   and   P(A ∩ B) = P(A) P(B | A)
The General Form of Bayes' Theorem

If events B1, B2, …, Bk form a partition of the sample space S, with P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S with P(A) ≠ 0:

P(Bi | A) = P(Bi) P(A | Bi) / [P(B1) P(A | B1) + P(B2) P(A | B2) + … + P(Bk) P(A | Bk)]
Worked Example: The General Form of Bayes' Theorem

Tabular analysis of Bayes' theorem:

| (1) Event Bi | (2) Prior P(Bi) | (3) Conditional P(A\|Bi) | (4) Joint P(Bi ∩ A) | (5) Posterior P(Bi\|A) |
|---|---|---|---|---|
| B1 | 0.2 | 0.05 | 0.010 | 0.010/0.032 = 0.3125 |
| B2 | 0.3 | 0.04 | 0.012 | 0.012/0.032 = 0.3750 |
| B3 | 0.5 | 0.02 | 0.010 | 0.010/0.032 = 0.3125 |
| Total | 1.00 | | P(A) = 0.032 | 1.000 |
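The table's arithmetic is easy to verify in code. A minimal Python sketch (the priors and likelihoods are the table's own; everything else is illustrative):

```python
# Tabular Bayes: priors P(Bi) and likelihoods P(A|Bi) from the table above.
priors = {"B1": 0.2, "B2": 0.3, "B3": 0.5}
likelihoods = {"B1": 0.05, "B2": 0.04, "B3": 0.02}

# Column (4): joint probabilities P(Bi ∩ A) = P(Bi) P(A|Bi).
joints = {b: priors[b] * likelihoods[b] for b in priors}

# Total probability: P(A) is the sum of the joints.
p_a = sum(joints.values())

# Column (5): posteriors P(Bi|A) = P(Bi ∩ A) / P(A).
posteriors = {b: joints[b] / p_a for b in joints}

print(p_a)         # 0.032
print(posteriors)  # {'B1': 0.3125, 'B2': 0.375, 'B3': 0.3125}
```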
Independent Events / Conditionally Independent Events

If the probability that event A occurs is not affected by event B, then A and B are called independent events.

If two events A and B are independent, then

P(A | B) = P(A)   and   P(A ∩ B) = P(A) P(B)

Conditional independence: given that event M has occurred, the probabilities of events A, B, and C are not affected by one another:

P(A ∩ B ∩ C | M) = P(A | M) P(B | M) P(C | M)
Bayes Theorem

Bayes theorem is a method for combining knowledge about the classes with the observed data.

A probabilistic framework for solving classification problems

Conditional probability:

P(C | A) = P(A, C) / P(A)
P(A | C) = P(A, C) / P(C)

Bayes theorem:

P(C | A) = P(A | C) P(C) / P(A)
Example of Bayes Theorem

Given:
– A doctor knows that meningitis causes a stiff neck 50% of the time
– Prior probability: the probability that any patient has meningitis is P(M) = 1/50000
– Prior probability: the probability that any patient has a stiff neck is P(S) = 1/20

Posterior probability: if a patient has a stiff neck, what is the probability that the patient has meningitis, P(M | S)?

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
Bayesian Classifiers
Consider each attribute and the class label as random variables.

Given a record with attributes (A1, A2, …, An):

– The goal is to predict the class C

– Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)

Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers
Approach:

– Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem

– Choose the value of C that maximizes P(C | A1, A2, …, An)

– This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)

How to estimate P(A1, A2, …, An | C)?
P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)
Naïve Bayes Classifier
Assume independence among the attributes Ai when the class is given:

– P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

– Can estimate P(Ai | Cj) for all Ai and Cj.

– A new point is classified as Cj if P(Cj) ∏i P(Ai | Cj) is maximal.
How to Estimate Probabilities from Data?

Class prior: P(C) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10

For discrete attributes:

P(Ai | Ck) = |Aik| / Nck

– where |Aik| is the number of instances having attribute value Ai and belonging to class Ck

– Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
Example

P(O) = 3/10, P(No|O) = 1, P(Div|O) = 1/3, P(Low|O) = 1/3
P(X) = 7/10, P(No|X) = 4/7, P(Div|X) = 1/7, P(Low|X) = 3/7

W = {No, Div, Low}. Class(W) = O or X?

| Tid | Refund | Marital Status | Taxable Income | Class |
|---|---|---|---|---|
| 1 | Yes | Single | Mid | X |
| 2 | No | Married | Mid | X |
| 3 | No | Single | Low | X |
| 4 | Yes | Married | Mid | X |
| 5 | No | Divorced | Mid | O |
| 6 | No | Married | Low | X |
| 7 | Yes | Divorced | High | X |
| 8 | No | Single | Low | O |
| 9 | No | Married | Low | X |
| 10 | No | Single | Mid | O |
Example (continued)

Using the same training data and probabilities as above, the naïve independence assumption

P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

combined with Bayes theorem

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

gives, for W = {No, Div, Low}:

P(O|W) = P(No|O) P(Div|O) P(Low|O) P(O) / P(W) = (1)(1/3)(1/3)(3/10) / P(W) = (1/30) / P(W)
P(X|W) = P(No|X) P(Div|X) P(Low|X) P(X) / P(W) = (4/7)(1/7)(3/7)(7/10) / P(W) = (6/245) / P(W)

Since 1/30 ≈ 0.0333 > 6/245 ≈ 0.0245, P(O|W) > P(X|W), so Class(W) = O.
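The worked example can be reproduced directly from the training table. A minimal sketch, assuming records are stored as tuples in the table's column order; the helper name naive_bayes_score is hypothetical, not from the slides:

```python
# Training records from the table: (Refund, MaritalStatus, TaxableIncome, Class)
data = [
    ("Yes", "Single",   "Mid",  "X"), ("No", "Married", "Mid", "X"),
    ("No",  "Single",   "Low",  "X"), ("Yes", "Married", "Mid", "X"),
    ("No",  "Divorced", "Mid",  "O"), ("No", "Married", "Low", "X"),
    ("Yes", "Divorced", "High", "X"), ("No", "Single",  "Low", "O"),
    ("No",  "Married",  "Low",  "X"), ("No", "Single",  "Mid", "O"),
]

def naive_bayes_score(w, cls):
    """Unnormalized posterior: P(cls) * prod_i P(w_i | cls)."""
    rows = [r for r in data if r[-1] == cls]
    score = len(rows) / len(data)                 # prior P(cls)
    for i, value in enumerate(w):
        matches = sum(1 for r in rows if r[i] == value)
        score *= matches / len(rows)              # P(A_i = value | cls)
    return score

w = ("No", "Divorced", "Low")
print(naive_bayes_score(w, "O"))  # 1/30  ≈ 0.0333
print(naive_bayes_score(w, "X"))  # 6/245 ≈ 0.0245, so W is classified as O
```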
Example of Naïve Bayes Classifier

Attribute counts by class:

| Give Birth | mammals | non-mammals | Total |
|---|---|---|---|
| no | 1 | 12 | 13 |
| yes | 6 | 1 | 7 |
| Total | 7 | 13 | 20 |

| Lay Eggs | mammals | non-mammals | Total |
|---|---|---|---|
| no | 6 | 1 | 7 |
| yes | 1 | 12 | 13 |
| Total | 7 | 13 | 20 |

| Can Fly | mammals | non-mammals | Total |
|---|---|---|---|
| no | 6 | 10 | 16 |
| yes | 1 | 3 | 4 |
| Total | 7 | 13 | 20 |

| Live in Water | mammals | non-mammals | Total |
|---|---|---|---|
| no | 5 | 6 | 11 |
| sometimes | 0 | 4 | 4 |
| yes | 2 | 3 | 5 |
| Total | 7 | 13 | 20 |

| Have Legs | mammals | non-mammals | Total |
|---|---|---|---|
| no | 2 | 4 | 6 |
| yes | 5 | 9 | 14 |
| Total | 7 | 13 | 20 |

Training data:

| Name | Give Birth | Can Fly | Live in Water | Have Legs | Class |
|---|---|---|---|---|---|
| human | yes | no | no | yes | mammals |
| python | no | no | no | no | non-mammals |
| salmon | no | no | yes | no | non-mammals |
| whale | yes | no | yes | no | mammals |
| frog | no | no | sometimes | yes | non-mammals |
| komodo | no | no | no | yes | non-mammals |
| bat | yes | yes | no | yes | mammals |
| pigeon | no | yes | no | yes | non-mammals |
| cat | yes | no | no | yes | mammals |
| leopard shark | yes | no | yes | no | non-mammals |
| turtle | no | no | sometimes | yes | non-mammals |
| penguin | no | no | sometimes | yes | non-mammals |
| porcupine | yes | no | no | yes | mammals |
| eel | no | no | yes | no | non-mammals |
| salamander | no | no | sometimes | yes | non-mammals |
| gila monster | no | no | no | yes | non-mammals |
| platypus | no | no | no | yes | mammals |
| owl | no | yes | no | yes | non-mammals |
| dolphin | yes | no | yes | no | mammals |
| eagle | no | yes | no | yes | non-mammals |

Test record:

| Give Birth | Can Fly | Live in Water | Have Legs | Class |
|---|---|---|---|---|
| yes | no | yes | no | ? |
Example of Naïve Bayes Classifier (continued)

For the test record A = (Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no), with M = mammals and N = non-mammals:

P(A|M) = (6/7) × (6/7) × (2/7) × (2/7) = 0.06
P(A|N) = (1/13) × (10/13) × (3/13) × (4/13) = 0.0042

P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027

Since P(A|M) P(M) > P(A|N) P(N), the record is classified as a mammal.
How to Estimate Probabilities from Data?

For continuous attributes, assume a normal distribution:

P(Ai | cj) = (1 / √(2πσij²)) exp(−(Ai − μij)² / (2σij²))

– One distribution for each (Ai, cj) pair

For (Income, Class=No):
– If Class = No: μ = 110, σ² = 2975

P(Income=120 | No) = (1 / √(2π × 2975)) exp(−(120 − 110)² / (2 × 2975)) = 0.0072
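A sketch of this density computation, using the slide's sample statistics (μ = 110, σ² = 2975); in practice, μij and σij² would be estimated from the training records of each class:

```python
import math

def gaussian_likelihood(x, mean, var):
    """P(Ai = x | cj) under a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# For (Income, Class=No): mu = 110, sigma^2 = 2975
print(gaussian_likelihood(120, 110, 2975))  # ≈ 0.0072
```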
Example

P(O) = 4/10, P(Yes|O) = 1/4, P(Single|O) = 1/4, P(Mid|O) = 1/4
P(X) = 6/10, P(Yes|X) = 5/6, P(Single|X) = 5/6, P(Mid|X) = 0

| Tid | Refund | Marital Status | Taxable Income | Class |
|---|---|---|---|---|
| 1 | Yes | Single | High | X |
| 2 | Yes | Single | High | X |
| 3 | Yes | Single | High | X |
| 4 | Yes | Single | High | X |
| 5 | Yes | Single | High | X |
| 6 | No | Married | High | X |
| 7 | Yes | Married | Low | O |
| 8 | No | Single | Low | O |
| 9 | No | Married | Low | O |
| 10 | No | Married | Mid | O |

For W = {Yes, Single, Mid}:

P(O|W) = (4/10)(1/4)³ / P(W) = (1/160) / P(W)
P(X|W) = (6/10)(5/6)² × 0 / P(W) = 0

Since P(O|W) > P(X|W), Class(W) = O. Note that the single zero-valued conditional probability P(Mid|X) = 0 forces P(X|W) all the way to zero.
Naïve Bayes Classifier

If one of the conditional probabilities is zero, then the entire expression becomes zero.

Probability estimation:

Original: P(Ai | C) = Nic / Nc
Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m)

where
c: number of distinct values of the attribute Ai (3 in the worked example that follows)
p: prior probability
m: parameter
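The three estimators differ only in the correction added to the raw counts. A minimal sketch (the value m = 4 in the demo call is an illustrative choice, not from the slides):

```python
def original(n_ic, n_c):
    return n_ic / n_c

def laplace(n_ic, n_c, c):
    # Add-one smoothing over the c possible values of the attribute.
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, p, m):
    # Shrink toward the prior p; m controls the strength of the prior.
    return (n_ic + m * p) / (n_c + m)

# P(Mid | X) with N_ic = 0 and N_c = 6, as in the example that follows:
print(original(0, 6))            # 0.0
print(laplace(0, 6, 3))          # 1/9
print(m_estimate(0, 6, 0.1, 4))  # 0.04, with illustrative m = 4
```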
Example

P(O) = 4/10, P(Yes|O) = 1/4, P(Single|O) = 1/4, P(Mid|O) = 1/4
P(X) = 6/10, P(Yes|X) = 5/6, P(Single|X) = 5/6, P(Mid|X) = 0

(Same training data as the previous example.)

Re-estimating the problematic P(Mid|X), where Nic = 0 and Nc = 6:

Original: P(Mid|X) = 0/6 = 0
Laplace: P(Mid|X) = (0 + 1) / (6 + 3) = 1/9
m-estimate: P(Mid|X) = (0 + 0.1m) / (6 + m), using prior p = 0.1

With either smoothed estimate, P(X|W) is no longer forced to zero.
Laplace Probability Estimate

pC = (nC + 1) / (N + k)

where k is the number of classes.

Problems with Laplace:
– Assumes all classes are a priori equally likely
– Degree of pruning depends on the number of classes
m-Estimate of Probability

pC = (nC + pCa·m) / (N + m)

where:
– pCa is the prior probability of class C
– m is a non-negative parameter tuned by an expert
m-Estimate

Important points:
– Takes prior probabilities into account
– Pruning is not sensitive to the number of classes
– Varying m yields a series of differently pruned trees
– The choice of m depends on confidence in the data
m-Estimate in Pruning

Choice of m:
– Low noise → low m → little pruning
– High noise → high m → much pruning

Note: using the m-estimate is as if the examples at a node were a random sample, which they are not. Suitably adjusting m compensates for this.
Bayesian Belief Networks (BBN)
Bayesian Belief Network, Example 2: Given High Blood Pressure

P(BP=High) = P(BP=High | HD=Yes) P(HD=Yes) + P(BP=High | HD=No) P(HD=No)
           = 0.85 × 0.49 + 0.2 × 0.51 = 0.5185

P(HD=Yes | BP=High) = P(BP=High | HD=Yes) P(HD=Yes) / P(BP=High)
                    = 0.85 × 0.49 / 0.5185 = 0.8033

P(HD=No | BP=High) = 1 − 0.8033 = 0.1967
Bayesian Belief Network, Example 3: High Blood Pressure, Healthy Diet, Regular Exercise

P(HD=Yes | BP=High, D=Healthy, E=Yes)
= P(BP=High | HD=Yes) P(HD=Yes | D=Healthy, E=Yes) / [P(BP=High | HD=Yes) P(HD=Yes | D=Healthy, E=Yes) + P(BP=High | HD=No) P(HD=No | D=Healthy, E=Yes)]
= (0.85 × 0.25) / (0.85 × 0.25 + 0.2 × 0.75)
= 0.5862

P(HD=No | BP=High, D=Healthy, E=Yes) = 1 − 0.5862 = 0.4138
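Both BBN examples reduce to normalizing products of conditional-probability-table entries over the values of HD. A sketch for Example 3, using the CPT values quoted above:

```python
# CPT entries used on this slide
p_bp_high_given_hd = {"Yes": 0.85, "No": 0.20}   # P(BP=High | HD)
p_hd_given_d_e = {"Yes": 0.25, "No": 0.75}       # P(HD | D=Healthy, E=Yes)

# P(HD=Yes | BP=High, D=Healthy, E=Yes) by normalizing over HD
numer = p_bp_high_given_hd["Yes"] * p_hd_given_d_e["Yes"]
denom = sum(p_bp_high_given_hd[hd] * p_hd_given_d_e[hd] for hd in ("Yes", "No"))
p_hd_yes = numer / denom

print(p_hd_yes)      # ≈ 0.5862
print(1 - p_hd_yes)  # ≈ 0.4138
```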
Artificial Neural Networks (ANN)
Artificial Neural Networks (ANN)

| X1 | X2 | X3 | Y |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 |

[Diagram: inputs X1, X2, X3 feed a black box that produces output Y.]

Output Y is 1 if at least two of the three inputs are equal to 1.
Artificial Neural Networks (ANN)

For the same truth table, the black box can be realized with weight 0.3 on each input and threshold t = 0.4:

Y = I(0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 > 0)

where I(z) = 1 if z is true and 0 otherwise.
Artificial Neural Networks (ANN)

The model is an assembly of inter-connected nodes and weighted links.

The output node sums up its input values according to the weights of its links, and the sum is compared against some threshold t.

[Diagram: inputs X1, X2, X3 with weights w1, w2, w3 feed an output node with threshold t.]

Perceptron model:

Y = I(Σi wi Xi − t > 0)   or   Y = sign(Σi wi Xi − t)
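The perceptron above fits in a few lines. A sketch wired with the earlier example's weights (0.3, 0.3, 0.3) and threshold t = 0.4:

```python
def perceptron(x, w, t):
    """Fire (output 1) when the weighted input sum exceeds the threshold t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - t > 0 else 0

w, t = (0.3, 0.3, 0.3), 0.4
for x in [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1),
          (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]:
    print(x, perceptron(x, w, t))  # 1 exactly when at least two inputs are 1
```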
General Structure of ANN

[Diagram: neuron i receives inputs I1, I2, I3 through weights wi1, wi2, wi3; the weighted sum Si passes through an activation function g(Si) to produce output Oi. A full network stacks an input layer (x1, …, x5), a hidden layer, and an output layer (y).]

Training an ANN means learning the weights of the neurons.
Algorithm for Learning ANN

Initialize the weights (w0, w1, …, wk).

Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples.

– Objective function: E = Σi [Yi − f(w, Xi)]², where w is the weight vector

– Find the weights wi that minimize the above objective function, e.g., with the backpropagation algorithm (see lecture notes)
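A minimal sketch of this weight-adjustment loop for a single sigmoid unit, using stochastic gradient descent on the squared-error objective (the learning rate and epoch count are illustrative choices; backpropagation generalizes the same update through hidden layers):

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_unit(examples, n_inputs, lr=0.5, epochs=2000):
    """Minimize E = sum_i (y_i - f(w, x_i))^2 for one sigmoid unit (bias included)."""
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for x, y in examples:
            xb = list(x) + [1.0]                 # append constant bias input
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            delta = (y - o) * o * (1 - o)        # gradient of the squared error
            w = [wi + lr * delta * xi for wi, xi in zip(w, xb)]
    return w

# "At least two of three inputs are 1" data from the earlier slide
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
        ((0, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 1), 1), ((0, 0, 0), 0)]
weights = train_unit(data, 3)
```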
Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines

One possible solution: decision boundary B1. [Figure]
Support Vector Machines

Another possible solution: decision boundary B2. [Figure]
Support Vector Machines

Other possible solutions. [Figure: another candidate boundary B2]
Support Vector Machines

Which one is better, B1 or B2? How do you define better? [Figure: both boundaries]
Support Vector Machines

Find the hyperplane that maximizes the margin; by this criterion B1 is better than B2.

[Figure: B1 with margin boundaries b11 and b12; B2 with margin boundaries b21 and b22.]
Support Vector Machines

The decision boundary B1 is w·x + b = 0, with margin hyperplanes b11: w·x + b = 1 and b12: w·x + b = −1.

f(x) = 1 if w·x + b ≥ 1; f(x) = −1 if w·x + b ≤ −1

Margin = 2 / ‖w‖²
Support Vector Machines

We want to maximize: Margin = 2 / ‖w‖²

– Which is equivalent to minimizing: L(w) = ‖w‖² / 2

– But subject to the following constraints:

f(xi) = 1 if w·xi + b ≥ 1; f(xi) = −1 if w·xi + b ≤ −1

This is a constrained optimization problem
– Numerical approaches to solve it (e.g., quadratic programming)
Support Vector Machines
What if the problem is not linearly separable?
Support Vector Machines

What if the problem is not linearly separable?

– Introduce slack variables ξi. Need to minimize:

L(w) = ‖w‖² / 2 + C Σi=1..N ξi^k

Subject to:

f(xi) = 1 if w·xi + b ≥ 1 − ξi; f(xi) = −1 if w·xi + b ≤ −1 + ξi
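Quadratic programming is the standard solver, but a compact way to see the soft-margin objective in action is subgradient descent on the equivalent hinge-loss form, where ξi = max(0, 1 − yi(w·xi + b)) and k = 1. A sketch (the step size, epoch count, and toy data are illustrative):

```python
def train_linear_svm(points, labels, C=1.0, lr=0.01, epochs=1000):
    """Subgradient descent on ||w||^2/2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):          # labels are +1 / -1
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                        # point violates the margin
                w = [wj - lr * (wj - C * y * xj) for wj, xj in zip(w, x)]
                b += lr * C * y
            else:                                 # only the regularizer pulls
                w = [wj - lr * wj for wj in w]
    return w, b

# Toy linearly separable data
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```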
Nonlinear Support Vector Machines

What if the decision boundary is not linear?
Nonlinear Support Vector Machines

Transform the data into a higher dimensional space.
Ensemble Methods

Construct a set of classifiers from the training data.

Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers.
General Idea

Step 1: From the original training data D, create multiple data sets D1, D2, …, Dt−1, Dt.
Step 2: Build a classifier on each data set (C1, C2, …, Ct−1, Ct).
Step 3: Combine the classifiers into a single classifier C*.
Why Does It Work?

Suppose there are 25 base classifiers

– Each classifier has error rate ε = 0.35

– Assume the classifiers are independent

– The ensemble errs only if a majority (13 or more) of the base classifiers err, so the probability that the ensemble classifier makes a wrong prediction is:

Σi=13..25 C(25, i) ε^i (1 − ε)^(25−i) = 0.06
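The 0.06 figure is a binomial tail probability. A quick check (math.comb is available from Python 3.8 on):

```python
import math

eps, n = 0.35, 25
# Ensemble errs when 13 or more of the 25 independent classifiers err.
ensemble_error = sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
                     for i in range(13, n + 1))
print(ensemble_error)  # ≈ 0.06
```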
Examples of Ensemble Methods
How to generate an ensemble of classifiers?
– Bagging
– Boosting
Bagging

Sampling with replacement

Build a classifier on each bootstrap sample

Each example has probability 1 − (1 − 1/n)^n (≈ 0.632 as n grows) of appearing in a given bootstrap sample

| Original Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Bagging (Round 1) | 7 | 8 | 10 | 8 | 2 | 5 | 10 | 10 | 5 | 9 |
| Bagging (Round 2) | 1 | 4 | 9 | 1 | 2 | 3 | 2 | 7 | 3 | 2 |
| Bagging (Round 3) | 1 | 8 | 5 | 10 | 5 | 5 | 9 | 6 | 3 | 7 |
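A sketch of the sampling step; the base classifier trained on each sample is left to the user:

```python
import random

def bootstrap_samples(data, rounds):
    """Draw len(data) records with replacement for each bagging round."""
    n = len(data)
    return [[random.choice(data) for _ in range(n)] for _ in range(rounds)]

data = list(range(1, 11))  # record ids 1..10, as in the table above
for r, sample in enumerate(bootstrap_samples(data, 3), start=1):
    print(f"Bagging (Round {r}):", sample)

# On average a record appears in a sample with probability
# 1 - (1 - 1/n)^n, about 0.632 for large n.
```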
Boosting

An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records

– Initially, all N records are assigned equal weights

– Unlike bagging, the weights may change at the end of each boosting round
Boosting

Records that are wrongly classified will have their weights increased

Records that are classified correctly will have their weights decreased

| Original Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Boosting (Round 1) | 7 | 3 | 2 | 8 | 7 | 9 | 4 | 10 | 6 | 3 |
| Boosting (Round 2) | 5 | 4 | 9 | 4 | 2 | 5 | 1 | 7 | 4 | 2 |
| Boosting (Round 3) | 4 | 4 | 8 | 10 | 4 | 5 | 4 | 6 | 3 | 4 |

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds
Example: AdaBoost

Base classifiers: C1, C2, …, CT

Error rate:

εi = (1/N) Σj=1..N wj δ(Ci(xj) ≠ yj)

where δ(·) is 1 if its argument is true and 0 otherwise.

Importance of a classifier:

αi = (1/2) ln((1 − εi) / εi)
Example: AdaBoost

Weight update:

wi^(j+1) = (wi^(j) / Zj) × exp(−αj) if Cj(xi) = yi
wi^(j+1) = (wi^(j) / Zj) × exp(αj) if Cj(xi) ≠ yi

where Zj is the normalization factor.

If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated.

Classification:

C*(x) = argmax_y Σj=1..T αj δ(Cj(x) = y)
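Putting the error-rate, importance, weight-update, and voting formulas together gives the skeleton below. The decision-stump base learner is an illustrative choice (the slides leave the base classifier unspecified), and the slide's revert-and-resample rule is simplified here to an early stop:

```python
import math

def stump_predict(stump, x):
    feature, threshold, sign = stump
    return sign if x[feature] <= threshold else -sign

def best_stump(X, y, w):
    """Weighted-error-minimizing one-level tree (illustrative base learner)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, sign), xi) != yi)
                if best is None or err < best[0]:
                    best = (err, (f, t, sign))
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                  # initial weights 1/N
    ensemble = []                      # list of (alpha_j, stump)
    for _ in range(rounds):
        err, stump = best_stump(X, y, w)
        if err >= 0.5:                 # slide: revert weights and resample; we stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, stump))
        if err == 0:                   # perfectly classified: nothing to reweight
            break
        # Weight update: shrink correct, grow wrong, then renormalize by Z_j.
        w = [wi * math.exp(-alpha if stump_predict(stump, xi) == yi else alpha)
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def classify(ensemble, x):
    # C*(x) = argmax_y sum_j alpha_j * [C_j(x) = y]; for +-1 labels the sign works.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

X = [(1.0,), (2.0,), (4.0,), (5.0,)]
y = [1, 1, -1, -1]
model = adaboost(X, y)
print(classify(model, (1.5,)))  # 1
```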
Illustrating AdaBoost

Original data (training points): + + + − − − − − + +, with initial weight 0.1 on each data point.

Boosting Round 1 (B1): + + + − − − − − − −, example weights shown: 0.0094, 0.0094, 0.4623; α = 1.9459
Illustrating AdaBoost (continued)

Boosting Round 1 (B1): + + + − − − − − − −, weights 0.0094, 0.0094, 0.4623; α = 1.9459
Boosting Round 2 (B2): − − − − − − − − + +, weights 0.3037, 0.0009, 0.0422; α = 2.9323
Boosting Round 3 (B3): + + + + + + + + + +, weights 0.0276, 0.1819, 0.0038; α = 3.8744

Overall: + + + − − − − − + +
Rule-Based Classifier