Clustering networked data based on link and similarity in active learning. Advisor: Sing Ling Lee...
TRANSCRIPT
1
CLUSTERING NETWORKED DATA BASED ON LINK AND
SIMILARITY IN ACTIVE LEARNING
Advisor : Sing Ling Lee
Student : Yi Ming Chang
Speaker : Yi Ming Chang
2
OUTLINE
Introduction
Active Learning
Networked data
Related Work
Newman’s Modularity
Collective Classification(ICA)
ALFNET
CLAL
Experimental Results
Conclusion
3
PASSIVE LEARNING
[Figure: labeled instances (+ / -) form the training data and train a classifier; the classifier then classifies the unlabeled instances in the testing data.]
Legend: unlabeled instance; labeled instance.
Wrong: 5
4
ACTIVE LEARNING
[Figure: a classifier is trained on the labeled nodes (training data) and classifies the unlabeled nodes (testing data); it then queries the labels of a batch of unlabeled nodes and is trained again.]
Legend: unlabeled node; labeled node.
EX: Query batch number = 3
Wrong: 2
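A minimal sketch of the query loop on this slide, assuming a pool-based setting with a query batch of 3. The nearest-centroid learner, the uncertainty measure, the `oracle` function, and the sample points are all illustrative assumptions; the slide does not say which classifier or query strategy is used.

```python
import math

def train_centroids(labeled):
    """Stand-in classifier (assumption): one mean point per class."""
    sums = {}
    for (x, y), label in labeled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Predict the class whose centroid is nearest to the point."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))

def uncertainty(centroids, point):
    """Gap between the two nearest centroids; a small gap means uncertain."""
    d = sorted(math.dist(point, c) for c in centroids.values())
    return d[1] - d[0]

def active_learning(labeled, pool, oracle, batch_size=3, budget=8):
    """Train, query the most uncertain batch, and repeat until the budget."""
    labeled, pool = list(labeled), list(pool)
    while len(labeled) < budget and pool:
        centroids = train_centroids(labeled)
        pool.sort(key=lambda p: uncertainty(centroids, p))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(p, oracle(p)) for p in batch]  # ask for the true labels
    return train_centroids(labeled), labeled

# Hypothetical data: the oracle plays the role of the human annotator.
oracle = lambda p: "+" if p[0] + p[1] > 0 else "-"
seed = [((2.0, 2.0), "+"), ((-2.0, -2.0), "-")]
pool = [(0.1, 0.2), (3.0, 3.0), (-0.3, 0.1), (-3.0, -2.5),
        (0.5, -0.4), (1.5, 1.0), (-1.0, -1.5)]
model, labeled = active_learning(seed, pool, oracle, batch_size=3, budget=8)
```

With 2 seed labels and a budget of 8, the loop runs two query rounds of 3 nodes each, so labeling effort is spent on the points the current classifier is least sure about.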
7
NEWMAN’S MODULARITY FOR CLUSTERING
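m = 5 (number of edges); $A_{ij}$: real edge between nodes i and j; $k_i$: degree of node i; $s_i \in \{1, -1\}$: group of node i
$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$
$B_{12} = 1 - \frac{2 \cdot 2}{10}$, paired with $s_1 s_2$
$B_{13} = 0 - \frac{2 \cdot 2}{10}$, paired with $s_1 s_3$
$B_{14} = 1 - \frac{2 \cdot 3}{10}$, paired with $s_1 s_4$
$B_{15} = 0 - \frac{2 \cdot 1}{10}$, paired with $s_1 s_5$
[Figure: example graph with nodes 1 to 5.]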
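The $B_{1j}$ values above can be checked with a short sketch. The edge list below is an assumption: the slide only fixes m = 5, the degrees 2, 2, 2, 3, 1, and which of node 1's pairs are real edges, so this is one graph that matches those numbers, not necessarily the one drawn. The two-group score $Q = \frac{1}{4m} \sum_{ij} B_{ij} s_i s_j$ is the standard form of Newman's modularity and is not spelled out in the transcript.

```python
def modularity_matrix(n, edges):
    """Return (B, k, two_m) with B[i][j] = A[i][j] - k[i] * k[j] / (2m)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    k = [sum(row) for row in A]      # degree of each node
    two_m = sum(k)                   # every edge is counted twice
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, k, two_m

def modularity(B, s, two_m):
    """Two-group modularity Q = (1 / 4m) * sum_ij B_ij * s_i * s_j."""
    n = len(s)
    total = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return total / (2 * two_m)

# Assumed edge list (0-based node ids), picked to give degrees 2, 2, 2, 3, 1.
edges = [(0, 1), (0, 3), (1, 2), (2, 3), (3, 4)]
B, k, two_m = modularity_matrix(5, edges)
first_row = [round(B[0][j], 4) for j in range(1, 5)]  # B_12, B_13, B_14, B_15
```

Putting every node in the same group gives Q = 0, which is a quick sanity check on the matrix.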
8
NEWMAN’S MODULARITY FOR CLUSTERING
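Example:
$B_{12} = 1 - \frac{5 \cdot 2}{16} = 0.375$, paired with $s_1 s_2$
$B_{13} = 0 - \frac{5 \cdot 3}{16} = -0.9375$, paired with $s_1 s_3$
$B_{21} = 1 - \frac{2 \cdot 5}{16} = 0.375$, paired with $s_2 s_1$
$B_{23} = 1 - \frac{2 \cdot 3}{16} = 0.625$, paired with $s_2 s_3$
$B_{31} = 0 - \frac{3 \cdot 5}{16} = -0.9375$, paired with $s_3 s_1$
$B_{32} = 1 - \frac{3 \cdot 2}{16} = 0.625$, paired with $s_3 s_2$
$B_{23} + B_{32} = 0.625 + 0.625 > 0.375 + 0.375 = B_{12} + B_{21}$
[Figure: nodes 1, 2, 3.]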
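The arithmetic of this example can be reproduced directly from the degrees. The values k = 5, 2, 3 and the denominator 16 are read off the fractions on the slide; the dictionary layout and the names `gain_23` and `gain_12` are illustrative assumptions.

```python
# Degrees and adjacency as they appear in the fractions of the example.
k = {1: 5, 2: 2, 3: 3}
A = {(1, 2): 1, (1, 3): 0, (2, 3): 1}
two_m = 16

def b(i, j):
    """B_ij = A_ij - k_i * k_j / (2m), with A treated as symmetric."""
    a = A.get((i, j), A.get((j, i), 0))
    return a - k[i] * k[j] / two_m

gain_23 = b(2, 3) + b(3, 2)   # contribution if nodes 2 and 3 share a group
gain_12 = b(1, 2) + b(2, 1)   # contribution if nodes 1 and 2 share a group
```

Here `gain_23` exceeds `gain_12`, matching the comparison 0.625 + 0.625 > 0.375 + 0.375.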
10
COLLECTIVE CLASSIFICATION(ICA)
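Iterative Classification Algorithm (ICA)
1. Train the Content-Only (CO) learner on the labeled nodes, using the local features only.
2. Compute the neighbor features using the CO predictions.
3. Train the Collective (CC) learner on the local features plus the neighbor features.
4. Iteration 1, 2, 3, ...: compute the neighbor features using the CC predictions and classify again.
5. Repeat until the labels are stable or a threshold number of iterations has elapsed.
Feature vectors:
CO: feature (1 0 0 1 0 … 1)
CC: feature (1 0 0 1 0 … 1) + neighbor feature (3/5 2/5 ..)
[Figure: network of labeled (+ / -) and unlabeled (?) nodes.]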
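A minimal sketch of the ICA loop described on this slide. The nearest-centroid learner stands in for whatever classifier the deck actually uses, and the graph, feature values, and function names are hypothetical. Neighbor features are the fraction of a node's neighbors in each class, as in the 3/5, 2/5 example.

```python
import math

def fit(rows):
    """Stand-in learner (assumption): one centroid per class."""
    groups = {}
    for vec, label in rows:
        groups.setdefault(label, []).append(vec)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in groups.items()}

def predict(model, vec):
    return min(model, key=lambda lab: math.dist(vec, model[lab]))

def neighbor_feature(node, adj, labels, classes):
    """Fraction of the node's neighbors currently assigned to each class."""
    known = [labels[n] for n in adj[node] if n in labels]
    if not known:
        return [0.0] * len(classes)
    return [known.count(c) / len(known) for c in classes]

def ica(features, adj, seed_labels, max_iter=10):
    classes = sorted(set(seed_labels.values()))
    unlabeled = [n for n in features if n not in seed_labels]
    # Bootstrap with the content-only (CO) learner on local features.
    co = fit([(features[n], seed_labels[n]) for n in seed_labels])
    current = dict(seed_labels)
    for n in unlabeled:
        current[n] = predict(co, features[n])

    def full(n, labels):
        return features[n] + neighbor_feature(n, adj, labels, classes)

    # Collective (CC) learner: local features plus neighbor features.
    cc = fit([(full(n, current), seed_labels[n]) for n in seed_labels])
    for _ in range(max_iter):          # threshold on the iteration count
        updated = dict(current)
        for n in unlabeled:
            updated[n] = predict(cc, full(n, current))
        if updated == current:         # labels are stable
            break
        current = updated
    return current

# Hypothetical graph: two triangles joined by the edge 2-3.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.6, 0.4],
            3: [0.0, 1.0], 4: [0.1, 0.9], 5: [0.4, 0.6]}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = ica(features, adj, {0: "+", 3: "-"})
```

All unlabeled nodes are updated from the previous iteration's labels at once, so on less clear-cut graphs the labels may keep changing and the iteration cap decides the outcome.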
11
CC PROBLEM: How to set the threshold?
Legend: labeled node; unlabeled node.
Inferred neighbor features (+, -) of unlabeled nodes 1, 2, 3:
Iteration | Node 1   | Node 2   | Node 3
1         | 2/5 3/5  | 3/5 2/5  | 0/1 1/1
2         | 3/5 2/5  | 2/5 3/5  | 1/1 0/1
3         | 2/5 3/5  | 4/5 1/5  | 0/1 1/1
4         | 3/5 2/5  | 2/5 3/5  | 1/1 0/1
5         | 2/5 3/5  | 4/5 1/5  | 0/1 1/1
[Figure: the predicted + / - labels of nodes 1 to 3 flip from one iteration to the next.]
12
ALFNET
1. Cluster the data into at least k clusters.
2. Pick k clusters based on size and initialize the Content-Only (CO) classifier.
[Figure: k clusters feeding the CO classifier (SVM).]
13
ALFNET
3. while (labeled nodes < budget)
3.1 Re-train the CO and CC classifiers
3.2 Pick k clusters based on score
[Figure: k clusters are picked and added to the training set, which trains CO and CC.]
15
ALFNET
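Predictors compared: CO, CC, Main Label
Predicted category: Class A, Class B, Class C, Class D
Score: entropy over the proportion of the three classifiers that predicted each category, with $\text{entropy}(p) = -p \ln p$
All three differ: entropy(1/3) + entropy(1/3) + entropy(1/3) = 0.3662 * 3
Two agree: entropy(2/3) + entropy(1/3) = 0.2703 + 0.3662
All three agree: entropy(3/3) = 0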
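The three entropy cases on this slide can be reproduced with a few lines. This is a sketch that assumes the score is the natural-log entropy of the proportions of CO, CC, and Main Label votes per class, which is what the numbers 0.3662 and 0.2703 imply; the function name `disagreement` is illustrative.

```python
import math
from collections import Counter

def disagreement(predictions):
    """Entropy (natural log) of the class proportions among the predictions."""
    counts = Counter(predictions)
    total = len(predictions)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

all_differ = disagreement(["A", "B", "C"])   # CO, CC, Main Label all differ
two_agree = disagreement(["A", "A", "B"])    # two of the three agree
all_agree = disagreement(["A", "A", "A"])    # full agreement
```

A higher score means the three predictors disagree more.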
17
MODULARITY AND SIMILARITY
EX:
Node 1: 1 1 0 0    Node 2: 1 0 0 0
Node 3: 1 1 0 0    Node 4: 0 0 1 1
[Equation: worked modularity and similarity values for these node pairs; the fraction layout is not recoverable from the transcript.]
19
CLAL
Legend: labeled node; unlabeled node.
[Figure: train CO, then query & classify; repeated.]
Until labeled nodes > budget
20
TUNING AND GREEDY MECHANISM
Legend: labeled node; unlabeled node.
[Figure: clusters of labeled and unlabeled (?) nodes; CO is trained, then nodes are queried & classified.]
Retrain & Move: a node is moved when Out-link > In-link; keep the greater COs.
Moving priority: OutLink - InLink, in descending order: 3 -> 2 -> 1 -> 1
Clustering priority: low accuracy -> high accuracy
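The move rule (out-links greater than in-links, largest difference first) can be sketched as follows. The destination choice is an assumption: the slide does not say where a moved node goes, so this sketch sends it to the cluster that holds most of its links. The graph and cluster names are hypothetical.

```python
from collections import Counter

def link_counts(node, adj, cluster_of):
    """Return (in_links, out_links) of a node relative to its own cluster."""
    own = cluster_of[node]
    in_links = sum(1 for n in adj[node] if cluster_of[n] == own)
    return in_links, len(adj[node]) - in_links

def move_candidates(adj, cluster_of):
    """Nodes with out-links > in-links, largest (out - in) first."""
    gaps = {}
    for node in adj:
        in_links, out_links = link_counts(node, adj, cluster_of)
        if out_links > in_links:
            gaps[node] = out_links - in_links
    return sorted(gaps, key=lambda n: (-gaps[n], n))

def move_nodes(adj, cluster_of):
    cluster_of = dict(cluster_of)
    for node in move_candidates(adj, cluster_of):
        in_links, out_links = link_counts(node, adj, cluster_of)
        if out_links <= in_links:   # an earlier move already fixed this node
            continue
        # Assumption: move to the cluster holding most of the node's links.
        tally = Counter(cluster_of[n] for n in adj[node])
        cluster_of[node] = max(sorted(tally), key=tally.get)
    return cluster_of

# Hypothetical graph: node 1 sits in cluster X but links mostly into Y.
adj = {1: [2, 4, 5, 6], 2: [1, 3], 3: [2], 4: [1, 5], 5: [1, 4, 6], 6: [1, 5]}
cluster_of = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
order = move_candidates(adj, cluster_of)
moved = move_nodes(adj, cluster_of)
```

Node 1 has one in-link and three out-links, so it is the only candidate and ends up in cluster Y.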
22
BACKGROUND: Networked data
Social network: each node is a person (person name) with attributes (feature, feature, …); nodes are linked by friend relations.
Citation network: each node is a paper (paper no.) with word features (word, word, …); nodes are linked by citations (cite).
25
SVM
Training data set: $\{(x_i, y_i) \mid x_i \in R^d,\ y_i \in \{1, -1\},\ i = 1, 2, \dots, n\}$
[Figure: two plots of + and - training points, each with its margin; the separating hyperplane is marked.]
26
CHALLENGE
Query efficiency from discriminative feature
[Figure: papers (paper name) with their word features, and a table of word counts for Class 1 and Class 2 with the sum of the 2 classes. Values shown: 510, 400, 250, 250, 260, 180, 220, 100, 150.]
27
CC PROBLEM: HOW TO SET THE TERMINAL CONDITION?
Different numbers of iterations yield different results.
Legend: CO predicted label; true label; labeled.
Feature vector of a node: local feature (F1, F2, … = 0, 1, 0, …) + neighbor feature (NF_A, NF_B), e.g. 3/5, 2/5.
[Figure: the CC classifier infers neighbor features in Iteration 1 and Iteration 2 for nodes labeled A and B; values shown include 2/3 1/3, 1/3 2/3, and 4/5 1/5.]
28
ALFNET
[Flowchart: Query and train CO; Query and train classifier; Compute; Compute; check "Iteration > ?" (Y / N); check "Labeled node > Budget?" (Y / N); Output.]