SCAN: A Structural Clustering Algorithm for Networks
Xiaowei Xu (徐晓伟)
Joint Work with Nurcan Yuruk (UALR) and Thomas A. J. Schweiger (Acxiom)
Network Clustering Problem
Networks made up of the mutual relationships of data elements usually have an underlying structure. Because relationships are complex, it is difficult to discover these structures. How can the structure be made clear?
Stated another way: given only information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?
An Example of Networks
How many clusters?
What size should they be?
What is the best partitioning?
Should some points be segregated?
A Social Network Model
Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group.
Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups.
Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group.
The Neighborhood of a Vertex
Define $\Gamma(v)$ as the immediate neighborhood of a vertex $v$ (i.e. the set of people that an individual knows).
Structural Similarity
The desired features tend to be captured by a measure we call structural similarity:
$$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$$
Structural similarity is large for members of a clique and small for hubs and outliers.
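As a small illustration of the measure, the sketch below computes structural similarity as the size of the shared neighborhood divided by the geometric mean of the two neighborhood sizes. It assumes that the neighborhood of a vertex includes the vertex itself, which the slides do not state explicitly; the toy graph and the function names are hypothetical.

```python
import math

def closed_neighborhood(adj, v):
    # Assumption of this sketch: the neighborhood contains v itself.
    return set(adj[v]) | {v}

def structural_similarity(adj, v, w):
    # Shared neighbors, normalized by the geometric mean of the two sizes.
    gv = closed_neighborhood(adj, v)
    gw = closed_neighborhood(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

for v, w in [("a", "b"), ("a", "c"), ("c", "d")]:
    print(v, w, round(structural_similarity(adj, v, w), 3))
```

In this toy graph the two triangle members a and b share their whole neighborhood, while the pendant vertex d shares much less with c, which matches the intuition that clique members score high and peripheral vertices score low.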
Structural Connectivity [1]
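ε-Neighborhood: $N_\varepsilon(v) = \{ w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \}$
Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
Structure reachable: $REACH_{\varepsilon,\mu}$, the transitive closure of direct structure reachability
Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V : REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$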
[1] M. Ester, H. P. Kriegel, J. Sander, & X. Xu (KDD'97)
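The definitions on this slide can be read as three small predicates. The sketch below is one possible rendering, again assuming that a vertex counts as part of its own neighborhood; the helper names, the toy graph, and the parameter values (a similarity threshold of 0.8 and a minimum neighborhood size of 3) are hypothetical choices for illustration.

```python
import math

def gamma(adj, v):
    # Assumption: the neighborhood of v includes v itself.
    return set(adj[v]) | {v}

def sigma(adj, v, w):
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    # Neighbors of v whose structural similarity to v is at least eps.
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def is_core(adj, v, eps, mu):
    # v is a core if its eps-neighborhood has at least mu members.
    return len(eps_neighborhood(adj, v, eps)) >= mu

def directly_reachable(adj, v, w, eps, mu):
    # w is directly structure reachable from v if v is a core
    # and w lies in v's eps-neighborhood.
    return is_core(adj, v, eps, mu) and w in eps_neighborhood(adj, v, eps)

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(sorted(eps_neighborhood(adj, "c", 0.8)))
print(is_core(adj, "a", 0.8, 3), is_core(adj, "d", 0.8, 3))
print(directly_reachable(adj, "a", "c", 0.8, 3))
```

Note that direct reachability is not symmetric: a core can reach a non-core vertex, but not the other way around.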
Structure-Connected Clusters
Structure-connected cluster C:
Connectivity: $\forall v, w \in C : CONNECT_{\varepsilon,\mu}(v, w)$
Maximality: $\forall v, w \in V : v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
Hubs: do not belong to any cluster; bridge to many clusters
Outliers: do not belong to any cluster; connect to fewer clusters
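To make the hub and outlier distinction concrete, the sketch below labels every vertex that has no cluster. The slide only says that hubs bridge "many" clusters; the cutoff of two or more neighboring clusters used here is an assumption, as are the function name, the toy graph, and the hand-written cluster assignment.

```python
def classify_non_members(adj, cluster_of):
    """Label each unclustered vertex as 'hub' or 'outlier'.

    Assumption: a vertex whose neighbors fall in two or more
    different clusters is a hub; any other unclustered vertex is an outlier.
    """
    roles = {}
    for v in adj:
        if v in cluster_of:
            continue
        touched = {cluster_of[w] for w in adj[v] if w in cluster_of}
        roles[v] = "hub" if len(touched) >= 2 else "outlier"
    return roles

# Hypothetical toy graph: two 4-cliques (0-3 and 4-7),
# vertex 8 attached to both cliques, vertex 9 attached to one.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 8},
    4: {5, 6, 7, 8}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6, 9},
    8: {3, 4}, 9: {7},
}
# Cluster assignment written by hand for this illustration.
cluster_of = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}

print(classify_non_members(adj, cluster_of))
```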
[Figure: example network with vertices numbered 0 to 13; one vertex is labeled as a hub and one as an outlier.]
Algorithm
$\mu = 2$, $\varepsilon = 0.7$
[Figure: animation frames stepping through the algorithm on the example network (vertices 0 to 13). The frames display structural similarity values as vertices are examined: 0.63, 0.75, 0.67, 0.82, 0.73, 0.51 and 0.68.]
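Since the animation frames survive only in part, the following is a minimal sketch of how such a procedure could be written, not necessarily the exact steps shown on the slides. It assumes a breadth-first expansion from each unassigned core vertex, a neighborhood that includes the vertex itself, and a hub defined as an unclustered vertex touching two or more clusters. The toy graph is hypothetical; the parameter values 0.7 and 2 are the ones shown above.

```python
import math
from collections import deque

def gamma(adj, v):
    # Assumption: the neighborhood of v includes v itself.
    return set(adj[v]) | {v}

def sigma(adj, v, w):
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def scan_sketch(adj, eps, mu):
    cluster_of = {}  # vertex -> cluster id
    next_id = 0
    for v in adj:
        if v in cluster_of or len(eps_neighborhood(adj, v, eps)) < mu:
            continue  # already clustered, or not a core
        cluster_of[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            n_u = eps_neighborhood(adj, u, eps)
            if len(n_u) < mu:
                continue  # border vertex: a member, but it does not expand
            for w in n_u:
                if w not in cluster_of:
                    cluster_of[w] = next_id
                    queue.append(w)
        next_id += 1
    # Unclustered vertices: hub if they touch two or more clusters (assumption).
    roles = {}
    for v in adj:
        if v not in cluster_of:
            touched = {cluster_of[w] for w in adj[v] if w in cluster_of}
            roles[v] = "hub" if len(touched) >= 2 else "outlier"
    return cluster_of, roles

# Hypothetical toy graph: two 4-cliques (0-3 and 4-7),
# vertex 8 attached to both cliques, vertex 9 attached to one.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 8},
    4: {5, 6, 7, 8}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6, 9},
    8: {3, 4}, 9: {7},
}
clusters, roles = scan_sketch(adj, eps=0.7, mu=2)
print(clusters)
print(roles)
```

On this toy graph the two cliques come out as two clusters, vertex 8 is labeled a hub, and vertex 9 an outlier. The sketch recomputes similarities on demand for brevity; caching them per edge would avoid repeated work.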
Running Time
Running time: O(|E|)
For sparse networks: O(|V|)
[2] A. Clauset, M. E. J. Newman, & C. Moore, Phys. Rev. E 70, 066111 (2004).
Are you ready for some football?
Given only the 2006 schedule of which schools each NCAA Division 1A team met on a football field, what underlying structures could one discover?
789 contests
119 Division 1A schools, who play:
schools in their conference
schools in other 1A conferences
independent 1A schools (e.g. Army)
schools in sub-1A conferences (e.g. Maine)
Consider Arkansas’ Schedule:
Opponent                    Conference
USC                         Pacific 10
Utah State                  Western Athletic
Vanderbilt                  SEC
Alabama                     SEC
Auburn                      SEC
Southeast Missouri State    Non 1A
Mississippi                 SEC
Louisiana Monroe            Sun Belt
South Carolina              SEC
Tennessee                   SEC
Mississippi State           SEC
LSU                         SEC
Florida                     SEC
Wisconsin                   Big 10
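A schedule like this one can be turned into a network by treating each school as a vertex and each game as an undirected edge. The slides do not spell out the construction, so the sketch below is an assumption about how it might be done, using only the Arkansas rows listed above; the variable names are hypothetical.

```python
from collections import defaultdict

# Arkansas' 2006 opponents and their conferences, as listed on the slide.
arkansas_schedule = [
    ("USC", "Pacific 10"), ("Utah State", "Western Athletic"),
    ("Vanderbilt", "SEC"), ("Alabama", "SEC"), ("Auburn", "SEC"),
    ("Southeast Missouri State", "Non 1A"), ("Mississippi", "SEC"),
    ("Louisiana Monroe", "Sun Belt"), ("South Carolina", "SEC"),
    ("Tennessee", "SEC"), ("Mississippi State", "SEC"), ("LSU", "SEC"),
    ("Florida", "SEC"), ("Wisconsin", "Big 10"),
]

# Adjacency sets: one undirected edge per game played.
adj = defaultdict(set)
for opponent, _conference in arkansas_schedule:
    adj["Arkansas"].add(opponent)
    adj[opponent].add("Arkansas")

sec_games = sum(1 for _, conf in arkansas_schedule if conf == "SEC")
print(len(adj["Arkansas"]), "opponents,", sec_games, "of them in the SEC")
```

Repeating this for every school's schedule would give the full network; most of a team's edges stay inside its own conference, which is the structure a clustering algorithm can try to recover.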
The Network:
The 1A Conference:
Result of Our Algorithm:
Result of FastModularity Alg. [2]:
[2] A. Clauset, M. E. J. Newman, & C. Moore, Phys. Rev. E 70, 066111 (2004).
Conclusion
We propose a novel network clustering algorithm:
It is fast: O(|E|); for scale-free networks, O(|V|)
It can find clusters, as well as hubs and outliers
For more information:
See you at the poster session this evening at poster board #4
Email: [email protected]
http://ifsc.ualr.edu/xwxu
Thank you!