Finding and Evaluating Community Structure in Networks
M.E.J. Newman and M. Girvan, Physical Review E 69, 026113 (2004)
11 July 2014, SNU IDB Lab.
Namyoon Kim
2 / 23
Outline
Introduction
Hierarchical Clustering
Edge Betweenness
The Algorithm
Implementation
    Weighting
    Edge betweenness contribution
Community strength
Modularity
Tests
Conclusion
3 / 23
Introduction
Networks
Interest in the theoretical modelling of networks has grown in recent years
The topic spans statistical physics, applied mathematics, computational biology and social networking
Community Structure
Connections within a community: dense
Connections between communities: sparse
4 / 23
Hierarchical Clustering: Agglomerative
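Agglomerative
Edges are added one by one to an initially empty network (vertices only), most similar vertex pairs first
Tends to find only the cores of communities
Peripheral vertices tend to be left out, although they matter for finding the true extent of a community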
5 / 23
Hierarchical Clustering: Divisive
Divisive
Start with the full network, find the least similar connected pairs of vertices and remove the edges between them
Newman and Girvan's approach
Look for the edges that lie between communities, rather than for the least similar pairs
6 / 23
Edge Betweenness
Betweenness
All paths from community A to community B (and vice versa) must pass through either edge 1 or edge 2
Edges 1 and 2 therefore have high betweenness
[Figure: communities A and B joined only by edges 1 and 2. Source: www.cs.kent.edu/~jin/DM07/PPT/muad.ppt]
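A small numerical illustration of this idea, as a sketch only: the six-vertex graph below (two triangles joined by two cross edges) is an assumed stand-in for communities A and B, and the helper names `shortest_paths` and `edge_betweenness` are hypothetical. The code counts shortest paths by brute force, splitting a pair's unit of credit equally when several shortest paths exist.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, s, t):
    """Return every shortest path from s to t as a list of vertices."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []
    # Walk the predecessor lists back from t; all shortest paths have equal length.
    paths = [[t]]
    while paths[0][-1] != s:
        paths = [p + [q] for p in paths for q in preds[p[-1]]]
    return [p[::-1] for p in paths]


def edge_betweenness(adj):
    """Brute-force shortest-path betweenness for every undirected edge."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s, t in combinations(adj, 2):
        paths = shortest_paths(adj, s, t)
        for p in paths:
            for a, b in zip(p, p[1:]):
                score[frozenset((a, b))] += 1.0 / len(paths)
    return score


# Assumed example: A = {0, 1, 2}, B = {3, 4, 5}; "edge 1" is 2-3, "edge 2" is 1-4.
adj = {0: [1, 2], 1: [0, 2, 4], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [1, 3, 5], 5: [3, 4]}
score = edge_betweenness(adj)
print(score[frozenset((2, 3))], score[frozenset((1, 4))])  # 4.5 4.5
print(max(v for e, v in score.items()
          if e not in (frozenset((2, 3)), frozenset((1, 4)))))  # 2.5
```

In this toy graph the two cross edges score 4.5 each, while no edge inside a triangle scores above 2.5.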
7 / 23
The Algorithm
Shortest path betweenness
Find the shortest paths between all pairs of vertices and count how many run along each edge
Recalculation step
Remove the edge with the highest count
Recalculate shortest path betweenness for all remaining edges
Steps
1. Calculate betweenness scores for all edges in the network
2. Find the edge with the highest score and remove it from the network
3. Recalculate betweenness for all remaining edges
4. Repeat from step 2 until no edges remain
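The four steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names (`edge_betweenness`, `components`, `girvan_newman`) and the adjacency-set representation are assumptions, and ties for the highest score are broken arbitrarily.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge, one breadth-first search per source."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, weight, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], weight[v] = dist[u] + 1, weight[u]
                    queue.append(v)
                elif dist[v] == dist[u] + 1:
                    weight[v] += weight[u]
        below = {v: 0.0 for v in order}
        for j in reversed(order):          # farthest vertices first
            for i in adj[j]:
                if dist[i] == dist[j] - 1:
                    c = weight[i] / weight[j] * (1 + below[j])
                    score[frozenset((i, j))] += c / 2  # each pair is seen from both ends
                    below[i] += c
    return score


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in part:
                    part.add(v)
                    stack.append(v)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(adj):
    """Yield the component split after each edge removal until no edges remain."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while any(adj.values()):
        score = edge_betweenness(adj)      # steps 1 and 3: (re)calculate
        u, v = max(score, key=score.get)   # step 2: highest-scoring edge
        adj[u].discard(v)
        adj[v].discard(u)
        yield components(adj)              # step 4: loop


# Assumed example: two triangles joined by the single bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
first_split = next(girvan_newman(adj))
print(sorted(sorted(part) for part in first_split))  # [[0, 1, 2], [3, 4, 5]]
```

The generator yields one partition per removed edge, so a caller can keep whichever split it judges best.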
8 / 23
Implementation – weighting (i)
Weighting
i. The initial vertex s is given distance $d_s = 0$ and weight $w_s = 1$
[Figure: vertex s labelled ($d_s = 0$, $w_s = 1$)]
9 / 23
Weighting (ii)
Weighting
ii. Every vertex i adjacent to s is given distance $d_i = d_s + 1 = 1$ and weight $w_i = w_s = 1$
[Figure: s labelled (0, 1); its two neighbours i each labelled ($d_i = 1$, $w_i = 1$)]
10 / 23
Weighting (iii)
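Weighting
iii. For each vertex j adjacent to i, do:
a) If $d_j$ is not assigned yet, set $d_j = d_i + 1$ and $w_j = w_i$
b) If $d_j$ is already assigned and $d_j = d_i + 1$, add the weight of the incoming vertex i: $w_j \leftarrow w_j + w_i$
c) If $d_j$ is already assigned and $d_j < d_i + 1$, do nothing
[Figure: s labelled (0, 1); the two vertices i labelled (1, 1); the two vertices j labelled ($d_j = 2$, $w_j = 2$) and ($d_j = 2$, $w_j = 1$)]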
11 / 23
Weighting (iv)
Weighting
iv. Repeat from iii until no vertices remain that have assigned distances but whose neighbours do not have assigned distances
Time complexity: O(E)
[Figure: final (distance, weight) labels: s (0, 1); (1, 1), (1, 1); (2, 2), (2, 1); (3, 1), (3, 3)]
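Weighting steps i to iv amount to a single breadth-first search from s. The sketch below is one possible reading: the function name `assign_weights` is hypothetical, and the seven-vertex graph with vertex names a to f is an assumption chosen so that it reproduces the (distance, weight) pairs shown on the slides.

```python
from collections import deque


def assign_weights(adj, s):
    """Return (dist, weight): distance from s and number of shortest paths from s."""
    dist = {s: 0}       # step i: the source gets distance 0 ...
    weight = {s: 1}     # ... and weight 1
    queue = deque([s])
    while queue:                              # step iv: repeat until nothing is left
        i = queue.popleft()
        for j in adj[i]:                      # steps ii and iii: visit the neighbours of i
            if j not in dist:                 # (a) j has no distance yet
                dist[j] = dist[i] + 1
                weight[j] = weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:      # (b) another shortest route into j
                weight[j] += weight[i]
            # otherwise j is not farther from s than i: do nothing
    return dist, weight


# Assumed example graph (vertex names are illustrative only).
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b", "e", "f"],
    "e": ["c", "d"],
    "f": ["d"],
}
dist, weight = assign_weights(adj, "s")
print([(v, dist[v], weight[v]) for v in "sabcdef"])
# [('s', 0, 1), ('a', 1, 1), ('b', 1, 1), ('c', 2, 2), ('d', 2, 1), ('e', 3, 3), ('f', 3, 1)]
```

Each edge is examined at most twice, once from each endpoint, which is consistent with the O(E) cost quoted on the slide.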
12 / 23
Implementation – edge betweenness contribution (i)
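Edge betweenness contribution
i. Find every “leaf” vertex t, i.e. a vertex such that no shortest path from s to any other vertex passes through t
[Figure: vertex weights by level: s (1); (1), (1); (2), (1); (1), (3), with the two bottom vertices marked t]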
13 / 23
Edge betweenness contribution (ii)
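Edge betweenness contribution
ii. For each vertex i neighbouring t, assign the edge between t and i a score of $w_i / w_t$
[Figure: same vertex weights; the two edges into the leaf of weight 3 score 2/3 and 1/3, and the edge into the leaf of weight 1 scores 1]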
14 / 23
Edge betweenness contribution (iii)
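Edge betweenness contribution
iii. Work upwards to s. For the edge from vertex j to vertex i (j farther from s than i), assign a score of $\frac{w_i}{w_j} \times (1 + \text{sum of the scores of the edges immediately below } j)$
[Figure: worked examples: $\frac{1}{1} \times (1 + 1 + \frac{1}{3}) = \frac{7}{3}$ for the edge above the vertex of weight 1, and $\frac{1}{2} \times (1 + \frac{2}{3}) = \frac{5}{6}$ for each of the two edges above the vertex of weight 2]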
15 / 23
Edge betweenness contribution (iv)
Edge betweenness contribution
iv. Repeat from iii until s is reached
Time complexity: O(E)
[Figure: the two edges at s score $\frac{1}{1} \times (1 + \frac{5}{6} + \frac{7}{3}) = \frac{25}{6}$ and $\frac{1}{1} \times (1 + \frac{5}{6}) = \frac{11}{6}$]
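The bottom-up pass can be sketched as below for a single source s. The function name `contributions_from`, the vertex names a to f, and the example graph are assumptions; the graph is chosen so that the scores match the fractions in the slide figures. `Fraction` is used only to keep the arithmetic exact.

```python
from collections import deque
from fractions import Fraction


def contributions_from(adj, s):
    """Edge scores contributed by source s, keyed by (closer vertex, farther vertex)."""
    # Weighting pass: distances, shortest-path counts, and breadth-first visiting order.
    dist, weight, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in adj[i]:
            if j not in dist:
                dist[j], weight[j] = dist[i] + 1, weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:
                weight[j] += weight[i]
    # Contribution pass: farthest vertices first, so leaves are handled before their parents.
    score = {}
    below = {v: Fraction(0) for v in order}   # sum of edge scores immediately below v
    for j in reversed(order):
        for i in adj[j]:
            if dist[i] == dist[j] - 1:        # i is one step closer to s than j
                c = Fraction(weight[i], weight[j]) * (1 + below[j])
                score[(i, j)] = c             # for a leaf, below[j] is 0, giving w_i / w_j
                below[i] += c
    return score


# Assumed example graph (vertex names are illustrative only).
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b", "e", "f"],
    "e": ["c", "d"],
    "f": ["d"],
}
score = contributions_from(adj, "s")
print(score[("c", "e")], score[("d", "e")], score[("d", "f")])  # 2/3 1/3 1
print(score[("b", "d")], score[("a", "c")], score[("b", "c")])  # 7/3 5/6 5/6
print(score[("s", "a")], score[("s", "b")])                     # 11/6 25/6
```

The two scores at s add up to 6, the number of vertices other than s, which is a quick sanity check on the pass.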
16 / 23
Algorithm complexity
Edge betweenness contribution
Repeat the weighting and edge betweenness contribution calculations for all V source vertices s, and do this E times (once for every edge removed)
Time complexity: $(O(E) + O(E)) \times V \times E = O(E^2 V)$, which is $O(n^3)$ on a sparse network of n vertices
[Figure: final edge scores from s: 2/3, 1/3, 1, 5/6, 7/3, 5/6, 11/6, 25/6]
17 / 23
Community strength
Community structure strength
How do we know the algorithm produces good results?
Some definitions
Say we have a network which is currently divided into k communities
We have a k × k symmetric matrix e, where each element $e_{ij}$ = (edges that link vertices in community i to vertices in community j) / (all edges in the original* network)
*The network's initial state, with no edges removed
$\mathrm{Tr}\, e = \sum_i e_{ii}$: fraction of edges in the network that connect vertices in the same community
$a_i = \sum_j e_{ij}$: fraction of edges that connect to vertices in community i
18 / 23
Modularity
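Modularity
$Q = \sum_i (e_{ii} - a_i^2) = \mathrm{Tr}\, e - \lVert e^2 \rVert$, where $\lVert x \rVert$ is the sum of the elements of the matrix x
Q = 0 means the split is no better than random partitioning
Q approaching 1 (the maximum) means the network has strong community structure
Generally, networks with reasonably well split communities have Q of 0.3 – 0.7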
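As an illustration of the definition, the sketch below builds the matrix e from an edge list and a vertex-to-community map and evaluates Q as the sum over communities of $e_{ii} - a_i^2$. The function name `modularity`, the input format, and the example graph are assumptions. Each edge is split half into $e_{ij}$ and half into $e_{ji}$ so that e stays symmetric, and `edges` is taken to be the full edge list of the original network.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected edge list and a vertex -> community map."""
    m = len(edges)                                  # all edges of the original network
    labels = sorted(set(community.values()))
    index = {label: pos for pos, label in enumerate(labels)}
    k = len(labels)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        # Half of each edge goes to e[i][j] and half to e[j][i], keeping e symmetric.
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    a = [sum(row) for row in e]                     # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(k))


# Assumed example: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
whole = {v: "A" for v in range(6)}
print(round(modularity(edges, split), 4))   # 0.3571
print(modularity(edges, whole))             # 0.0
```

Here the two-triangle split gives Q = 5/14, inside the 0.3 to 0.7 range quoted above, while putting every vertex in one community gives Q = 0.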
19 / 23
Tests – shortest-paths
$z_{in}$ = mean number of edges from a vertex to other vertices in the same community
$z_{out}$ = mean number of edges from a vertex to vertices in other communities
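For a graph whose community assignment is known, the two quantities can be measured directly. The sketch below is only an illustration of the definitions: the function name `mean_in_out_degree` and the toy graph are assumptions, not the test networks used in the paper.

```python
def mean_in_out_degree(edges, community):
    """Return (z_in, z_out): mean edges per vertex within and between communities."""
    n = len(community)
    inside = sum(1 for u, v in edges if community[u] == community[v])
    outside = len(edges) - inside
    # Every edge adds one to the degree of each of its two endpoints.
    return 2 * inside / n, 2 * outside / n


# Assumed example: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
z_in, z_out = mean_in_out_degree(edges, community)
print(z_in, round(z_out, 3))  # 2.0 0.333
```

The larger z_out is relative to z_in, the more blurred the community boundaries become.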
22 / 23
Conclusion
Contributions
A new class of algorithms for performing network clustering
Described the task of extracting the natural community structure from networks of vertices and edges
Future Work
Reduce time complexity
23 / 23
References
[1] M.E.J. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Phys. Rev. E 69, 026113, 2004.
[2] M.E.J. Newman. Fast Algorithm for Detecting Community Structure in Networks. Phys. Rev. E 69, 066133, 2004.
Presentation by Muad Abu-Ata, www.cs.kent.edu/~jin/DM07/PPT/muad.ppt