finding and evaluating community structure in networks

Finding and Evaluating Community Structure in Networks M.E.J. Newman and M. Girvan Physical Review E 69, 026113 (2004) 11 July 2014 SNU IDB Lab. Namyoon Kim



2 / 23

Outline

Introduction
Hierarchical Clustering
Edge Betweenness
The Algorithm
Implementation
  Weighting
  Edge betweenness contribution
Community strength
Modularity
Tests
Conclusion

3 / 23

Introduction

Networks
Interest in the theoretical modelling of networks has grown in recent years
Work spans a wide variety of fields, such as statistical physics, applied mathematics, computational biology and social networks

Community Structure
Connections within a community: dense
Connections between communities: sparse

4 / 23

Hierarchical Clustering: Agglomerative

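Agglomerative
Edges are added one by one to an initially empty network (vertices only, no edges), most similar vertex pairs first
Tends to find only the cores of communities
Peripheral nodes, which matter for finding the true extent of a community, tend to be left out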

5 / 23

Hierarchical Clustering: Divisive

Divisive
Start with the full network, find the least similar connected pairs of vertices and remove the edges between them

Newman and Girvan's approach
Look for the edges that lie between communities

6 / 23

Edge Betweenness

Betweenness
All paths from community A to community B (and vice versa) must pass through either edge 1 or edge 2
Edges 1 and 2 therefore have high betweenness

[Figure: two communities, A and B, joined only by edges 1 and 2. Source: www.cs.kent.edu/~jin/DM07/PPT/muad.ppt]

7 / 23

The Algorithm

Shortest path betweenness
Find the shortest paths between all pairs of vertices and count how many run along each edge

Recalculation step
Remove the edge with the highest count
Recalculate shortest path betweenness for all remaining edges

Steps
1. Calculate betweenness scores for all edges in the network
2. Find the edge with the highest score and remove it from the network
3. Recalculate betweenness for all remaining edges
4. Repeat from step 2 until no edges remain
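The four steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names (`edge_betweenness`, `components`, `girvan_newman`) and the adjacency-set representation are assumptions made for the example, and the graph is assumed to be simple and undirected. Ties for the highest score are broken arbitrarily.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an undirected simple graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Forward pass: BFS distances and shortest-path counts (weights) from s.
        dist, weight, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            i = queue.popleft()
            order.append(i)
            for j in adj[i]:
                if j not in dist:
                    dist[j], weight[j] = dist[i] + 1, weight[i]
                    queue.append(j)
                elif dist[j] == dist[i] + 1:
                    weight[j] += weight[i]
        # Backward pass: farthest vertices first, pushing scores towards s.
        below = dict.fromkeys(order, 0.0)
        for j in reversed(order):
            for i in adj[j]:
                if dist[i] == dist[j] - 1:
                    c = weight[i] / weight[j] * (1 + below[j])
                    score[frozenset((i, j))] += c
                    below[i] += c
    # Every vertex pair was counted once from each endpoint, so halve the totals.
    return {edge: total / 2 for edge, total in score.items()}

def components(adj):
    """Connected components of the graph, as a list of vertex sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in part:
                    part.add(v)
                    stack.append(v)
        seen |= part
        parts.append(part)
    return parts

def girvan_newman(edges):
    """Return the successive divisions produced by repeated edge removal."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    history = [components(adj)]
    while any(adj.values()):                  # step 4: repeat while edges remain
        scores = edge_betweenness(adj)        # steps 1 and 3: (re)calculate
        u, v = max(scores, key=scores.get)    # step 2: highest-scoring edge
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(history[-1]):     # record only genuine splits
            history.append(parts)
    return history

# Two triangles joined by the single edge (2, 3); that edge is removed first.
example = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(sorted(sorted(p) for p in girvan_newman(example)[1]))  # [[0, 1, 2], [3, 4, 5]]
```

Recomputing all scores inside the loop is what distinguishes this from a single betweenness ranking: after each removal the shortest paths reroute, so stale scores would mislead the next choice.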

8 / 23

Implementation – weighting (i)

Weighting
i. The initial vertex s is given distance ds = 0 and weight ws = 1

[Figure: source vertex s labelled (ds = 0, ws = 1)]

9 / 23

Weighting (ii)

Weighting
ii. Every vertex i adjacent to s is given distance di = ds + 1 = 1 and weight wi = ws = 1

[Figure: s labelled (0, 1); its two neighbours i each labelled (di = 1, wi = 1)]

10 / 23

Weighting (iii)

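Weighting
iii. For each vertex j adjacent to one of those vertices i, do one of the following:
  a) If dj has not been assigned yet, set dj = di + 1 and wj = wi
  b) If dj has already been assigned and dj = di + 1, add the weight of the additional incoming vertex i: wj = wj + wi
  c) If dj has already been assigned and dj < di + 1, do nothing

[Figure: s labelled (0, 1); the two vertices i labelled (1, 1); the two vertices j labelled (dj = 2, wj = 2) and (dj = 2, wj = 1)]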

11 / 23

Weighting (iv)

Weighting
iv. Repeat from iii until no vertices remain that have assigned distances but whose neighbours do not have assigned distances

Time complexity: O(E)

[Figure: final (distance, weight) labels on the example network: s (0, 1); (1, 1) and (1, 1); (2, 2) and (2, 1); (3, 1) and (3, 3)]
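Weighting steps i to iv amount to a breadth-first search that also counts shortest paths. The sketch below is an illustration under assumptions: the function name `distances_and_weights` is hypothetical, and the edge list is reconstructed from the (distance, weight) labels in the slide figures, with vertex names a to f invented for the example.

```python
from collections import deque

def distances_and_weights(adj, s):
    """Return (dist, weight): BFS distance from s and number of shortest paths."""
    dist, weight = {s: 0}, {s: 1}               # step i
    queue = deque([s])
    while queue:                                # step iv: until nothing is left
        i = queue.popleft()
        for j in adj[i]:
            if j not in dist:                   # rule (a): first time j is seen
                dist[j] = dist[i] + 1
                weight[j] = weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:        # rule (b): another shortest path
                weight[j] += weight[i]
            # rule (c): dist[j] < dist[i] + 1, nothing to do
    return dist, weight

# Assumed reconstruction of the example network from the slide figures.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

dist, weight = distances_and_weights(adj, "s")
for v in sorted(dist, key=dist.get):
    print(v, (dist[v], weight[v]))
```

Each edge is examined at most twice (once from each endpoint), which is where the O(E) bound comes from.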

12 / 23

Implementation – edge betweenness contribution (i)

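Edge betweenness contribution
i. Find every "leaf" vertex t, i.e. a vertex such that no shortest paths from s to other vertices pass through it

[Figure: the example network labelled with weights: s (1); (1) and (1); (2) and (1); leaves t with weights (1) and (3)]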

13 / 23

Edge betweenness contribution (ii)

Edge betweenness contribution
ii. For each vertex i neighbouring t, assign the edge from t to i a score of wi/wt

[Figure: the two edges into the leaf of weight 3 score 2/3 and 1/3; the edge into the leaf of weight 1 scores 1]

14 / 23

Edge betweenness contribution (iii)

Edge betweenness contribution
iii. Work upwards towards s. For the edge from vertex j to vertex i (j farther from s than i), assign a score of wi/wj × (1 + sum of the scores on all edges immediately below j)

[Figure: worked scores. Edge above the vertex j of weight 1: 1/1 × (1 + 1 + 1/3) = 7/3. Each of the two edges above the vertex j of weight 2: 1/2 × (1 + 2/3) = 5/6]

15 / 23

Edge betweenness contribution (iv)

Edge betweenness contribution
iv. Repeat from iii until s is reached

Time complexity: O(E)

[Figure: scores on the two edges incident to s: 1/1 × (1 + 5/6 + 7/3) = 25/6 and 1/1 × (1 + 5/6) = 11/6]
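The worked numbers on these slides can be reproduced exactly with rational arithmetic. The following is a sketch under assumptions: `single_source_scores` is a hypothetical name, the edge list is a reconstruction of the example network from the figure labels, and vertex names a to f are invented. Keys of the result are (i, j) pairs with i the endpoint nearer to s.

```python
from collections import deque
from fractions import Fraction

def single_source_scores(adj, s):
    """Score every edge for the shortest paths that start at source s."""
    # Weighting pass: BFS distances and shortest-path counts.
    dist, weight, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in adj[i]:
            if j not in dist:
                dist[j], weight[j] = dist[i] + 1, weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:
                weight[j] += weight[i]
    # Contribution pass: visit vertices farthest from s first, so leaves
    # (nothing below them) get wi/wt and inner vertices add what lies below.
    below = {v: Fraction(0) for v in order}
    score = {}
    for j in reversed(order):
        for i in adj[j]:
            if dist[i] == dist[j] - 1:
                c = Fraction(weight[i], weight[j]) * (1 + below[j])
                score[(i, j)] = c
                below[i] += c
    return score

# Assumed reconstruction of the example network from the slide figures.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

scores = single_source_scores(adj, "s")
print(scores[("s", "a")], scores[("s", "b")])  # 11/6 25/6
```

A quick sanity check on the result: the scores on the edges incident to s sum to 6, the number of other vertices, since every shortest path from s leaves through one of those edges.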

16 / 23

Algorithm complexity

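Edge betweenness contribution
Repeat the weighting and edge betweenness contribution calculations for all V source vertices s and sum the scores on each edge; do this E times (once every time an edge is removed)

Time complexity: (O(E) + O(E)) × V × E = O(E²V), which is O(n³) for a sparse network of n vertices (E proportional to V)

[Figure: the example network with its single-source edge scores: 2/3, 1/3, 1, 5/6, 5/6, 7/3, 11/6, 25/6]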

17 / 23

Community strength

Community structure strength
How do we know the algorithm produces good results?

Some definitions
Say we have a network which is currently divided into k communities
We have a k × k symmetric matrix e in which each element eij = (edges that link vertices in community i to vertices in community j) / (all edges in the original* network)
*The network's initial state, with no edges removed

$\mathrm{Tr}\, e = \sum_i e_{ii}$: fraction of edges in the network that connect vertices in the same community

$a_i = \sum_j e_{ij}$: fraction of edges that connect to vertices in community i

18 / 23

Modularity

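Modularity
$Q = \sum_i \left( e_{ii} - a_i^2 \right) = \mathrm{Tr}\, e - \lVert e^2 \rVert$, where $\lVert x \rVert$ is the sum of the elements of the matrix x

Q = 0 means the split is no better than random partitioning
Values approaching Q = 1 (the maximum) indicate strong community structure
Generally, networks with reasonably well split communities have Q of 0.3 – 0.7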
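As a concrete check of the definition, modularity can be computed directly from the matrix e. This is a minimal sketch: the function name `modularity` and the `community` mapping (vertex to community label) are assumptions for the example, and each between-community edge is split half into eij and half into eji so that e stays symmetric and its elements sum to 1. The edge list passed in should be that of the original network.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for a partition of an undirected graph."""
    m = len(edges)
    labels = sorted(set(community.values()))
    index = {label: k for k, label in enumerate(labels)}
    k = len(labels)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        # Half an edge to each of e[i][j] and e[j][i]; when i == j the two
        # halves land on the same diagonal cell and add up to one full edge.
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    a = [sum(row) for row in e]                # a_i: row sums of e
    return sum(e[i][i] - a[i] ** 2 for i in range(k))

# Two triangles joined by one edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # 0.3571
```

For this split, eii = 3/7 and ai = 1/2 for both communities, so Q = 2 × (3/7 − 1/4) = 5/14, which falls inside the 0.3 – 0.7 range quoted above. Placing every vertex in one community gives Q = 0.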

19 / 23

Tests – shortest-paths

zin = mean number of edges from a vertex to other vertices in the same community

zout = mean number of edges from a vertex to vertices in different communities
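The two quantities are simple to measure on any graph with known communities. A small sketch, assuming a hypothetical helper `mean_in_out_degree` and a `community` mapping from vertex to label that covers every vertex:

```python
def mean_in_out_degree(edges, community):
    """Return (z_in, z_out) averaged over all vertices in `community`."""
    n = len(community)
    inside = sum(1 for u, v in edges if community[u] == community[v])
    outside = len(edges) - inside
    # Every edge adds one to the degree of each of its two endpoints.
    return 2 * inside / n, 2 * outside / n

# Two triangles joined by one edge, with each triangle as a community.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
z_in, z_out = mean_in_out_degree(edges, split)
print(z_in, round(z_out, 3))  # 2.0 0.333
```

The larger zout is relative to zin, the more blurred the communities are and the harder the recovery task becomes.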

20 / 23

Tests – correctness

21 / 23

Tests – random walk and recalculation

22 / 23

Conclusion

Contributions
A new class of algorithms for performing network clustering
Described the task of extracting the natural community structure from networks of vertices and edges

Future Work
Reduce time complexity

23 / 23

References
[1] M.E.J. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Phys. Rev. E 69, 026113, 2004.
[2] M.E.J. Newman. Fast Algorithm for Detecting Community Structure in Networks. Phys. Rev. E 69, 066133, 2004.
[3] Presentation by Muad Abu-Ata, www.cs.kent.edu/~jin/DM07/PPT/muad.ppt