
Page 1: An Efficient Algorithm for Discovering Frequent Subgraphs

Michihiro Kuramochi and George Karypis, ICDM 2001

Presenter: 蔡明瑾

Page 2: Introduction

Structural patterns: biology, chemistry, chemical compounds

Graph: vertex = item, edge = relation between items

Undirected, connected, labeled graph

[Figure: example labeled graph with three vertices (labels a, a, b) and three edges (labels x, x, y)]

Page 3: Graph Isomorphism

G1(V1, E1) and G2(V2, E2) are isomorphic if they are topologically identical to each other.

That is, there is a mapping from V1 to V2 such that each edge in E1 is mapped to a single edge in E2 and vice versa. For labeled graphs, the mapping must also preserve vertex and edge labels.

[Figure: the same labeled triangle (vertex labels b, a, a; edge labels x, x, y) drawn with two different assignments of v0, v1, v2; the two drawings are isomorphic]
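The definition above can be checked by brute force on small graphs. The following is a minimal Python sketch, not the method used in the paper: it tries every vertex mapping and accepts one that preserves vertex labels and maps each edge to an edge with the same label. The name `is_isomorphic` and the encoding of edges as a dict keyed by `frozenset` are assumptions made for illustration; `g1` and `g2` are the two drawings of the example triangle, and `g3` is a made-up non-isomorphic variant.

```python
from itertools import permutations

def is_isomorphic(vlabels1, edges1, vlabels2, edges2):
    """Brute-force labeled-graph isomorphism test (only viable for tiny graphs).

    vlabelsN: list of vertex labels, indexed by vertex id 0..n-1
    edgesN:   dict mapping frozenset({u, v}) -> edge label
    """
    n = len(vlabels1)
    if n != len(vlabels2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(range(n)):  # perm[u] is the image of u in G2
        # The mapping must preserve vertex labels.
        if any(vlabels1[u] != vlabels2[perm[u]] for u in range(n)):
            continue
        # Every edge of G1 must land on an edge of G2 with the same label.
        if all(edges2.get(frozenset(perm[u] for u in e)) == lab
               for e, lab in edges1.items()):
            return True
    return False

def E(*triples):
    """Helper: build the edge dict from (u, v, label) triples."""
    return {frozenset((u, v)): lab for u, v, lab in triples}

# Two drawings of the same triangle, with different vertex numbering.
g1 = (["b", "a", "a"], E((0, 1, "x"), (0, 2, "x"), (1, 2, "y")))
g2 = (["a", "b", "a"], E((0, 1, "x"), (0, 2, "y"), (1, 2, "x")))
# Hypothetical variant: same labels overall, but the y edge touches vertex b.
g3 = (["b", "a", "a"], E((0, 1, "y"), (0, 2, "x"), (1, 2, "x")))

same = is_isomorphic(*g1, *g2)       # True
different = is_isomorphic(*g1, *g3)  # False
```

The loop visits up to |V|! mappings, which is why the later pages look for ways to cut that number down.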

Page 4: Canonical labeling

A code is read off the adjacency matrix: the vertex labels on the diagonal, followed by the entries of the upper triangle, column by column.

Ordering v0 = b, v1 = a, v2 = a:

      v0  v1  v2
v0    b   x   x
v1    x   a   y
v2    x   y   a

code = baaxxy

Ordering v0 = a, v1 = b, v2 = a:

      v0  v1  v2
v0    a   x   y
v1    x   b   x
v2    y   x   a

code = abaxyx

Both matrices describe the same graph, yet the two codes differ.
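As a small illustration of how such a code depends on the vertex ordering, here is a Python sketch under stated assumptions: the function name `matrix_code`, the dict-based graph encoding, and the use of "0" for a missing edge are all illustrative choices, not taken from the slides.

```python
def matrix_code(vlabels, edges, order):
    """Code of a labeled graph for one vertex ordering.

    vlabels: dict vertex -> label
    edges:   dict frozenset({u, v}) -> edge label
    order:   tuple of vertices giving the row/column order of the matrix
    """
    n = len(order)
    # Diagonal of the adjacency matrix: the vertex labels in the given order.
    diagonal = "".join(vlabels[v] for v in order)
    # Upper triangle, column by column; "0" marks a missing edge (assumption).
    upper = "".join(
        edges.get(frozenset((order[i], order[j])), "0")
        for j in range(1, n)
        for i in range(j)
    )
    return diagonal + upper

# The example triangle: one vertex labeled b, two labeled a.
vlabels = {"p": "b", "q": "a", "r": "a"}
edges = {
    frozenset(("p", "q")): "x",
    frozenset(("p", "r")): "x",
    frozenset(("q", "r")): "y",
}

code_1 = matrix_code(vlabels, edges, ("p", "q", "r"))  # "baaxxy"
code_2 = matrix_code(vlabels, edges, ("q", "p", "r"))  # "abaxyx"
```

The two orderings reproduce the two codes on this page, which is the reason a single agreed-upon ordering is needed before codes can be compared.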

Page 5: Canonical labeling

Different permutations of the vertices lead to different codes.

There are |V|! permutations; the canonical label is the largest code among them.
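A brute-force version of this idea can be sketched in Python as follows. It is only an illustration of "take the largest code over all |V|! orderings"; the name `canonical_label`, the graph encoding, and the "0" placeholder for a missing edge are assumptions.

```python
from itertools import permutations
from math import factorial

def canonical_label(vlabels, edges):
    """Largest code over all vertex orderings (brute force, |V|! orderings).

    vlabels: dict vertex -> label; edges: dict frozenset({u, v}) -> label.
    """
    best = ""
    for order in permutations(vlabels):
        diagonal = "".join(vlabels[v] for v in order)
        upper = "".join(
            edges.get(frozenset((order[i], order[j])), "0")  # "0": no edge
            for j in range(1, len(order))
            for i in range(j)
        )
        best = max(best, diagonal + upper)  # keep the lexicographic maximum
    return best

# Two numberings of the same triangle (vertex labels b, a, a).
g1 = ({0: "b", 1: "a", 2: "a"},
      {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"})
g2 = ({0: "a", 1: "b", 2: "a"},
      {frozenset((0, 1)): "x", frozenset((0, 2)): "y", frozenset((1, 2)): "x"})

label_1 = canonical_label(*g1)        # "baaxxy"
label_2 = canonical_label(*g2)        # "baaxxy", same graph, same label
orderings = factorial(len(g1[0]))     # 6 orderings for 3 vertices
```

Because isomorphic graphs receive the same label, comparing two labels replaces an explicit isomorphism test; the cost is the factorial number of orderings.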

Page 6: Vertex invariants

Properties that do not change across isomorphism mappings:

Vertex degree
Vertex label
Siblings

[Figure: the example triangle graph with vertex labels b, a, a and edge labels x, x, y]

Page 7: Vertex Degrees and Labels

Adjacency matrix

Partition the vertices by degree and label so that every partition contains vertices with the same degree and label.

Page 8: Vertex Degrees and Labels

Degree: p0 = {v0, v1, v2}: 2

Degree + label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)

Ordering v0, v1, v2:

      v0  v1  v2
v0    b   x   x
v1    x   a   y
v2    x   y   a

code = baaxxy

Page 9: Vertex Degrees and Labels

p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)

Ordering the vertices partition by partition (p0, then p1):

      v1  v2  v0
v1    a   y   x
v2    y   a   x
v0    x   x   b

code = aabyxx

Permutations to examine originally: 3!

Now: 2! × 1!
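The saving from partitioning can be reproduced with a short Python sketch. It assumes that partitions are ordered by their sorted (degree, label) key and that only vertices inside the same partition are permuted; the function names and the graph encoding are illustrative, not from the slides.

```python
from collections import defaultdict
from itertools import permutations, product
from math import factorial, prod

# The example triangle: v0 = b, v1 = a, v2 = a.
vlabels = {0: "b", 1: "a", 2: "a"}
edges = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}

def partition_by_degree_and_label(vlabels, edges):
    """Group vertices that share the same (degree, label) invariant."""
    degree = defaultdict(int)
    for e in edges:
        for v in e:
            degree[v] += 1
    cells = defaultdict(list)
    for v, lab in vlabels.items():
        cells[(degree[v], lab)].append(v)
    # Assumption: partitions are visited in sorted order of their key.
    return [cells[key] for key in sorted(cells)]

def restricted_orderings(cells):
    """Vertex orderings that only permute vertices within each partition."""
    for combo in product(*(permutations(c) for c in cells)):
        yield tuple(v for part in combo for v in part)

cells = partition_by_degree_and_label(vlabels, edges)  # [[1, 2], [0]]
orders = list(restricted_orderings(cells))             # [(1, 2, 0), (2, 1, 0)]

before = factorial(len(vlabels))                 # 3! = 6
after = prod(factorial(len(c)) for c in cells)   # 2! * 1! = 2
```

Both remaining orderings place the two a-labeled vertices first, which matches the matrix on this page.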

Page 10: Running example

minsup = 2

[Figure: three input graphs g0, g1, g2 and the single-edge subgraphs they contain]

Frequent 1-subgraphs

tid list   {0,1,2}   {0,2}   {0,1}   {2}
cl         010       021     123

The subgraph with tid list {2} has support 1, which is below minsup, so it is not frequent.

Page 11: Running example

minsup = 2

Frequent 1-subgraphs and their children (candidate 2-subgraphs):

cl     tid       child
010    {0,1,2}   c0, c1, c2, c3
021    {0,2}     c2
123    {0,1}     c3 ……

Candidate 2-subgraphs:

candidate      c0        c1        c2      c3
possible tid   {0,1,2}   {0,1,2}   {0,2}   {0,1}

[Figure: drawings of the frequent 1-subgraphs and of candidates c0 to c3]

Page 12

Frequent 1-subgraphs and their children:

cl     tid       child
010    {0,1,2}   c1, c2, c3
021    {0,2}     c2
123    {0,1}     c3, c4

Frequent 2-subgraphs

candidate   c2       c1        c3       c4
tid         {0,2}    {0,1,2}   {0,1}    {0,1}
cl          01201x   10000x    10203x   21133x

[Figure: drawings of c1 to c4]

Page 13: Frequency computing

TID lists: intersect the TID lists of the frequent k-subgraphs that a (k+1)-candidate contains.

If the intersection holds at least minsup transactions, compute the candidate's actual support on those transactions only; otherwise, prune the candidate.
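The pruning step can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: `contains` stands in for a real subgraph-isomorphism test and is mocked here with a lookup table, and the candidate named "c_hypothetical" is invented to show the pruned case. The TID lists are those of the running example.

```python
def possible_tids(parent_tid_lists):
    """Upper bound on where a candidate can occur: the intersection of the
    TID lists of its frequent k-subgraphs (its parents)."""
    return set.intersection(*map(set, parent_tid_lists))

def count_support(candidate, parent_tid_lists, minsup, contains):
    """Return the candidate's TID list, or None if it is pruned.

    contains(tid, candidate) is a hypothetical callback that performs the
    expensive subgraph-isomorphism test against transaction `tid`.
    """
    tids = possible_tids(parent_tid_lists)
    if len(tids) < minsup:
        return None  # pruned without a single isomorphism test
    actual = {t for t in tids if contains(t, candidate)}
    return actual if len(actual) >= minsup else None

# Mock of the isomorphism test: a table of known occurrences (assumption).
occurrences = {"c2": {0, 2}}
contains = lambda tid, cand: tid in occurrences.get(cand, set())

# c2 is a child of the 1-subgraphs with cl 010 and 021.
c2_tids = count_support("c2", [{0, 1, 2}, {0, 2}], 2, contains)            # {0, 2}
# A hypothetical candidate whose parents are 021 and 123.
pruned = count_support("c_hypothetical", [{0, 2}, {0, 1}], 2, contains)    # None
```

The second call never reaches `contains`: the intersection {0} is already smaller than minsup, so the candidate is discarded cheaply.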

Page 14: Candidate generation

Join two frequent k-subgraphs that share the same (k-1)-subgraph (the core) to obtain (k+1)-candidate subgraphs.

A single join can produce more than one candidate because of:

Vertex labeling
Multiple cores
Multiple automorphisms

Page 15: Vertex labeling

Page 16: Multiple automorphisms

Page 17: Multiple cores

Page 18

[Figure: candidate 3-subgraphs generated from the frequent 2-subgraphs. The tables of frequent 1-subgraphs (cl 010, 021, 123) and frequent 2-subgraphs (cl 01201x, 10000x, 10203x, 21133x) are repeated from Page 12. The 2-subgraph with cl 01201x has children q0 and q1. Possible tid lists shown for the candidates q0, q1, q2: {0, 2}, {0, 2}, {0,}, {0, 2}. Two candidates are marked "does not satisfy downward closure" and are pruned.]
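The downward-closure test marked on this page can be sketched as follows: a candidate with k+1 edges survives only if every connected k-edge subgraph of it is already known to be frequent. The Python below is a brute-force illustration under several assumptions: the graphs for c1 to c4 follow one reading of the cl strings (vertex labels first, then upper-triangle entries, x for no edge), the two candidates `triangle` and `star` are hypothetical, and the canonical form used here is the plain largest code over all orderings, so its strings need not match the cl values on the slides.

```python
from itertools import permutations

def graph(labels, edge_list):
    """Build (vlabels, edges) from a label string and (u, v, label) triples."""
    return (dict(enumerate(labels)),
            {frozenset((u, v)): lab for u, v, lab in edge_list})

def canonical(vlabels, edges):
    """Largest code over all vertex orderings; 'x' marks a missing edge."""
    best = ""
    for order in permutations(vlabels):
        diag = "".join(vlabels[v] for v in order)
        upper = "".join(edges.get(frozenset((order[i], order[j])), "x")
                        for j in range(1, len(order)) for i in range(j))
        best = max(best, diag + upper)
    return best

def is_connected(verts, edges):
    """Depth-first search over the edge dict."""
    verts = set(verts)
    if not verts:
        return False
    seen, stack = set(), [next(iter(verts))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for e in edges:
            if v in e:
                stack.extend(e - seen)
    return seen == verts

def one_edge_smaller(vlabels, edges):
    """Yield each connected subgraph obtained by deleting a single edge."""
    for removed in edges:
        rest = {e: lab for e, lab in edges.items() if e != removed}
        used = set().union(*rest)  # drops vertices left without any edge
        if is_connected(used, rest):
            yield {v: vlabels[v] for v in used}, rest

def satisfies_downward_closure(vlabels, edges, frequent_codes):
    return all(canonical(v, e) in frequent_codes
               for v, e in one_edge_smaller(vlabels, edges))

# Frequent 2-subgraphs c1..c4 (assumed decoding of the cl strings).
frequent = {canonical(*g) for g in (
    graph("100", [(0, 1, "0"), (0, 2, "0")]),   # c1
    graph("012", [(0, 1, "0"), (0, 2, "1")]),   # c2
    graph("102", [(0, 1, "0"), (0, 2, "3")]),   # c3
    graph("211", [(0, 1, "3"), (0, 2, "3")]),   # c4
)}

# Hypothetical candidates with three edges.
triangle = graph("012", [(0, 1, "0"), (0, 2, "1"), (1, 2, "3")])
star = graph("1002", [(0, 1, "0"), (0, 2, "0"), (0, 3, "3")])

triangle_ok = satisfies_downward_closure(*triangle, frequent)  # False
star_ok = satisfies_downward_closure(*star, frequent)          # True
```

The triangle fails because deleting its 0-labeled edge leaves a path that is not among c1 to c4, so it can be discarded before any support counting.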

Page 19: Experiment

Platform: AMD 1.53 GHz, 2 GB main memory, Linux

Chemical compound datasets:

PTE (340 compounds): 66 atom types, four bond types, 27 edges per graph on average
DTP (223,644 compounds): 104 atom types, three bond types, 22 edges per graph on average

Synthetic datasets

Page 20: PTE and DTP

Page 21: Synthetic datasets

Page 22: Synthetic datasets

|D| = 10000, |S| = 200, |LE| = 1, minsup = 2%