sequential pattern mining using a bitmap representation jay ayres, johannes gehrke, tomi yiu, and...

35
Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University (SIGKDD 2002) Presenter 李李李 P76034525 李李李 P76034672 李李李 P78031125 李李李 Q56034035 2014/11/20 1

Upload: brent-packard

Post on 01-Apr-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

1

Sequential PAttern Mining Using A Bitmap Representation

Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason FlannickDept. of Computer Science Cornell University

(SIGKDD 2002)

Presenter李佩書 P76034525 楊璨瑜 P76034672 陳奕廷 P78031125 李昕純 Q56034035

2014/11/20

Page 2: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

2

Outline

1. Introduction

2. The SPAM algorithm

3. Data representation

4. Experimental

5. Conclusion & Discussion

2014/11/20

Page 3: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

3

Introduction

2014/11/20

Page 4: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

4

Sequential Patterns

• R. Agrawal and R. Srikant.(In ICDE 1995)• Algorithm : AprioriALL, AprioriSOME, PrefixSpan…

2014/11/20

Page 5: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

5

Problem

• Mining sequential patterns• Given a minimum support minSup• Find all frequent sequential patterns Sa

• supD(Sa) minSup

2014/11/20

Page 6: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

6

SPAM Algorithm

• Sequential PAttern Mining Algorithm• The first DFS(depth-first search) strategy for mining sequential patterns

• Vertical bitmap representation for simple, efficient counting.

2014/11/20

Page 7: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

7

The SPAM Algorithm

2014/11/20

Page 8: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

8

Lexicographic Tree• Sequence-extended Sequence (S-step)

• Generate by adding a new transaction consisting of a single item to the end of sequence

• Ex: ({a, b, c}, {a, b})→({a, b, c}, {a, b}, {a})

• Itemset-extended sequence (I-step)• Generate by adding an item to the last itemset in the sequence• Ex 1: ({a, b, c}, {a, b}) →({a, b, c}, {a, b, d})• Ex 2: ({a, b, c}, {a, b, d}) →({a, b, c}, {a, b, d, c})

• Identifies two sets of each node n• Sn: the set of candidate items for S-step extensions• In: the set of candidate items for I-step extensions

2014/11/20

Page 9: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

9

I={a,b}

2014/11/20

Page 10: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

10

Pruning

• Apriori-Based• Minimizing the size of Sn and In• Pruning candidate by DFS.

S-step Pruning I-step Pruning

2014/11/20

Page 11: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

11

S-step Pruning

S({a}) = {a, b, c, d}I({a}) = {b, c, d}S({a}, {a}) = S({a}, {b}) = {a, b, c, d}I({a}, {a}) = {b, c, d}I({a}, {b}) = {c, d}

2014/11/20

Page 12: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

12

I-step Pruning

S({a, b}) = S({a, d}) = {a, b}I({a}, {b}) = {c, d}I({a}, {d}) = {}

2014/11/20

Page 13: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

132014/11/20

Page 14: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

14

Data Representation

2014/11/20

Page 15: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

15

If the size of a sequence between 2k+1 and 2k+1 2k+1-bit sequence

• We store each candidate sequence as a vertical bitmap• Each customer is assigned a fixed slice of each bitmap

for all of its transactions

2014/11/20

Page 16: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

16

Bitmap of itemset{a}

1000

0100

1000

{b}

1110

1100

1100

{a,b}

1000

0100

1000

&

2014/11/20

Page 17: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

17

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

2014/11/20

Page 18: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

18

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

2014/11/20

Page 19: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

19

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

2014/11/20

Page 20: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

20

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

2014/11/20

Page 21: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

21

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

2014/11/20

Page 22: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

22

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

1

2014/11/20

Page 23: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

23

Bitmap of sequence• Define B(s) as the bitmap for sequence s.

• In sequence s• If the last itemset is in transaction j

and the other itemsets is in transaction before j • Then set 1 , otherwise set 0

• Example1:

Customer ID

Transaction ID

Itemset

1 1 {b}

1 2 {d}

1 3 {e}

1 4 {c}

({b},{c})

0

0

0

1

2014/11/20

Page 24: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

24

• Example2

Customer ID

Transaction ID

Itemset

1 1 {a,b,d}

1 3 {b,c,d}

1 6 {b,c,d}

-- -- --

({a},{b,d})

0

1

1

0

2014/11/20

Page 25: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

25

S-step ProcessStep 1 : S-Step Process to construct the transformed bitmap ({a})s

Step 2 : ANDing B({a})s and B({b})s

Support=2

2014/11/20

Page 26: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

26

S-step ProcessStep 1:S-Step Process to construct the transformed bitmap ({a})s

Step 2:ANDing B({a}) s and B({b})s

2014/11/20

Page 27: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

27

I-step Process

Support=2

2014/11/20

Page 28: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

28

I-step Process

2014/11/20

Page 29: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

29

Experimental

2014/11/20

Page 30: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

30

Comparison With SPADE and PrefixSpan

Method-1• Compare for various minimum support values on

Small datasetsMedium datasetsLarge datasets

• Methods-2Compare several parameters in the dataset Number of customersNumber of transactions per customerNumber of items per transactionAverage length of the maximal sequences

2014/11/20

Page 31: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

312014/11/20

Page 32: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

32

Conclusion & Discussion

2014/11/20

Page 33: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

33

CONCLUSION

• ALGORITHM• Outperforms SPADE and PrefixSpan on large datasets• Faster then SPADE and PrefixSpan

• DATA REPRESENTATION• Bitmap representation• S-step/I-step traversal• S-step/I-step pruning • Especially efficient when the sequential patterns are

very long

2014/11/20

Page 34: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

34

Implement SPAM algorithm

SPMF is an mining mining frameworkWritten in Java/Open-source data http://www.philippe-fournier-viger.com/spmf/index.php

2014/11/20

Page 35: Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University

35

DISCUSSION

1. SPAM assumes that the entire database completely fit into main memory, what is the solution ?

2. Why they set the size of a sequence between 2k+1 and 2k+1 ?

2014/11/20