temporal database paper reading r95922007 資工碩一 馬智釗 efficient mining strategy for...

Upload: barry-neal

Post on 17-Jan-2016

Page 1: Temporal Database Paper Reading

R95922007, 馬智釗 (first-year master's student, Computer Science and Information Engineering)

Paper: Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang

Page 2: Introduction

Discover frequent serial episodes to find relationships between events.
- explain the problems that cause a particular event
- predict future results

Episode: a partially ordered collection of events occurring together.
- the user defines "how close is close enough"
- win: the width of the time window

Page 3: Three classes of episodes

Introduced by Mannila et al.

Serial episodes
- patterns with a total order in the sequence
Parallel episodes
- no constraints on the relative order
Composite episodes
- serial combination of parallel episodes

Page 4: Examples of episodes

Page 5: Algorithms (old)

Presented by Mannila et al.
Finding parallel and serial episodes that are frequent enough.

WINEPI
- considers the support of an episode
MINEPI
- considers the number of minimal occurrences of an episode

Page 6: WINEPI

Consider the sequence S = A_3 A_4 B_5 B_6 (subscripts are timestamps).
support: the number of sliding windows of width win that contain the episode.
Given win = 3, there are six windows:
W_1 = A_3, W_2 = A_3 A_4, W_3 = A_3 A_4 B_5, W_4 = A_4 B_5 B_6, W_5 = B_5 B_6, W_6 = B_6.
<A,B> is supported by two windows (W_3 and W_4).
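
The window count above can be checked with a small sketch. It assumes windows are half-open intervals [start, start + win) sliding one time unit at a time, which reproduces the six windows listed; the function names are illustrative and not taken from the paper.

```python
def contains_serial(window, episode):
    """True if the events of `episode` occur in `window` in order,
    at strictly increasing timestamps (greedy earliest match)."""
    idx, prev_time = 0, None
    for event, time in sorted(window, key=lambda pair: pair[1]):
        if idx < len(episode) and event == episode[idx] and (
            prev_time is None or time > prev_time
        ):
            idx, prev_time = idx + 1, time
    return idx == len(episode)


def winepi_windows(seq, win):
    """All sliding windows [start, start + win) that overlap the sequence."""
    times = [time for _, time in seq]
    return [
        [(e, t) for e, t in seq if start <= t < start + win]
        for start in range(min(times) - win + 1, max(times) + 1)
    ]


def winepi_support(seq, episode, win):
    """Number of windows of width `win` that contain the serial episode."""
    return sum(contains_serial(w, episode) for w in winepi_windows(seq, win))


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(len(winepi_windows(S, 3)))         # 6
print(winepi_support(S, ("A", "B"), 3))  # 2
```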

Page 7: MINEPI

Consider the sequence S = A_3 A_4 B_5 B_6.
minimal occurrence: an interval that contains episode α, while no proper sub-interval does.
<A> has mo support 2.
- intervals [3,3] and [4,4]
<A,B> has mo support 1.
- interval [4,5]
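
A brute-force sketch of the definition, for illustration only: it enumerates every interval that contains the episode and keeps those with no proper sub-interval that also contains it. MINEPI itself does not work this way; the helper names are hypothetical.

```python
def contains_serial(events, episode):
    """True if `episode` occurs in `events` in order, at increasing times."""
    idx, prev_time = 0, None
    for event, time in sorted(events, key=lambda pair: pair[1]):
        if idx < len(episode) and event == episode[idx] and (
            prev_time is None or time > prev_time
        ):
            idx, prev_time = idx + 1, time
    return idx == len(episode)


def minimal_occurrences(seq, episode):
    """Intervals [ts, te] containing `episode` with no smaller such interval inside."""
    times = sorted({t for _, t in seq})
    containing = [
        (ts, te)
        for ts in times
        for te in times
        if ts <= te
        and contains_serial([(e, t) for e, t in seq if ts <= t <= te], episode)
    ]
    return [
        (ts, te)
        for ts, te in containing
        if not any((a, b) != (ts, te) and ts <= a and b <= te for a, b in containing)
    ]


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(minimal_occurrences(S, ("A",)))      # [(3, 3), (4, 4)]
print(minimal_occurrences(S, ("A", "B")))  # [(4, 5)]
```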

Page 8: Complex sequences

Several events occurring at one time.
A temporal database is a complex sequence with temporal attributes.

Example (event sets at eight consecutive time points):
AD, B, ABE, CE, ABF, ACE, BDF, D

Page 9: Algorithms (new)

Extend the algorithms to deal with complex sequences.

MINEPI+
- depth-first enumeration to generate the frequent episodes by equalJoin and temporalJoin
EMMA
- Episodes Mining using Memory Anchor
- utilizes memory anchors to accelerate the mining task

Page 10: More about MINEPI

Breadth-first manner
- enumerate longer episodes from shorter ones
Parameters
- maxwin: maximum window width for an episode
- minsup: minimum frequency for an episode to count as a "frequent episode"
Temporal Join
- connects events from different time intervals

Page 11: Example: MINEPI

S = A_1 A_2 B_3 A_4 B_5, maxwin = 4, minsup = 2

Find frequent 1-episodes first
- mo(A) = {[1,1],[2,2],[4,4]}, mo(B) = {[3,3],[5,5]}
Temporal Join with maxwin = 4
- candidate occurrences of <A,B>: [1,3],[2,3],[2,5],[4,5]
- mo(<A,B>) = {[2,3],[4,5]} (choose the minimal ones)
- support(<A,B>) = {[1,4],[2,5],[4,5]}
- support count = 3, counting distinct start points
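
The numbers in this example can be reproduced with a short sketch. It assumes the width test is te - ts < maxwin (which excludes [1,5], as on the slide) and that the support count is the number of distinct start points among the candidate occurrences; both are readings of the slide, not statements from the paper.

```python
def temporal_join_all(mo_p, mo_f, maxwin):
    """Every [ts, tf] where f follows an occurrence of P within the window."""
    return sorted(
        {(ts, tf) for ts, te in mo_p for tf, _ in mo_f if tf > te and tf - ts < maxwin}
    )


def minimal(intervals):
    """Keep the intervals that contain no other interval of the list."""
    return [
        iv
        for iv in intervals
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in intervals)
    ]


mo_A = [(1, 1), (2, 2), (4, 4)]
mo_B = [(3, 3), (5, 5)]

candidates = temporal_join_all(mo_A, mo_B, maxwin=4)
mo_AB = minimal(candidates)
support_count = len({ts for ts, _ in candidates})

print(candidates)     # [(1, 3), (2, 3), (2, 5), (4, 5)]
print(mo_AB)          # [(2, 3), (4, 5)]
print(support_count)  # 3
```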

Page 12: MINEPI+

Must deal with complex sequences.
Depth-first manner for memory saving.
Equal Join
- connects events in the same time interval
Bound List
• For a serial episode P = <p_1,…,p_k>
- {[ts_i,te_i] : S contains P in time [ts_i,te_i]}
• For an event Y
- {[t_i,t_i] : S contains Y at time t_i}

Page 13: Example: bound list

maxwin = 4.
Bound list of <A,B,C>: {[1,4],[3,6]}.
Bound list of <C>: {[4,4],[6,6]}.

time:    1    2    3    4    5    6    7    8
events:  AD   B    ABE  CE   ABF  ACE  BDF  D
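
As a cross-check of the two bound lists, here is a brute-force sketch. It assumes the example sequence reads {A,D}, {B}, {A,B,E}, {C,E}, {A,B,F}, {A,C,E}, {B,D,F}, {D} at times 1 to 8 (a reading of the slide's figure) and that a bound must satisfy te - ts < maxwin. The `bound_list` helper is hypothetical and simply tries every combination of event times.

```python
from itertools import product

# Assumed reading of the example figure: time -> set of events at that time.
SEQ = {
    1: {"A", "D"}, 2: {"B"}, 3: {"A", "B", "E"}, 4: {"C", "E"},
    5: {"A", "B", "F"}, 6: {"A", "C", "E"}, 7: {"B", "D", "F"}, 8: {"D"},
}


def bound_list(seq, episode, maxwin):
    """All [ts, te] such that the serial episode occurs from ts to te."""
    times_per_event = [[t for t in sorted(seq) if e in seq[t]] for e in episode]
    bounds = set()
    for combo in product(*times_per_event):
        increasing = all(a < b for a, b in zip(combo, combo[1:]))
        if increasing and combo[-1] - combo[0] < maxwin:
            bounds.add((combo[0], combo[-1]))
    return sorted(bounds)


print(bound_list(SEQ, ("A", "B", "C"), maxwin=4))  # [(1, 4), (3, 6)]
print(bound_list(SEQ, ("C",), maxwin=4))           # [(4, 4), (6, 6)]
```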

Page 14: Operations

Given P = <p_1,…,p_k> and an event f.
- P.boundlist = {[ts_1,te_1],…,[ts_n,te_n]}
- f.boundlist = {[ts'_1,ts'_1],…,[ts'_m,ts'_m]}

Equal Join: P_1 = P ⊙ f = <p_1,…,p_k ∪ f>.
- P_1.boundlist contains the [ts_i,te_i] such that te_i = ts'_j for some j (1 ≤ j ≤ m)

Temporal Join: P_2 = P · f = <p_1,…,p_k,f>.
- P_2.boundlist contains the [ts_i,ts'_j] such that ts'_j - ts_i < maxwin and ts'_j > te_i for some j (1 ≤ j ≤ m)
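
A minimal sketch of the two joins on boundlists. One assumption goes beyond the definitions above: when several ts'_j qualify for the same bound, the temporal join keeps only the earliest one (the later EMMA example also says "take minimal"). The event times are read off the example sequence with events A at 1, 3, 5, 6; B at 2, 3, 5, 7; C at 4, 6.

```python
def equal_join(p_bounds, f_times):
    """Bounds of P whose end time coincides with an occurrence of f."""
    f_set = set(f_times)
    return [(ts, te) for ts, te in p_bounds if te in f_set]


def temporal_join(p_bounds, f_times, maxwin):
    """Extend each bound of P to the earliest later occurrence of f in the window."""
    joined = []
    for ts, te in p_bounds:
        ends = [t for t in f_times if t > te and t - ts < maxwin]
        if ends:
            joined.append((ts, min(ends)))  # assumption: keep the earliest end only
    return joined


A_BOUNDS = [(1, 1), (3, 3), (5, 5), (6, 6)]
B_TIMES = [2, 3, 5, 7]
C_TIMES = [4, 6]

AB = temporal_join(A_BOUNDS, B_TIMES, maxwin=4)
ABC = temporal_join(AB, C_TIMES, maxwin=4)
A_and_B = equal_join(A_BOUNDS, B_TIMES)

print(AB)       # [(1, 2), (3, 5), (5, 7), (6, 7)]
print(ABC)      # [(1, 4), (3, 6)]
print(A_and_B)  # [(3, 3), (5, 5)]
```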

Page 15: Drawbacks of MINEPI+

Huge number of combinations
- consider |I| frequent 1-episodes
- O(|I|^2) checks for temporal joins and equal joins
Unnecessary joins
- should skip temporal joins for a prefix if the number of extendable matching bounds < minsup × |TDB|
Duplicate joins
- episode <ABC,ABC> needs 4+1 joins (four equal joins and one temporal join):
  <A>→<AB>→<ABC>→<ABC,A>→<ABC,AB>→<ABC,ABC>

Page 16: EMMA

Divided into three phases
(I) Mine the frequent itemsets in the complex sequence.
(II) Encode each frequent itemset with a unique ID, and construct an encoded horizontal database.
(III) Mine episodes in the encoded database.

Depth-First Search
Memory Anchor
- utilize the boundlists to access information
- timelists of frequent itemsets are their boundlists
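
Phases (I) and (II) might look roughly like the sketch below. It is not the paper's implementation: it enumerates every subset of each time point's event set (exponential, fine only for tiny examples), treats minsup as an absolute count, and reuses the eight-point example sequence as assumed input. ID numbering by itemset size and then alphabetical order is an arbitrary choice.

```python
from itertools import combinations

# Assumed example input: time -> set of events at that time.
SEQ = {
    1: {"A", "D"}, 2: {"B"}, 3: {"A", "B", "E"}, 4: {"C", "E"},
    5: {"A", "B", "F"}, 6: {"A", "C", "E"}, 7: {"B", "D", "F"}, 8: {"D"},
}


def frequent_itemsets(seq, minsup_count):
    """Phase I: map each frequent itemset to its timelist (times it occurs)."""
    timelists = {}
    for time in sorted(seq):
        events = sorted(seq[time])
        for size in range(1, len(events) + 1):
            for itemset in combinations(events, size):
                timelists.setdefault(itemset, []).append(time)
    return {s: tl for s, tl in timelists.items() if len(tl) >= minsup_count}


def encode(seq, frequent):
    """Phase II: give each itemset an ID and list the IDs present at each time."""
    ordered = sorted(frequent, key=lambda s: (len(s), s))
    ids = {itemset: i for i, itemset in enumerate(ordered, start=1)}
    encoded = {
        time: sorted(ids[s] for s in frequent if set(s) <= seq[time])
        for time in sorted(seq)
    }
    return ids, encoded


frequent = frequent_itemsets(SEQ, minsup_count=2)
ids, encoded = encode(SEQ, frequent)
print(frequent[("C", "E")])  # [4, 6]
print(encoded[4])            # [3, 5, 10]
```

The timelist of each ID doubles as its boundlist, so phase (III) can start its joins without rescanning the database.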

Page 17: Example: database

minsup = 5

Page 18: Combine episodes

Only combine existing episodes with a "local" frequent 1-tuple episode.
- overcomes the huge number of candidate generations

Projected boundlist (PBL)
- episode #3 = <C> has boundlist {[1,1],[2,2],[4,4],[8,8],[11,11],[14,14],[15,15]}
- given maxwin = 4, the projected boundlist is {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- note that |TDB| = 16

Page 19: Example: PBL

#3.timelist = {1,2,4,8,11,14,15}, with maxwin = 4 and |TDB| = 16.
- 1 → [2,4]
- 2 → [3,5]
- 4 → [5,7]
- 8 → [9,11]
- 11 → [12,14]
- 14 → [15,16]
- 15 → [16,16]
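
The mapping above is consistent with projecting a bound [ts,te] to [te + 1, min(ts + maxwin - 1, |TDB|)] and dropping empty ranges. That formula is inferred from the examples on these slides rather than quoted from the paper; the sketch below applies it.

```python
def projected_boundlist(boundlist, maxwin, tdb_size):
    """Project each bound to the time range where an extension may still occur."""
    projected = []
    for ts, te in boundlist:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_size)
        if lo <= hi:  # drop bounds that leave no room for an extension
            projected.append((lo, hi))
    return projected


timelist_3 = [1, 2, 4, 8, 11, 14, 15]
pbl_3 = projected_boundlist([(t, t) for t in timelist_3], maxwin=4, tdb_size=16)
print(pbl_3)
# [(2, 4), (3, 5), (5, 7), (9, 11), (12, 14), (15, 16), (16, 16)]
```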

Page 20: Local frequent ID

A local frequent ID has a boundlist whose entries fall inside another episode's PBL.
- #3.PBL = {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- #4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}

Record the boundlist of each ID while examining it.
- the boundlist is then available immediately at the temporal join
- <C,D> = <#3,#4>, so <C,D>.boundlist = {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}

Page 21: Example: temporal join

#4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}.
Recall the construction of #3.PBL
- 1 → [2,4]: [3,3] is in it
- 2 → [3,5]: [3,3] is in it (take the minimal one)
- 4 → [5,7]: [5,5] is in it
- 8 → [9,11]: [9,9] is in it
- 11 → [12,14]: [12,12] is in it
- 14 → [15,16]: [16,16] is in it
- 15 → [16,16]: [16,16] is in it
Result: {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
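
The lookup above can be sketched as follows, assuming the projected range of a bound [ts,te] is [te + 1, min(ts + maxwin - 1, |TDB|)] (inferred from the slides) and that the earliest matching time is kept for each start. The function name is illustrative.

```python
def pbl_temporal_join(prefix_bounds, id_times, maxwin, tdb_size):
    """For each prefix bound, find the earliest occurrence of the ID inside
    the projected range and record [start of prefix, that time]."""
    joined = []
    for ts, te in prefix_bounds:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_size)
        hits = [t for t in id_times if lo <= t <= hi]
        if hits:
            joined.append((ts, min(hits)))
    return joined


bl_3 = [(t, t) for t in [1, 2, 4, 8, 11, 14, 15]]
times_4 = [3, 5, 6, 9, 12, 13, 16]

bl_3_4 = pbl_temporal_join(bl_3, times_4, maxwin=4, tdb_size=16)
print(bl_3_4)
# [(1, 3), (2, 3), (4, 5), (8, 9), (11, 12), (14, 16), (15, 16)]
```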

Page 22: Procedure: emmajoin

Recursively extend the episodes
- until no more serial episodes can be extended
Avoid the unnecessary checking of MINEPI+
- stop when the number of extendable bounds for a serial episode is less than minsup × |TDB|
Example: #2 = <B>
- #2.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}
- #2.PBL = {[4,6],[7,9],[10,12],[13,15]} (|TDB| = 16)
- no need to extend #2 if minsup = 5, since only four bounds are extendable

Page 23: Example: emmajoin

#3.BL = {[1,1],[4,4],[8,8],[11,11],[14,14],[15,15]}.
#7.BL = {[1,1],[4,4],[8,8],[11,11],[14,14]}.
#9.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}.
Call emmajoin to extend each 1-tuple episode.
#3.PBL = {[2,4],[5,7],[9,11],[12,14],[15,16],[16,16]}.
Find local frequent IDs in #3.PBL.

Page 24: Example: emmajoin (cont.)

minsup = 5, maxwin = 4.
By temporal join:
- <#3,#3>.BL = {[1,4],[8,11],[11,14],[14,15]}
- <#3,#7>.BL = {[1,4],[8,11],[11,14]}
- <#3,#9>.BL = {[1,3],[4,6],[8,9],[11,12],[14,16]}
- <#3,#9> is generated from prefix #3
- recursively call emmajoin to extend <#3,#9>
- <#3,#9>.PBL = {[4,4],[7,7],[10,11],[13,14]}
- there are no local frequent IDs since minsup = 5
Back to calling emmajoin for episode #7.
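
A rough sketch of the recursion, limited to the three IDs whose boundlists appear in this example. It treats minsup as an absolute count of bounds, projects a bound [ts,te] to [te + 1, min(ts + maxwin - 1, |TDB|)] (inferred from the slides), keeps the earliest match per start, and only performs temporal joins. The function signature is hypothetical and not the paper's pseudocode.

```python
def emmajoin(prefix, bounds, timelists, maxwin, tdb_size, minsup, out):
    """Record `prefix` as frequent, then try to extend it with each ID."""
    out.append(prefix)
    # Projected boundlist, keeping the start time of each prefix occurrence.
    pbl = [(ts, te + 1, min(ts + maxwin - 1, tdb_size)) for ts, te in bounds]
    pbl = [(ts, lo, hi) for ts, lo, hi in pbl if lo <= hi]
    if len(pbl) < minsup:
        return  # too few extendable bounds: no extension can be frequent
    for item_id, times in timelists.items():
        joined = []
        for ts, lo, hi in pbl:
            hits = [t for t in times if lo <= t <= hi]
            if hits:
                joined.append((ts, min(hits)))
        if len(joined) >= minsup:  # item_id is locally frequent for this prefix
            emmajoin(prefix + (item_id,), joined, timelists,
                     maxwin, tdb_size, minsup, out)


def mine(timelists, maxwin, tdb_size, minsup):
    """Start one depth-first search from every frequent ID."""
    out = []
    for item_id, times in timelists.items():
        if len(times) >= minsup:
            emmajoin((item_id,), [(t, t) for t in times], timelists,
                     maxwin, tdb_size, minsup, out)
    return out


TIMELISTS = {
    3: [1, 4, 8, 11, 14, 15],
    7: [1, 4, 8, 11, 14],
    9: [3, 6, 9, 12, 16],
}
episodes = mine(TIMELISTS, maxwin=4, tdb_size=16, minsup=5)
print(episodes)  # [(3,), (3, 9), (7,), (7, 9), (9,)]
```

With these inputs, <#3,#3> and <#3,#7> fall below minsup, and the recursion on <#3,#9> stops at the extendable-bound check, matching the walk-through above.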

Page 25: Experiments

On a dataset composed of 10 stocks.
Parameters: maxwin / minsup.
- running time grows as maxwin increases
- running time grows as minsup decreases
- because the number of frequent episodes increases
EMMA runs faster than MINEPI+.
MINEPI+ uses less memory than EMMA.
- EMMA needs a large amount of memory as minsup decreases

Page 26: Conclusion

Modify MINEPI into MINEPI+
- for mining episodes in a complex sequence
Propose EMMA
- avoids the drawbacks of MINEPI+
EMMA is more efficient than MINEPI+.
Future work
- only serial episodes are discussed
- parallel and composite episodes remain to be solved