PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth


Upload: brent-castillo

Post on 03-Jan-2016


DESCRIPTION

PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Students: 907737 張資昊, 907747 蔡明成. Advisor: 劉俞志. Terminology. item: a product in a customer transaction database is called an item. itemset: a non-empty set made up of one or more items, each member of which is an item.

TRANSCRIPT

  • PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Students: 907737 張資昊, 907747 蔡明成. Advisor: 劉俞志.

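  • Terminology. item: a product in the customer transaction database. itemset: a non-empty set of one or more items. sequence and element: a sequence is an ordered list of itemsets, and each itemset in it is called an element. length: the number of item instances in a sequence; a sequence of length l is called an l-sequence.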

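  • Terminology (cont.). subsequence and super-sequence: a sequence α = ⟨a1 a2 … an⟩ is a subsequence of a sequence β = ⟨b1 b2 … bm⟩, and β is a super-sequence of α, if there are integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ b_j1, a2 ⊆ b_j2, …, an ⊆ b_jn.
  • Terminology (cont.). (frequent) sequential pattern: a sequence database S is a set of tuples, each holding one sequence. The support of a sequence α in S is the number of tuples in the database containing α, that is, whose sequence is a super-sequence of α. Given a positive integer as the support threshold (min_support), α is a (frequent) sequential pattern if its support is at least the threshold. A sequential pattern of length l is called an l-pattern.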
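The subsequence and support definitions can be checked directly in code. The sketch below is illustrative only: the names `is_subsequence` and `support` are hypothetical, and `DB` is the four-sequence running example (Table 1) as it appears in the original PrefixSpan paper, since the slide's own table did not survive extraction.

```python
# Running example (Table 1), taken from the original PrefixSpan paper.
# Each sequence is a list of elements; each element is a set of items.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],  # 10: a(abc)(ac)d(cf)
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],              # 20: (ad)c(bc)(ae)
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],       # 30: (ef)(ab)(df)cb
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],          # 40: eg(af)cbc
]


def is_subsequence(alpha, beta):
    """True if every element of alpha is contained, in order, in a distinct element of beta."""
    j = 0  # index of the next element of alpha still to be matched
    for element in beta:
        if j < len(alpha) and alpha[j] <= element:  # subset test on sets
            j += 1
    return j == len(alpha)


def support(alpha, db):
    """Number of sequences in db that contain alpha."""
    return sum(is_subsequence(alpha, seq) for seq in db)


print(support([{"a"}, {"b"}], DB))   # <ab>
print(support([{"a", "b"}], DB))     # <(ab)>
```

Matching greedily from left to right is enough here, because taking the earliest element that contains `alpha[j]` never rules out a later match.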

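  • Sequential pattern mining. Most earlier methods are Apriori-based and find sequential patterns by generating candidates. PrefixSpan (Prefix-projected Sequential Pattern mining) does not generate candidate subsequences; it grows each sequential pattern inside a projected database. It is compared here with the Apriori-based GSP and with FreeSpan.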

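  • GSP. GSP is an Apriori-like method that takes a multiple-pass, candidate-generation-and-test approach to sequential pattern mining. The first pass finds the length-1 frequent sequences, which form the seed set (the seed set holds the sequential patterns found in the previous pass). Step 1 (generate): from the seed set, generate candidate sequences that are one item longer than the seed sequences. Step 2 (test): count the support of each candidate sequence; the candidates that reach min_support become the new seed set. The two steps repeat until no candidate reaches min_support, so that no new seed set or sequential pattern is produced.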

  • GSP (cont.)

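  • GSP (cont.): Example. The first scan finds the length-1 frequent sequences, which form the initial seed set. The seed set is used to generate the length-2 candidates; the candidates that reach min_support become the next seed set, which in turn generates the length-3 candidates, and so on until no further seed set or sequential pattern is found. Table 1. A sequence database.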
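One generate-and-test pass can be sketched as follows. This is a simplified illustration rather than the full GSP algorithm: it covers only the step from length-1 seeds to length-2 patterns, the function names are hypothetical, and `DB` is Table 1 as given in the original PrefixSpan paper with min_support = 2.

```python
from itertools import combinations, product

# Table 1 from the original PrefixSpan paper; elements are sets of items.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]


def support(alpha, db):
    """Count the sequences of db that contain alpha as a subsequence."""
    def contains(beta):
        j = 0
        for element in beta:
            if j < len(alpha) and alpha[j] <= element:
                j += 1
        return j == len(alpha)
    return sum(contains(seq) for seq in db)


def gsp_length2_pass(db, min_support):
    """One generate-and-test round: length-1 seeds -> length-2 patterns."""
    items = sorted(set().union(*(e for seq in db for e in seq)))
    seeds = [x for x in items if support([{x}], db) >= min_support]
    # Generate: <xy> for every ordered pair, <(xy)> for every unordered pair.
    candidates = [[{x}, {y}] for x, y in product(seeds, repeat=2)]
    candidates += [[{x, y}] for x, y in combinations(seeds, 2)]
    # Test: keep the candidates that reach the support threshold.
    frequent = [c for c in candidates if support(c, db) >= min_support]
    return seeds, candidates, frequent


seeds, candidates, frequent = gsp_length2_pass(DB, 2)
print(seeds, len(candidates), len(frequent))
```

With the six seeds a to f the pass generates 51 candidates, of which 22 survive the support test, so more than half of the generated candidates are discarded after being counted.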

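  • GSP (cont.): Problems. An Apriori-like method may generate a huge set of candidate sequences (for example, 1,000 length-1 frequent sequences produce 1,499,500 length-2 candidates), and this makes long sequential patterns expensive to mine.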
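The figure of 1,499,500 follows from counting both kinds of length-2 candidate. With n length-1 frequent sequences, there are n × n candidates of the form ⟨xy⟩ (two elements, order matters, x may equal y) and n(n − 1)/2 candidates of the form ⟨(xy)⟩ (one element holding two distinct items). The symbol n is introduced here only for the derivation.

```latex
\[
  n^2 + \frac{n(n-1)}{2}
  \;=\; 1000 \times 1000 + \frac{1000 \times 999}{2}
  \;=\; 1{,}000{,}000 + 499{,}500
  \;=\; 1{,}499{,}500 .
\]
```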

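  • FreeSpan. FreeSpan uses the frequent items to project the sequence database into a set of smaller projected databases and then mines each projected database. The example uses Table 1.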

  • FreeSpan (cont.)

    Find the frequent items: f_list = {a:4, b:4, c:4, d:3, e:3, f:3}. Because f_list has 6 items, the database is projected into 6 projected databases: the ⟨a⟩-projected database, the ⟨b⟩-projected database, ..., the ⟨f⟩-projected database. For an item x in f_list, the ⟨x⟩-projected database collects the sequences that contain x, keeping only x and the items that precede x in f_list.
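The f_list can be reproduced with a single counting pass. In this sketch the function name `frequent_items` is hypothetical, `DB` is Table 1 as given in the original PrefixSpan paper, and min_support = 2 is assumed; each item is counted at most once per sequence.

```python
from collections import Counter

# Table 1 from the original PrefixSpan paper; elements are sets of items.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]


def frequent_items(db, min_support):
    """Return (item, support) pairs, most frequent first, ties broken alphabetically."""
    counts = Counter()
    for seq in db:
        # An item contributes once per sequence, however often it occurs in it.
        counts.update(set().union(*seq))
    kept = [(item, n) for item, n in counts.items() if n >= min_support]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


f_list = frequent_items(DB, 2)
print(f_list)
```

Item g occurs in only one sequence, so it falls below the threshold and does not appear in the list.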

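  • FreeSpan (cont.): projected databases

    The main cost of FreeSpan lies in its projected databases. FreeSpan is more efficient than GSP, but a projected database does not always shrink: when a pattern occurs in every sequence, its projected database is nearly as large as the original database.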

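  • Mining sequential patterns by prefix projections. The items within an element of a sequence can be listed in any order, so they are assumed to be listed alphabetically. For example, a(bac)(ca)d(fc) is written as a(abc)(ac)d(cf). With this convention every sequence has a unique expression.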
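The ordering convention amounts to sorting the items inside each element. The helpers `normalize` and `show` below are hypothetical names used only for this illustration.

```python
def normalize(seq):
    """Sort (and de-duplicate) the items inside every element of a sequence."""
    return [tuple(sorted(set(element))) for element in seq]


def show(seq):
    """Render a sequence in the slide notation, e.g. a(abc)(ac)d(cf)."""
    return "".join(e[0] if len(e) == 1 else "(" + "".join(e) + ")" for e in seq)


raw = [["a"], ["b", "a", "c"], ["c", "a"], ["d"], ["f", "c"]]  # a(bac)(ca)d(fc)
print(show(normalize(raw)))
```

Only the order inside an element changes; the order of the elements themselves is part of the sequence and is left untouched.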

  • Mining sequential patterns by prefix projections (cont.)

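  • Mining sequential patterns by prefix projections (cont.): Example (PrefixSpan). For the sequence database S in Table 1 with min_sup = 2, the prefix-projection method mines the sequential patterns as follows. Step 1: find the length-1 sequential patterns. Scanning S once gives the length-1 sequential patterns ⟨a⟩:4, ⟨b⟩:4, ⟨c⟩:4, ⟨d⟩:3, ⟨e⟩:3, ⟨f⟩:3 (written pattern:support). Step 2: divide the search space. The complete set of sequential patterns is partitioned into six subsets according to prefix: (1) those with prefix ⟨a⟩, ..., (6) those with prefix ⟨f⟩. Step 3: find the subsets of sequential patterns. Each subset is mined by constructing the corresponding projected database and mining it recursively. The results are listed in Table 2.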

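  • Mining sequential patterns by prefix projections (cont.): Example (1). To find the sequential patterns with prefix ⟨a⟩, only the sequences containing ⟨a⟩ are collected, and in each of them only the subsequence that follows the first occurrence of ⟨a⟩ goes into the projected database. For example, in the sequence ⟨(ef)(ab)(df)cb⟩ only the subsequence ⟨(_b)(df)cb⟩ is considered when mining sequential patterns with prefix ⟨a⟩; (_b) means that the last element of the prefix, a, together with b forms one element. Likewise, for the sequence ⟨a(abc)(ac)d(cf)⟩ only the subsequence ⟨(abc)(ac)d(cf)⟩ is considered.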

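  • Mining sequential patterns by prefix projections (cont.): Example (2). Continuing from (1), the sequences in the sequence database S that contain ⟨a⟩ are projected to form the ⟨a⟩-projected database, which consists of four postfix sequences: ⟨(abc)(ac)d(cf)⟩, ⟨(_d)c(bc)(ae)⟩, ⟨(_b)(df)cb⟩ and ⟨(_f)cbc⟩. Scanning the ⟨a⟩-projected database once finds all length-2 sequential patterns with prefix ⟨a⟩: ⟨aa⟩:2, ⟨ab⟩:4, ⟨(ab)⟩:2, ⟨ac⟩:4, ⟨ad⟩:2, ⟨af⟩:2. The sequential patterns with prefix ⟨a⟩ are then partitioned again: (1) those with prefix ⟨aa⟩, (2) those with prefix ⟨ab⟩, ..., (6) those with prefix ⟨af⟩, each mined from its own projected database.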
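The construction of the ⟨a⟩-projected database can be sketched for a single-item prefix. The function names are hypothetical, `DB` is Table 1 as given in the original PrefixSpan paper, and the sketch does not handle multi-item prefixes.

```python
# Table 1 from the original PrefixSpan paper; items in each element are sorted.
DB = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
]


def project_on_item(seq, item):
    """Postfix of seq w.r.t. the first occurrence of item, or None if item is absent.

    Returns (rest, tail): rest holds the items that follow `item` inside the
    same element (shown as "(_...)"), tail holds the later elements.
    """
    for i, element in enumerate(seq):
        if item in element:
            rest = tuple(x for x in element if x > item)
            return rest, seq[i + 1:]
    return None


def show_postfix(rest, tail):
    head = "(_" + "".join(rest) + ")" if rest else ""
    body = "".join(e[0] if len(e) == 1 else "(" + "".join(e) + ")" for e in tail)
    return head + body


def projected_database(db, item):
    """All non-empty postfixes of the sequences in db w.r.t. a single-item prefix."""
    postfixes = [project_on_item(seq, item) for seq in db]
    return [show_postfix(*p) for p in postfixes if p is not None and (p[0] or p[1])]


a_projected = projected_database(DB, "a")
print(a_projected)
```

The four strings produced for prefix ⟨a⟩ are the four postfix sequences listed in the example.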

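  • Mining sequential patterns by prefix projections (cont.): Example (3). When the (postfix) subsequences in a prefix's projected database cannot reach min_sup, no further sequential patterns grow from that prefix. For example, the ⟨aa⟩-projected database contains the subsequence ⟨(_bc)(ac)d(cf)⟩.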

  • Mining sequential patterns by prefix projections (cont.): Example

  • PrefixSpan: Algorithm and correctness. Lemma 3.1. PrefixSpan is a recursive algorithm.

  • PrefixSpan: Algorithm and correctness (cont.): constructing the projected database for each prefix
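The recursive structure can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `prefixspan`, `grow` and `show` are hypothetical, `DB` is Table 1 as given in the original PrefixSpan paper, and each projected database is stored as (sequence index, element index) pairs instead of copied postfixes.

```python
from collections import defaultdict

# Table 1 from the original PrefixSpan paper; items in each element are sorted.
DB = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
]


def show(pattern):
    return "".join(e[0] if len(e) == 1 else "(" + "".join(e) + ")" for e in pattern)


def prefixspan(db, min_support):
    """Return {pattern string: support} for all sequential patterns in db."""
    patterns = {}

    def grow(prefix, entries):
        # entries holds one (sid, i) pair per sequence containing prefix, where i
        # is the earliest element index at which prefix's last element can match.
        seq_ext = defaultdict(list)   # item x -> projection of prefix + <x>
        item_ext = defaultdict(list)  # item x -> projection with x added to last element
        last = set(prefix[-1]) if prefix else set()
        for sid, i in entries:
            seq = db[sid]
            seen = set()
            for j in range(i + 1, len(seq)):      # x starts a new element
                for x in seq[j]:
                    if x not in seen:
                        seen.add(x)
                        seq_ext[x].append((sid, j))
            seen = set()
            if prefix:
                for j in range(i, len(seq)):      # x joins the last element
                    if last <= set(seq[j]):
                        for x in seq[j]:
                            if x > prefix[-1][-1] and x not in seen:
                                seen.add(x)
                                item_ext[x].append((sid, j))
        for x, proj in sorted(seq_ext.items()):
            if len(proj) >= min_support:
                new = prefix + [(x,)]
                patterns[show(new)] = len(proj)
                grow(new, proj)
        for x, proj in sorted(item_ext.items()):
            if len(proj) >= min_support:
                new = prefix[:-1] + [prefix[-1] + (x,)]
                patterns[show(new)] = len(proj)
                grow(new, proj)

    grow([], [(sid, -1) for sid in range(len(db))])
    return patterns


result = prefixspan(DB, 2)
print(len(result), result["ab"], result["(ab)"])
```

Each recursive call scans only the part of every sequence that lies at or after the recorded position, which is the sense in which the search works on a projected database. Recording positions instead of copying postfixes is a choice made for brevity here.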

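  • Scaling up pattern growth by bi-level projection. The main cost of PrefixSpan is the construction of projected databases; bi-level projection reduces the number and size of the projected databases. Example 4. Step 1: as in 3.2 (level-by-level projection), scan S to find the length-1 sequential patterns ⟨a⟩, ⟨b⟩, ⟨c⟩, ⟨d⟩, ⟨e⟩, ⟨f⟩. Step 2: build a 6 × 6 matrix (the S-matrix) that records the supports of the length-2 sequences, and construct the projected databases from it (Table 3).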

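  • Scaling up pattern growth by bi-level projection (cont.). M[c,c] = 3 means that ⟨cc⟩ has support 3 in S. M[a,c] = (4, 2, 1) means that support(⟨ac⟩) = 4, support(⟨ca⟩) = 2 and support(⟨(ac)⟩) = 1.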
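The two matrix entries quoted on the slide can be recomputed from the running example. The sketch below fills the matrix by direct support counting, which is simpler than the single-scan construction the method relies on; `s_matrix` is a hypothetical name and `DB` is Table 1 as given in the original PrefixSpan paper.

```python
from itertools import combinations

# Table 1 from the original PrefixSpan paper; elements are sets of items.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]


def support(alpha, db):
    """Count the sequences of db that contain alpha as a subsequence."""
    def contains(beta):
        j = 0
        for element in beta:
            if j < len(alpha) and alpha[j] <= element:
                j += 1
        return j == len(alpha)
    return sum(contains(seq) for seq in db)


def s_matrix(db, items):
    """Triangular matrix of length-2 supports over the given frequent items.

    M[x, x] is support(<xx>); for x < y, M[x, y] is the triple
    (support(<xy>), support(<yx>), support(<(xy)>)).
    """
    M = {}
    for x in items:
        M[x, x] = support([{x}, {x}], db)
    for x, y in combinations(sorted(items), 2):
        M[x, y] = (support([{x}, {y}], db),
                   support([{y}, {x}], db),
                   support([{x, y}], db))
    return M


M = s_matrix(DB, "abcdef")
print(M["c", "c"], M["a", "c"])
```

Every cell whose counter reaches min_support names a length-2 sequential pattern, so the matrix replaces the separate scans of the six length-1 projected databases.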

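  • Scaling up pattern growth by bi-level projection (cont.). For each length-2 sequential pattern, its projected database is constructed. For example, the ⟨ab⟩-projected database is built from the sequences containing ⟨ab⟩; it has 3 frequent items, so a 3 × 3 S-matrix is built for the ⟨ab⟩-projected database (Table 4).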

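  • Scaling up pattern growth by bi-level projection (cont.). A sequential pattern whose support only just reaches the threshold (support = 2) needs no further projection. Comparing bi-level with level-by-level projection: in Example 3, level-by-level projection needs 53 projected databases, whereas bi-level projection needs only 22 (one per length-2 sequential pattern).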

  • Scaling up pattern growth by bi-level projection (cont.): the S-matrix and items

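  • Pseudo-Projection. The main cost of PrefixSpan is constructing projected databases. The pseudo-projection technique avoids physically copying them: for each sequence it stores a pointer to the sequence and the offset at which the postfix subsequence starts.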

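  • Pseudo-Projection (cont.). For example, when building the ⟨a⟩-projected database, the sequence s = ⟨a(abc)(ac)d(cf)⟩ has the postfix sequence ⟨(abc)(ac)d(cf)⟩, which is represented by a pointer to s and offset = 2. Pseudo-projection is efficient when the database fits in main memory; it is not suited to disk-based access.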
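A pointer-and-offset projection might look like the sketch below. The representation is an assumption made for illustration: each sequence is flattened into a list of items, the offset is the 1-based position of the first postfix item (which reproduces offset = 2 for the slide's example), and the sequence index stands in for the pointer. All function names are hypothetical and `DB` is Table 1 as given in the original PrefixSpan paper.

```python
# Table 1 from the original PrefixSpan paper; items in each element are sorted.
DB = [
    [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")],
    [("a", "d"), ("c",), ("b", "c"), ("a", "e")],
    [("e", "f"), ("a", "b"), ("d", "f"), ("c",), ("b",)],
    [("e",), ("g",), ("a", "f"), ("c",), ("b",), ("c",)],
]


def flatten(seq):
    """List of (item, element index) pairs in reading order."""
    return [(item, k) for k, element in enumerate(seq) for item in element]


def pseudo_project(db, item):
    """(sequence index, 1-based offset) pairs instead of copied postfixes."""
    pointers = []
    for sid, seq in enumerate(db):
        flat = flatten(seq)
        for pos, (x, _) in enumerate(flat):
            if x == item:
                if pos + 1 < len(flat):          # skip empty postfixes
                    pointers.append((sid, pos + 2))
                break
    return pointers


def materialize(db, sid, offset):
    """Rebuild the postfix string that a (sid, offset) pair stands for."""
    flat = flatten(db[sid])
    start = offset - 1
    # The postfix begins mid-element if the previous item shares its element.
    split = start > 0 and flat[start - 1][1] == flat[start][1]
    groups = {}
    for x, k in flat[start:]:
        groups.setdefault(k, []).append(x)
    parts = []
    for k, items in groups.items():
        text = "".join(items)
        if split and k == flat[start][1]:
            parts.append("(_" + text + ")")
        else:
            parts.append(text if len(items) == 1 else "(" + text + ")")
    return "".join(parts)


pointers = pseudo_project(DB, "a")
print(pointers)
print([materialize(DB, sid, off) for sid, off in pointers])
```

The projected database is then just a list of small integer pairs, and the postfix text is recovered from the original sequence only when it is needed.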

  • Experimental Results and Performance Study. Environment: a 233 MHz Pentium PC with 128 megabytes of main memory, running Microsoft Windows/NT. All methods were implemented using Microsoft Visual C++ 6.0. Four methods were compared: GSP; FreeSpan (FreeSpan with alternative-level projection); PrefixSpan-1 (PrefixSpan with level-by-level projection); PrefixSpan-2 (PrefixSpan with bi-level projection).

  • Experimental Results and Performance Study (cont.). Number of sequential patterns and running time at different thresholds. Dataset C10T8S8I8: 1,000 items, 10,000 sequences, an average of 8 items per element (T8) and an average of 8 elements per sequence (S8).

  • Experimental Results and Performance Study (cont.). Running time with and without pseudo-projection at different thresholds.

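  • Experimental Results and Performance Study (cont.). Dataset C1kT8S8I8: 1,000 items, 1,000,000 sequences, an average of 8 items per element (T8) and an average of 8 elements per sequence (S8). Runs with and without pseudo-projection are compared, including the I/O cost of reading sequences, at different thresholds for bi-level and level-by-level projection.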

  • Experimental Results and Performance Study (cont.). With the threshold fixed at 20%, running time against the number of sequences for PrefixSpan-2 and PrefixSpan-1.

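  • Experimental Results and Performance Study (cont.): Summary. Across thresholds, PrefixSpan is faster than FreeSpan and GSP, and FreeSpan is faster than GSP. PrefixSpan-2, with bi-level projection, does better at low thresholds, where there is more projection work to save. For PrefixSpan-1, pseudo-projection helps when the database fits in main memory.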

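  • Discussion. PrefixSpan versus FreeSpan: FreeSpan grows a pattern from any frequent item, so its projected databases stay large, whereas PrefixSpan projects only on prefixes, so its projected databases are smaller than FreeSpan's. PrefixSpan and Apriori: PrefixSpan with bi-level projection still makes use of the Apriori property (3-way checking).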

  • Conclusion. PrefixSpan is a sequential pattern mining method that, combined with bi-level projection and pseudo-projection, performs better than Apriori-like methods.

  • Q & A