Chapter 7: Association Rule Mining in Web Databases

Uploaded by: zahur

Posted on 07-Jan-2016

DESCRIPTION

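Chapter 7: Association Rule Mining in Web Databases. Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary. Introduction (1): A single shopping cart tells us about one customer's purchasing behavior, but once a large volume of shopping-cart data has accumulated, we can analyze the purchasing habits of customers as a whole. - PowerPoint PPT Presentation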

TRANSCRIPT

  • Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary

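  • Introduction (1): Analyzing accumulated shopping-cart data can reveal purchasing patterns, for example ones involving an IBM PC and ViewSonic products.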

  • Introduction (2): … 80% …

  • Outline (current section: Association Rule Mining)

  • Table 7-1: an example transaction database of 10 transactions

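  • (1) A set of items is called an itemset. A transaction T contains an itemset X if X ⊆ T. The support of X measures how frequently X occurs in the transactions.

  • (2) The support count of X is the number of transactions that contain X, and the support of X is its support count divided by the total number of transactions. In Table 7-1, item 2 appears in 5 transactions, so the support of {2} is 5/10 = 0.5; {2,5} appears in 3 transactions, so its support is 3/10 = 0.3. An association rule has the form X ⇒ Y [support, confidence], where X and Y are itemsets with X ∩ Y = ∅; the support of X ⇒ Y is the support of X ∪ Y.

  • (3) The confidence of X ⇒ Y is support(X ∪ Y) / support(X), i.e., the fraction of the transactions containing X that also contain Y.

  • (4) The user specifies a minimum support and a minimum confidence; the minimum support count equals the minimum support times the number of transactions. In Table 7-1, with minimum support 0.2 and minimum confidence 0.5, consider {1,3} ⇒ {5}: {1,3,5} appears in 2 transactions, so its support is 0.2, and the support of {1,3} is 0.3, so the confidence is 0.2/0.3 = 0.67. Both thresholds are met, so {1,3} ⇒ {5} is an association rule.

  • (5) An itemset whose support is at least the minimum support is called a large itemset. Each rule X ⇒ Y is obtained from a large itemset Z by splitting it into nonempty X and Y with X ∪ Y = Z.

  • (6) Example: in Table 7-1 with minimum support 0.2 and minimum confidence 0.7, {1,3} is a large itemset, from which the rules {1} ⇒ {3} and {3} ⇒ {1} can be formed. The confidence of {1} ⇒ {3} is 0.3/0.4 = 0.75 and that of {3} ⇒ {1} is 0.3/0.5 = 0.6, so only {1} ⇒ {3} is an association rule.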
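
The definitions in (1) to (4) translate directly into code. Below is a minimal Python sketch, assuming each transaction is a set of item labels; the function names and the five-transaction `toy` database are hypothetical illustrations, not the contents of Table 7-1.

```python
# Minimal sketch: support and confidence over transactions stored as sets.
def support_count(itemset, transactions):
    """Number of transactions T with itemset ⊆ T."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t))


def support(itemset, transactions):
    """Support of X: support count of X divided by the number of transactions."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of X => Y: support(X ∪ Y) / support(X)."""
    return support_count(set(lhs) | set(rhs), transactions) / support_count(lhs, transactions)


# Hypothetical toy database (NOT Table 7-1, whose rows are not reproduced here).
toy = [{1, 3, 5}, {1, 3}, {2, 5}, {1, 3, 4}, {3, 5}]
print(support({1, 3}, toy))       # 0.6  ({1, 3} is in 3 of 5 transactions)
print(confidence({3}, {1}, toy))  # 0.75 ({1, 3} count 3 / {3} count 4)
```

Dividing support counts rather than supports gives the same confidence and avoids a second division.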

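  • Apriori: an itemset with k items is called a k-itemset, and Lk denotes the set of large k-itemsets. Apriori first finds the large 1-itemsets L1, then uses L1 to find L2, L2 to find L3, and so on.

  • Apriori property: every nonempty subset of a large itemset is also large. For example, if {A,B} is large then {A} and {B} are large, because every transaction that contains {A,B} also contains {A} and {B}. Conversely, if {A} is not large, then {A,B} cannot be large.

  • Apriori generates candidate itemsets in two steps: a join step and a prune step.

  • Join step: Lk-1 is joined with itself to produce the candidate k-itemsets Ck. The items of each itemset are kept in sorted order, and Xi[j] denotes the j-th item of itemset Xi. Two (k-1)-itemsets X1 and X2 are joined if their first k-2 items are equal and X1[k-1] < X2[k-1]; the result is {X1[1], X1[2], ..., X1[k-1], X2[k-1]}.
  • Example 7-1: X1 = {1,3,5} and X2 = {1,3,6} are large 3-itemsets. Since X1[1] = X2[1] = 1, X1[2] = X2[2] = 3 and X1[3] = 5 < X2[3] = 6, X1 and X2 are joined into {1,3,5,6}.
  • Prune step: Ck contains Lk, but not every candidate in Ck is large. By the Apriori property, a candidate X cannot be large if any of its (k-1)-subsets is missing from Lk-1, so every such X is removed from Ck.

  • Example 7-2: Joining the large 3-itemsets X1 = {1,3,5} and X2 = {1,3,6} yields the candidate 4-itemset {1,3,5,6}. By the Apriori property, {1,3,5,6} can be large only if all of its 3-subsets {1,3,5}, {1,3,6}, {1,5,6} and {3,5,6} are large. {1,3,5} and {1,3,6} are large 3-itemsets, but {1,5,6} and {3,5,6} are not, so {1,3,5,6} cannot be a large 4-itemset and is pruned.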
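
The join and prune steps can be sketched as a single function. This is a minimal illustration, assuming each large itemset is stored as a sorted tuple of comparable items; `candidate_gen` is a hypothetical Python rendering of the Candidate_gen idea rather than the textbook's own code.

```python
from itertools import combinations


def candidate_gen(prev_large):
    """Candidate k-itemsets from the large (k-1)-itemsets (hypothetical helper).

    prev_large: iterable of sorted tuples, all of length k-1.
    Returns a set of sorted k-tuples.
    """
    prev = sorted(set(prev_large))
    prev_set = set(prev)
    candidates = set()
    for x1 in prev:
        for x2 in prev:
            # Join step: first k-2 items equal and X1[k-1] < X2[k-1].
            if x1[:-1] == x2[:-1] and x1[-1] < x2[-1]:
                c = x1 + (x2[-1],)
                # Prune step: every (k-1)-subset of c must be in L(k-1).
                # combinations() keeps the sorted order, so subsets match the stored tuples.
                if all(sub in prev_set for sub in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates


print(candidate_gen({(1, 3, 5), (1, 3, 6)}))                        # set()
print(candidate_gen({(1, 3, 5), (1, 3, 6), (1, 5, 6), (3, 5, 6)}))   # {(1, 3, 5, 6)}
```

The first call mirrors Example 7-2 under the assumption that {1,5,6} and {3,5,6} are not large: the joined candidate {1,3,5,6} is produced and then pruned.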

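  • Apriori algorithm
      L1 = {large 1-itemsets};
      for (k = 2; Lk-1 ≠ ∅; k++) do begin
          Ck = Candidate_gen(Lk-1);    /* candidate k-itemsets */
          for each transaction t in the database do
              for each candidate c ∈ Ck contained in t do c.count++;
          Lk = {c ∈ Ck | c.count ≥ minimum support count};
      end
      return L = ∪k Lk;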
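
For a runnable view of the whole level-wise loop, here is a compact Python sketch. It assumes integer item labels and a minimum support count rather than a fraction, and it folds its own join-and-prune step into the loop so the block stands alone; `apriori` and the `toy` data are illustrative names, not from the original.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset as sorted tuple: support count} for every itemset whose
    support count is at least min_count, found level by level."""
    tsets = [frozenset(t) for t in transactions]

    # L1: count every single item in one pass over the data.
    counts = {}
    for t in tsets:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {c: n for c, n in counts.items() if n >= min_count}
    large = dict(current)

    k = 2
    while current:                       # stop when L(k-1) is empty
        prev = sorted(current)
        prev_set = set(prev)
        # Candidate generation: join on the first k-2 items, then prune.
        cands = set()
        for i, x1 in enumerate(prev):
            for x2 in prev[i + 1:]:
                if x1[:-1] == x2[:-1]:   # sorted order guarantees x1[-1] < x2[-1]
                    c = x1 + (x2[-1],)
                    if all(s in prev_set for s in combinations(c, k - 1)):
                        cands.add(c)
        # One pass over the data to count every candidate.
        cand_counts = dict.fromkeys(cands, 0)
        for t in tsets:
            for c in cands:
                if t.issuperset(c):
                    cand_counts[c] += 1
        current = {c: n for c, n in cand_counts.items() if n >= min_count}
        large.update(current)
        k += 1
    return large


# Hypothetical toy database, not Table 7-1.
toy = [{1, 3, 5}, {1, 3}, {2, 5}, {1, 3, 4}, {3, 5}]
result = apriori(toy, 2)
```

Counting by testing every candidate against every transaction keeps the sketch short; it is the simplest option, not an efficient one for large databases.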

  • Candidate_gen procedure
      for each X1 ∈ Lk-1    /* X1[1], X1[2], ..., X1[k-1] are the k-1 items of X1 */
          for each X2 ∈ Lk-1    /* X2[1], X2[2], ..., X2[k-1] are the k-1 items of X2 */
              if (X1[1] = X2[1]) ∧ (X1[2] = X2[2]) ∧ ... ∧ (X1[k-2] = X2[k-2]) ∧ (X1[k-1] < X2[k-1]) …
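  • Example 7-3 (1): Apply Apriori to Table 7-1 with a minimum support count of 3. The first scan counts the candidate 1-itemsets C1.

  • Example 7-3 (2): the large 1-itemsets L1

  • Example 7-3 (3): generate the candidate 2-itemsets C2 from L1

  • Example 7-3 (4): count the support of each candidate 2-itemset

  • Example 7-3 (5): the large 2-itemsets L2

  • Example 7-3 (6): generate the candidate 3-itemsets C3 from L2

    The only candidate 3-itemset is {1,3,5}, obtained by joining {1,3} with {1,5}; its support count is 2, below 3, so there are no large 3-itemsets and the algorithm stops.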

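  • Example 7-3 (7): The large itemsets are {{1},{2},{3},{4},{5},{6},{1,3},{1,5},{2,5},{3,5}}. With minimum confidence 0.7, the candidate rules and their confidences are:
      {1} ⇒ {3}: 3/4 = 0.75
      {3} ⇒ {1}: 3/5 = 0.6
      {1} ⇒ {5}: 3/4 = 0.75
      {5} ⇒ {1}: 3/6 = 0.5
      {2} ⇒ {5}: 3/5 = 0.6
      {5} ⇒ {2}: 3/6 = 0.5
      {3} ⇒ {5}: 3/5 = 0.6
      {5} ⇒ {3}: 3/6 = 0.5
    Only rules 1 and 3, {1} ⇒ {3} and {1} ⇒ {5}, meet the minimum confidence, so they are the association rules.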
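
Rule generation from the large itemsets of Example 7-3 can be sketched as follows. The support counts are the ones stated in the example; {4} and {6} are left out because their counts are not given and single items yield no rules. `gen_rules` is a hypothetical helper, not the textbook's code.

```python
from itertools import combinations


def gen_rules(large, min_conf):
    """Return (lhs, rhs, confidence) for every rule lhs => rhs where
    lhs ∪ rhs is a large itemset and confidence >= min_conf.

    `large` maps each large itemset (sorted tuple) to its support count; by
    the Apriori property every subset of a large itemset is also a key."""
    rules = []
    for z, z_count in large.items():
        for r in range(1, len(z)):                 # size of the left-hand side
            for lhs in combinations(z, r):
                rhs = tuple(i for i in z if i not in lhs)
                conf = z_count / large[lhs]
                if conf >= min_conf:
                    rules.append((lhs, rhs, conf))
    return rules


# Support counts stated in Example 7-3 (counts for {4} and {6} not given).
large = {(1,): 4, (2,): 5, (3,): 5, (5,): 6,
         (1, 3): 3, (1, 5): 3, (2, 5): 3, (3, 5): 3}
for lhs, rhs, conf in gen_rules(large, 0.7):
    print(set(lhs), "=>", set(rhs), round(conf, 2))
# {1} => {3} 0.75
# {1} => {5} 0.75
```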

  • Outline (current section: Multilevel Association Rule Mining)

  • Example rules: "80% … PC …" and "70% … IBM PC … ViewSonic …"; the second involves items at a lower concept level.

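  • Figure 7-5: concept hierarchy of computers (leaf brands: IBM, COMPAQ, ASUS, HP, IBM, Acer, IBM, Acer, Toshiba)

  • Figure 7-14: concept hierarchy of monitors: CRT (17", 19") and LCD (15", 17")

  • Figure 7-15: concept hierarchy with nodes A4 and A3+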

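  • Items at a lower concept level generally have lower support than items at a higher level. For example, a rule involving HP … and ViewSonic … may have support = 0.01 and confidence = 0.95: reliable, but rare enough to fall below a minimum support chosen for higher-level items.

  • A rule … ViewSonic … may have support = 0.7 and confidence = 0.9. Association rules that involve items at multiple concept levels are called multilevel association rules.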

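  • Multilevel association rules are mined top-down: large itemsets are found first at level 1, then at level 2, and so on, applying Apriori at each level.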

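  • (1) … item x at level i … x … level i-1 … x …

  • (2) … item x at level i … level i-1 … 1-itemset … x … [support = 0.2]

    (…)

    (…)

    minimum support = 0.25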

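  • (1) … [support = 0.2]

    … [support = 0.12]

    … [support = 0.08]

    minimum support at the upper level = 0.3

    minimum support at the lower level = 0.06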

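  • (2) A k-itemset at level i is examined only if its parent k-itemset X at level i-1 is a large k-itemset. Example: {…, LCD …} [support = 0.2]

    {…, 15" LCD …} [support = 0.12]

    {…, 17" LCD …} [support = 0.02]

    {…, 15" LCD …} [support = 0.03]

    {…, 17" LCD …} [support = 0.03]

    minimum support at the upper level = 0.15; minimum support at the lower level = 0.03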

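  • Items are encoded by their position in the concept hierarchy. For example, an IBM … item is encoded as 1122: the 1st digit (1) identifies its node at level 1, the 2nd digit (1) its node at level 2, the 3rd digit (2) its node at level 3, and the 4th digit (2) its node at level 4.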
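
Under this encoding, an item's ancestor at level j can be read off its code. A minimal sketch, assuming (as the codes 1***, 11**, 111* and 1212 in the examples suggest) that the first j digits identify the level-j node and the remaining positions are masked with `*`; `ancestor_code` is a hypothetical helper.

```python
def ancestor_code(code, level):
    """Level-`level` ancestor of an encoded item (assumed encoding): keep the
    first `level` digits and mask the remaining positions with '*'."""
    if not 1 <= level <= len(code):
        raise ValueError("level must be between 1 and len(code)")
    return code[:level] + "*" * (len(code) - level)


for j in range(1, 5):
    print(j, ancestor_code("1122", j))   # 1***, 11**, 112*, 1122
```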

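  • Table 7-2

  • Table 7-3

  • Table 7-4: the encoded transaction table T[1]

  • Notation: the transaction table T is encoded as T[1]; L[j,k] is the set of large k-itemsets at level j; LL[j] is the set of all large itemsets at level j; minsup[j] is the minimum support at level j.

  • Table 7-4 (transactions 1, 2, 3, 4, 5, …) is the encoded table T[1], obtained from Table 7-3 (… 1600 …) by encoding each item with Table 7-2; for example, IBM … is encoded as 1111.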

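  • (1)
      1  for (j = 1; L[j,1] ≠ ∅ and j ≤ the maximum level; j++) do begin    /* level j */
      2      if j = 1 then {
      3          L[j,1] = Large_item_gen(T[1], j)     /* level-1 large 1-itemsets from T[1] */
      4          T[2] = Filtered_table(T[1], L[1,1])   /* filter T[1] with L[1,1] */
      5      }
      6      else L[j,1] = Large_item_gen(T[2], j)

  • (2)
      7      for (k = 2; L[j,k-1] ≠ ∅; k++) do begin    /* large k-itemsets at level j */
      8          Ck = Candidate_gen(L[j,k-1])
      9          for each transaction t in T[2] do
      10             for each candidate c ∈ Ck contained in t do c.count++
      11         L[j,k] = {c ∈ Ck | c.count ≥ minsup[j]}    /* large k-itemsets at level j */
      12     end
      13     LL[j] = ∪k L[j,k]    /* return all large itemsets at level j */
      14 end

  • (3) Line 3: Large_item_gen(T[1], j) scans T[1] for the large 1-itemsets L[j,1] at level j; with j = 1, Large_item_gen(T[1], 1) produces the level-1 large 1-itemsets L[1,1]. Line 6: for level j (j > 1), Large_item_gen(T[2], j) produces L[j,1], examining only items whose parent at level j-1 is in L[j-1,1]. For example, if 11** is a large 1-itemset at level 2, its children at level 3, such as 111* and 112*, are examined.

  • (4) Line 4: Filtered_table(T[1], L[1,1]) uses L[1,1] to filter T[1]: items whose level-1 ancestor is not in L[1,1] are removed from each transaction t, and t itself is removed if no items remain. The table returned by Filtered_table(T[1], L[1,1]) is T[2].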
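
The level-by-level procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions: items are fixed-length digit codes, minsup[j] is a support count, and `anc`, `large_item_gen`, `filtered_table` and `mine_levels` are hypothetical names. Within each level the sketch forms candidates by union-then-prune instead of the exact prefix join of Candidate_gen; both yield the same candidates after pruning.

```python
from itertools import combinations


def anc(code, level):
    """Level-`level` ancestor of an encoded item, e.g. anc("1212", 2) == "12**"."""
    return code[:level] + "*" * (len(code) - level)


def large_item_gen(table, level, min_count, prev_large=None):
    """Large 1-itemsets at `level`. For level > 1, only items whose parent
    is in prev_large (i.e. L[level-1,1]) are counted."""
    counts = {}
    for t in table:
        for a in {anc(c, level) for c in t}:          # count once per transaction
            if prev_large is None or anc(a, level - 1) in prev_large:
                counts[a] = counts.get(a, 0) + 1
    return {a for a, n in counts.items() if n >= min_count}


def filtered_table(table, level1_large):
    """T[2]: drop items whose level-1 ancestor is not large, then drop
    transactions left with no items."""
    out = []
    for t in table:
        kept = [c for c in t if anc(c, 1) in level1_large]
        if kept:
            out.append(kept)
    return out


def mine_levels(t1, minsup):
    """minsup[j-1] is the minimum support count of level j.
    Returns {level: {frozenset of codes: support count}}."""
    result, t2, prev_l1 = {}, None, None
    for j, min_count in enumerate(minsup, start=1):
        if j == 1:
            l1 = large_item_gen(t1, 1, min_count)
            t2 = filtered_table(t1, l1)
        else:
            l1 = large_item_gen(t2, j, min_count, prev_l1)
        if not l1:
            break                                      # nothing large: stop descending
        view = [{anc(c, j) for c in t} for t in t2]    # T[2] seen at level j
        counts = {frozenset([a]): sum(1 for t in view if a in t) for a in l1}
        found, k = {}, 1
        while counts:
            found.update(counts)
            k += 1
            prev = set(counts)
            # Union pairs of large (k-1)-itemsets, then prune by the Apriori property.
            cands = {a | b for a in prev for b in prev if len(a | b) == k}
            cands = {c for c in cands
                     if all(frozenset(s) in prev for s in combinations(c, k - 1))}
            counts = {}
            for c in cands:
                n = sum(1 for t in view if c <= t)
                if n >= min_count:
                    counts[c] = n
        result[j] = found
        prev_l1 = l1
    return result


# Hypothetical encoded table (not Table 7-4) and per-level minimum support counts.
t1_demo = [["1111", "2111"], ["1121", "2112"], ["1111", "3111"], ["1211"]]
demo_result = mine_levels(t1_demo, [2, 2])
```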

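  • Example 7-5: For Table 7-4, the level-1 minimum support count is 4. Scanning T[1] gives the level-1 large 1-itemsets L[1,1]; for example, {4***} appears in transactions 2, 3, 5 and 8, so its support count is 4 (… * …). Filtered_table then uses L[1,1] to filter T[1] into T[2] (… 2 … 3 … 214 … 4, 9, 10 …).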

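  • Example 7-6 (1): the level-1 large 1-itemsets L[1,1] from Example 7-5

  • Example 7-6 (2): T[1] is filtered into T[2]

  • Example 7-6 (3): Candidate_gen is applied to L[1,1] to find the level-1 2-itemsets (level-1 minimum support count = 4), giving C2 = {{1***,2***}, {1***,4***}, {2***,4***}}. The support count of {2***,4***} is 3, so L[1,2] = {{1***,2***},{1***,4***}}. From L[1,2], C3 = {{1***,2***,4***}}, whose support count is 3, so L[1,3] = ∅.

  • Example 7-6 (4): the level-1 large 2-itemsets L[1,2]

  • Example 7-6 (5): At level 2 (level-2 minimum support count = 2), T[2] is scanned for the level-2 large 1-itemsets L[2,1]; for example, {41**} appears in transactions 2, 3 and 8, so its support count is 3 and it is a level-2 large 1-itemset. Candidate_gen applied to L[2,1] gives C2 = {{11**,12**}, {11**,21**}, {11**,22**}, {11**,41**}, {12**,21**}, {12**,22**}, {12**,41**}, {21**,22**}, {21**,41**}, {22**,41**}}. Keeping the candidates with support count ≥ 2 gives L[2,2] = {{11**,41**},{12**,21**},{12**,22**}}. From L[2,2], C3 = {{12**,21**,22**}}, whose support count is 0, so L[2,3] = ∅.

  • Example 7-6 (6): the level-2 large itemsets L[2,1] and L[2,2]

  • Example 7-6 (7): At level 3 (level-3 minimum support count = 3), T[2] is scanned for the level-3 large 1-itemsets L[3,1]. Candidate_gen applied to L[3,1] gives C2 = {{111*,121*},{111*,211*},{111*,411*},{121*,211*},{121*,411*},{211*,411*}}. Keeping the candidates with support count ≥ 3 gives L[3,2] = {{121*,211*}}. No 3-itemset can be generated from L[3,2], so L[3,3] = ∅.

  • Example 7-6 (8): the level-3 large itemsets L[3,1] and L[3,2]

  • Example 7-6 (9): At level 4 (level-4 minimum support count = 2), T[2] is scanned for the level-4 large 1-itemsets L[4,1]. Candidate_gen applied to L[4,1] gives C2, and keeping the candidates with support count ≥ 2 gives L[4,2] = {{1212,2112}}. No 3-itemset can be generated from L[4,2], so L[4,3] = ∅.

  • Example 7-6 (10): the level-4 large itemsets L[4,1] and L[4,2]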

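  • Example 7-7 (1): Rules are generated from the large itemsets of Example 7-6 with minimum confidence 0.8, 0.7, 0.7 and 0.6 for levels 1, 2, 3 and 4, respectively. Level 1:
      {1***} ⇒ {2***}: confidence = 7/8 = 0.875
      {2***} ⇒ {1***}: confidence = 7/7 = 1
      {1***} ⇒ {4***}: confidence = 4/8 = 0.5
      {4***} ⇒ {1***}: confidence = 4/4 = 1
    Rules 1, 2 and 4 meet the level-1 minimum confidence of 0.8.

  • Example 7-7 (2): Level 2:
      {11**} ⇒ {41**}: confidence = 2/3 = 0.67
      {41**} ⇒ {11**}: confidence = 2/3 = 0.67
      {12**} ⇒ {21**}: confidence = 3/5 = 0.6
      {21**} ⇒ {12**}: confidence = 3/4 = 0.75
      {12**} ⇒ {22**}: confidence = 2/5 = 0.4
      {22**} ⇒ {12**}: confidence = 2/3 = 0.67
    Only rule 4 meets the level-2 minimum confidence of 0.7.

  • Example 7-7 (3): Level 3:
      {121*} ⇒ {211*}: confidence = 3/3 = 1
      {211*} ⇒ {121*}: confidence = 3/4 = 0.75
    Rules 1 and 2 meet the level-3 minimum confidence of 0.7. Level 4:
      {1212} ⇒ {2112}: confidence = 2/2 = 1
      {2112} ⇒ {1212}: confidence = 2/3 = 0.67
    Rules 1 and 2 meet the level-4 minimum confidence of 0.6.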

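  • Example 7-7 (4): Reading the codes with Figure 7-5 (codes starting with 1), Figure 7-14 (codes starting with 2) and Figure 7-15 (codes starting with 4), the rules found are:
      level 1: three rules, with confidence 0.875, 1 and 1
      level 2: CRT … ⇒ … (confidence = 0.75)
      level 3: … ⇒ 17" CRT … (confidence = 1); 17" CRT … ⇒ … (confidence = 0.75)
      level 4: IBM … ⇒ 17" CRT … (confidence = 1); 17" CRT … ⇒ IBM … (confidence = 0.67)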

  • Outline (current section: Quantitative Association Rule Mining)

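  • (1) … 40% … Rules of this kind, which involve quantitative attributes, are called quantitative association rules.

  • (2) The values of quantitative attributes are partitioned into intervals.

  • A q_item … attribute i … q …. A set of q_items is called a q_itemset; … q_item x … q_x …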

  • q_ q_ q_

  • (1)i q_ , , ... , , ... q_

  • (2) T s 1 2 3 4

  • 7-8{,,,,}iq_q_5030100204050010%[1][2..3][4..5]123q_ ( )

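  • (1) Let X be a q_itemset. A transaction t supports X if t contains every q_item of X. If the support of X is at least the minimum support, X is called a large q_itemset; a q_itemset containing k q_items is called a k-q_itemset.

  • (2) A quantitative association rule has the form X ⇒ Y [support, confidence], where X and Y are q_itemsets; the rule X ⇒ Y is derived from a large q_itemset Z with Z = X ∪ Y.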

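  • Algorithm for finding large q_itemsets: LqiTid (large q_itemset generation using Tids)

  • Table 7-6: the database DB

  • Table 7-6 shows DB (… 3 …); following Figure 7-17, DB is converted into the DB shown in Table 7-7.

  • Figure 7-17 (labels A, B, C, D, E, F, G)

  • Table 7-7: DB after conversion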

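  • Finding large q_itemsets (1): TS({x}) is the set of Tids (transaction IDs) of the transactions that contain q_item x. For example, in DB, TS({…}) = {5,12,14} and TS({…}) = {1,4,5,8}. TS({x1,x2}) is the set of Tids of the transactions that contain both q_items x1 and x2, so TS({x1,x2}) = TS({x1}) ∩ TS({x2}); for example, TS({…, …}) = TS({…}) ∩ TS({…}) = {5}.

  • Finding large q_itemsets (2): For q_items x1, x2, ..., xk, TS({x1,x2,...,xk}) is the set of Tids of the transactions that contain all of them, and the support SP({x1,x2,...,xk}) of the q_itemset {x1,x2,...,xk} is the number of Tids in TS({x1,x2,...,xk}):

    SP({x1,x2,...,xk}) = Card(TS({x1,x2,...,xk})) = Card(TS({x1}) ∩ TS({x2}) ∩ ... ∩ TS({xk}))

    where Card(S) is the number of elements of set S.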
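
TS and SP translate directly into set intersection. A minimal sketch, assuming TS values are Python sets of Tids; since the q_item names in the example were lost, the labels `x1` and `x2` are placeholders for the two q_items whose Tid sets are {5,12,14} and {1,4,5,8}.

```python
from functools import reduce


def ts(q_itemset, ts_single):
    """TS({x1,...,xk}) = TS({x1}) ∩ TS({x2}) ∩ ... ∩ TS({xk})."""
    return reduce(set.intersection, (ts_single[x] for x in q_itemset))


def sp(q_itemset, ts_single):
    """SP({x1,...,xk}) = Card(TS({x1,...,xk}))."""
    return len(ts(q_itemset, ts_single))


# Placeholder labels for the two q_items in the example above.
ts_single = {"x1": {5, 12, 14}, "x2": {1, 4, 5, 8}}
print(ts(["x1", "x2"], ts_single))   # {5}
print(sp(["x1", "x2"], ts_single))   # 1
```

Because the Tid sets are kept after the first scan, supports of larger q_itemsets come from intersections alone, with no further passes over DB.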

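  • Table 7-8: the q_items … of the DB in Table 7-7 …

  • Finding large q_itemsets (3), LqiTid: if SP({x1,x2,...,xk}) is at least the minimum support, {x1,x2,...,xk} is a large k-q_itemset. Candidate_gen generates the candidate k-q_itemsets from the large (k-1)-q_itemsets, and their SP values determine which of them are large k-q_itemsets.

  • Finding large q_itemsets (4): Let x[1], x[2], ..., x[k-1] be the k-1 q_items of a (k-1)-q_itemset x, and let Lk be the set of large k-q_itemsets. item(x[j]) denotes the attribute of q_item x[j]. In the q_itemset {x[1], x[2], ..., x[k-1]}, item(x[1]) …
  • LqiTid overview: first compute TS and SP for every q_item; the q_items whose SP meets the minimum support are the large 1-q_itemsets. Then, for each k, generate the candidate k-q_itemsets Ck from the large (k-1)-q_itemsets, compute their TS and SP, and keep the large k-q_itemsets.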

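  • LqiTid algorithm
      for each q_item x, compute TS({x}) and SP({x});
      L1 = {x | x is a q_item and SP({x}) ≥ minimum support};    /* large 1-q_itemsets */
      for (k = 2; |Lk-1| > 1; k++) do begin    /* large k-q_itemsets */
          Lk = ∅;
          generate the candidate k-q_itemsets Ck from Lk-1;
          for each q_itemset c ∈ Ck do begin    /* c is formed from the (k-1)-q_itemsets S1 and S2 */
              TS(c) = TS(S1) ∩ TS(S2); SP(c) = Card(TS(c));
              if SP(c) ≥ minimum support then
                  Lk = Lk ∪ {c};
          end
      end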
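
A runnable sketch of LqiTid, assuming each q_item is an (attribute, interval) tuple and the minimum support is a Tid count. The join requires the two parents to share their first k-2 q_items and to end in q_items of different attributes; that rule is an assumption based on the item(x[j]) notation above. `lqitid` and the sample data are hypothetical.

```python
from itertools import combinations


def lqitid(ts_single, min_sp):
    """Large q_itemsets via Tid-set intersection (sketch).

    ts_single: {q_item: set of Tids}; a q_item is an (attribute, interval) tuple.
    min_sp:    minimum support as a Tid count.
    Returns {q_itemset (tuple of q_items sorted by attribute): SP}."""
    # TS and SP of every q_item; keep the large 1-q_itemsets.
    level = {(x,): tids for x, tids in ts_single.items() if len(tids) >= min_sp}
    large = {c: len(t) for c, t in level.items()}
    k = 2
    while len(level) > 1:                          # |L(k-1)| > 1
        prev = sorted(level)
        nxt = {}
        for i, s1 in enumerate(prev):
            for s2 in prev[i + 1:]:
                # Join: same first k-2 q_items; last q_items on different attributes.
                if s1[:-1] != s2[:-1] or s1[-1][0] == s2[-1][0]:
                    continue
                c = s1 + (s2[-1],)
                # Prune: every (k-1)-subset must itself be large.
                if not all(sub in level for sub in combinations(c, k - 1)):
                    continue
                tids = level[s1] & level[s2]       # TS(c) = TS(S1) ∩ TS(S2)
                if len(tids) >= min_sp:            # SP(c) = Card(TS(c))
                    nxt[c] = tids
        level = nxt
        large.update({c: len(t) for c, t in level.items()})
        k += 1
    return large


# Hypothetical q_items: attribute name plus an interval (low, high).
sample = {
    ("A", (1, 1)): {1, 2, 3},
    ("A", (2, 3)): {4},
    ("B", (1, 2)): {1, 2, 4},
    ("C", (4, 5)): {1, 2},
}
result = lqitid(sample, 2)
```

Skipping pairs whose last q_items share an attribute keeps two intervals of the same attribute out of one q_itemset, since no transaction can fall into both.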

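  • Example 7-10: With a minimum support (count) of 2, the large 1-q_itemsets obtained from Table 7-8 are listed in Table 7-9. To find the 2-q_itemsets, the candidates C2 are generated from Table 7-9, giving the large 2-q_itemsets in Table 7-10. To find the 3-q_itemsets, the candidates C3 are generated from Table 7-10, giving the large 3-q_itemsets in Table 7-11. No candidate 4-q_itemsets remain, so C4 = ∅ and L4 = ∅.

  • Table 7-9: large 1-q_itemsets

  • Table 7-10: large 2-q_itemsets

  • Table 7-11: large 3-q_itemsets

  • Example 7-11: Generating rules from the large q_itemsets (Table 7-10 …) with minimum confidence 0.65, using Figure 7-17, gives:

      {…} ⇒ {…}: confidence = 2/3 = 0.67
      {…} ⇒ {…}: confidence = 3/3 = 1
      {…} ⇒ {C, [1..2]}: confidence = 2/3 = 0.67
      {…} ⇒ {…}: confidence = 2/2 = 1
      {…, …} ⇒ {…}: confidence = 2/3 = 0.67
      {…, …} ⇒ {…}: confidence = 2/2 = 1
      {…, …} ⇒ {…}: confidence = 2/2 = 1
      {…, …} ⇒ {…}: confidence = 2/2 = 1
      {…, …} ⇒ {…}: confidence = 2/2 = 1
      {…, …} ⇒ {…}: confidence = 2/3 = 0.67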

  • Outline (current section: Correlation Analysis)

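  • (1) Of 10,000 transactions, 6,000 contain …, 7,500 contain …, and 4,000 contain both. With minimum support 30% and minimum confidence 60%, the rule … ⇒ … [support = 40%, confidence = 67%] is found. The rule is misleading, however: the overall probability of buying … is 75%, which is higher than its 67% confidence.

  • (2) Let P(A ∪ B) denote the probability that a transaction contains both A and B. If P(A ∪ B) = P(A) P(B), itemsets A and B are independent; otherwise A and B are dependent and correlated. The correlation between A and B is measured as correlation(A,B) = P(A ∪ B) / (P(A) P(B)).

  • (3) If correlation(A,B) < 1, A and B are negatively correlated: the occurrence of A makes B less likely. If correlation(A,B) > 1, A and B are positively correlated: the occurrence of A makes B more likely. If correlation(A,B) = 1, A and B are independent.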
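
The correlation measure can be checked against the numbers in (1). A minimal sketch; `correlation` and `interpret` are illustrative names, and A and B stand for the two items whose names were lost.

```python
def correlation(n_total, n_a, n_b, n_ab):
    """correlation(A, B) = P(A ∪ B) / (P(A) P(B)), where P(A ∪ B) is the
    fraction of transactions that contain both A and B."""
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return p_ab / (p_a * p_b)


def interpret(corr, tol=1e-9):
    """Classify a correlation value; tol absorbs floating-point noise around 1."""
    if corr < 1 - tol:
        return "negatively correlated"
    if corr > 1 + tol:
        return "positively correlated"
    return "independent"


# Numbers from (1): 10,000 transactions; 6,000 contain A, 7,500 contain B, 4,000 both.
c = correlation(10_000, 6_000, 7_500, 4_000)
print(round(c, 2), interpret(c))   # 0.89 negatively correlated
```

Here 0.40 / (0.60 × 0.75) ≈ 0.89 < 1, so the two items are negatively correlated even though the rule's 67% confidence exceeds the 60% minimum.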

  • Outline (current section: Summary)

  • Summary: … Apriori … techniques such as hashing (hash) and caching (cache) …