mining association rules - datamining lab. at dongguk...


Page 1: Mining Association rules - Datamining Lab. at Dongguk …datamining.dongguk.ac.kr/lectures/2009-2/dm/dm_asso.… ·  · 2011-01-06Mining Association rules (연관성분석) 김진석

Mining Association rules

(Association Analysis)

Kim Jin-seok (김진석)

Department of Statistics and Information Science

Dongguk University

E-mail:[email protected]

September 2008

Contents

Section 1. What are association rules?
  1.1 Horizontal vs. vertical layout (Zaki, 2000)
  1.2 Making an R transaction object

Section 2. Evaluating or selecting rules

Section 3. R examples

Section 4. Reading external transaction data
  4.1 Generating random transactions: basket format
  4.2 Reading the transaction data: basket and single format
  4.3 Real data sets for association rule mining

• Association rules (association analysis)

• Market basket analysis

• Related R package: arules

Section 1. What are association rules?

• Item set: any subset of the full set of products $I$, written $X_t$, $t = 1, \dots, 2^{|I|}$

• The set of item sets: a collection made up of subsets of the products, $\mathcal{X} = \{X_1, \dots, X_k\}$

• Association rule: a rule stating that when a particular item set is purchased, another item set is purchased as well,

$$\exists X, Y \in \mathcal{X} : X \Rightarrow Y.$$

Supermarket example: {milk, bread} ⇒ {butter}

Interpretation: customers who buy milk and bread also buy butter in the same transaction.

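This raises the following problems:

• The number of item sets from which rules can be built is far too large. If there are $N$ products, the number of item sets is

$$2^N \approx O(10^{N/3}),$$

since $\log_{10} 2 \approx 0.301$.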


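• How do we find association rules, especially when the number of products is large?

• Among the many rules, how can we discover the ones that are useful to us? This calls for a fast algorithm.

• Among the many rules, which ones are the rules we want? This calls for a method of evaluating rules.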
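The growth in the number of item sets can be checked directly. The following is a small base R sketch, not part of the original slides; the three-product catalogue and the helper name n_itemsets are made up for illustration. It enumerates every subset of a tiny catalogue and then tabulates $2^N$ next to the cruder $10^{N/3}$ bound.

```r
# Number of item sets over N products: each product is either in or out,
# so there are 2^N subsets (the empty set included).
n_itemsets <- function(N) 2^N

# Enumerate all subsets of a tiny, hypothetical catalogue to confirm the count.
items <- c("milk", "bread", "butter")
subsets <- lapply(0:(2^length(items) - 1), function(mask) {
  # bit j of mask decides whether item j belongs to the subset
  in_set <- (mask %/% 2^(seq_along(items) - 1)) %% 2 == 1
  items[in_set]
})
length(subsets)   # 8, i.e. 2^3

# Growth for larger catalogues, next to the 10^(N/3) bound.
N <- c(10, 20, 30, 100)
growth <- data.frame(N = N, exact = 2^N, bound = 10^(N / 3))
print(growth)
```

Even at N = 100 the exact count is already beyond what can be enumerated, which is why the questions above focus on fast search and on rule evaluation.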


1.1 Horizontal vs. vertical layout (Zaki, 2000)

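Binary matrix (columns are items):

transaction   milk   bread   butter   beer
1             1      1       0        0
2             0      1       1        0
3             0      0       0        1
4             1      1       1        0
5             0      1       1        0

Horizontal layout (one row per transaction):

transaction   purchased items
1             milk, bread
2             bread, butter
3             beer
4             milk, bread, butter
5             bread, butter

Vertical layout (one row per item):

item     transaction IDs
milk     1, 4
bread    1, 2, 4, 5
butter   2, 4, 5
beer     3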
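The three layouts carry the same information and can be converted into one another. Below is a minimal base R sketch, not taken from the slides, that starts from the five transactions of the binary matrix above and derives the vertical layout and the binary matrix from the horizontal one. The object names (horizontal, pairs, vertical, binary) are chosen here for illustration only.

```r
# Horizontal layout: one entry per transaction, listing its items.
horizontal <- list(
  "1" = c("milk", "bread"),
  "2" = c("bread", "butter"),
  "3" = c("beer"),
  "4" = c("milk", "bread", "butter"),
  "5" = c("bread", "butter")
)

# Long form: one (transaction, item) pair per row.
pairs <- data.frame(
  tid  = rep(names(horizontal), lengths(horizontal)),
  item = unlist(horizontal, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Vertical layout: one entry per item, listing the transactions that contain it.
vertical <- split(pairs$tid, pairs$item)

# Binary matrix: rows are transactions, columns are items.
binary <- unclass(table(pairs$tid, pairs$item))

print(vertical$bread)   # transactions containing bread
print(binary)
```

In the vertical layout, the support count of an item set can be obtained by intersecting the transaction ID lists of its items, for example length(intersect(vertical$bread, vertical$butter)).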


1.2 Making an R transaction object

• Example 1: converting list data into "transactions" data

library(arules)

a_list <- list(
  c("a","b","c"),
  c("a","b"),
  c("a","b","d"),
  c("c","e"),
  c("a","b","d","e")
)

## set transaction names
names(a_list) <- paste("Tr", c(1:5), sep = "")
a_list

## coerce into transactions
trans <- as(a_list, "transactions")

## analyze transactions
summary(trans)
image(trans)

• Example 2: converting matrix data into "transactions" data

## rows are transactions and columns are items, the layout that
## current arules releases expect; byrow = TRUE keeps each row of
## numbers below as one transaction
a_matrix <- matrix(
  c(1,1,1,0,0,
    1,1,0,0,0,
    1,1,0,1,0,
    0,0,1,0,1,
    1,1,0,1,1), ncol = 5, byrow = TRUE)

## set dim names
dimnames(a_matrix) <- list(
  paste("Tr", c(1:5), sep = ""),
  c("a","b","c","d","e"))
a_matrix

## coerce
trans2 <- as(a_matrix, "transactions")
trans2

• Example 3: converting data.frame data into "transactions" data

a_data.frame <- data.frame(
  age = as.factor(c(6,8,7,6,9,5)),
  grade = as.factor(c(1,3,1,1,4,1)))
## note: all attributes have to be factors
a_data.frame

## coerce
trans3 <- as(a_data.frame, "transactions")
image(trans3)

## example: creating transactions from a data.frame with NA
a_df <- sample(c(LETTERS[1:5], NA), 10, TRUE)
## stringsAsFactors = TRUE keeps both columns as factors
## (R >= 4.0 no longer converts strings by default)
a_df <- data.frame(X = a_df, Y = sample(a_df), stringsAsFactors = TRUE)
a_df

trans3 <- as(a_df, "transactions")
trans3
as(trans3, "data.frame")

Section 2. Evaluating or selecting rules

• Support:

$$\mathrm{supp}(A) = \frac{\text{number of transactions in which } A \text{ was purchased}}{\text{total number of transactions}}$$

or

$$\mathrm{supp}(A) = \frac{\#\{t \in T : A \subseteq t\}}{|T|} = P_T(A)$$

• Confidence:

$$\mathrm{conf}(A \Rightarrow B) = \frac{\text{number of transactions in which } A \text{ and } B \text{ were purchased together}}{\text{number of transactions in which } A \text{ was purchased}} = P_T(B \mid A)$$

• Lift:

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} = \frac{P_T(B \mid A)}{P_T(B)}$$
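As a worked example of the three measures, the following base R sketch evaluates the rule {milk, bread} ⇒ {butter} on the five transactions of the binary matrix in Section 1.1. It is an illustration written for this note and does not use the arules package; the function names supp, conf and lift are defined here and simply mirror the formulas.

```r
# The five transactions from the binary matrix in Section 1.1.
transactions <- list(
  c("milk", "bread"),
  c("bread", "butter"),
  c("beer"),
  c("milk", "bread", "butter"),
  c("bread", "butter")
)

# supp(A): share of transactions that contain every item of A.
supp <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# conf(A => B) = supp(A and B together) / supp(A)
conf <- function(lhs, rhs) supp(c(lhs, rhs)) / supp(lhs)

# lift(A => B) = conf(A => B) / supp(B)
lift <- function(lhs, rhs) conf(lhs, rhs) / supp(rhs)

A <- c("milk", "bread")
B <- "butter"
supp(A)      # 2 of 5 transactions
conf(A, B)   # 1 of the 2 transactions containing A also contains butter
lift(A, B)   # below 1: butter is less frequent after {milk, bread} than overall
```

On this toy data the lift is below 1, so the rule would not be considered interesting even though its confidence is 0.5.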

Section 3. R examples

Data sets

• Groceries: one month (30 days) of real-world point-of-sale (PoS) transaction data from a typical local grocery outlet. The data set contains 9835 transactions, and the items are aggregated into 169 categories. Source: Michael Hahsler, Kurt Hornik, and Thomas Reutterer (2006). Implications of probabilistic data modeling for mining association rules. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nuernberger, and W. Gaul, editors, From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization, pages 598-605. Springer-Verlag.

• IncomeESL: http://www-stat.stanford.edu/~tibs/ElemStatLearn

• AdultUCI: http://www.census.gov/ftp/pub/DES/www/welcome.html, or http://www.ics.uci.edu/~mlearn/MLRepository.html

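• Example data

library(arules)
data("Adult")

• Mine association rules: search for association rules with the Apriori algorithm. Only rules with support of at least 0.5 and confidence of at least 0.9 are searched for.

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"))

• Summary function

summary(rules)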

• To extract only rules with support of at least 0.4

rules <- apriori(Adult, parameter = list(support = 0.4))

• Select only the rules whose right-hand-side item set contains "sex" and whose lift is greater than 1.3

rules.sub <- subset(rules, subset = rhs %pin% "sex" & lift > 1.3)

• Show the selected rules

## SORT() is the spelling used by older arules releases; current releases use sort()
inspect(sort(rules.sub)[1:3])
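To give a feel for what a level-wise search such as Apriori does, here is a simplified base R sketch written for this note. It is not the implementation used by apriori() in arules: it only finds frequent item sets (no rules), it reuses the toy transactions from Section 1.1, and it omits the check that every subset of a candidate is frequent. The names support_of and frequent_itemsets are hypothetical.

```r
transactions <- list(
  c("milk", "bread"),
  c("bread", "butter"),
  c("beer"),
  c("milk", "bread", "butter"),
  c("bread", "butter")
)

# Share of transactions that contain every item of the item set.
support_of <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level-wise search: frequent k-item sets are joined into (k+1)-item
# candidates, and only candidates reaching min_supp survive.
frequent_itemsets <- function(transactions, min_supp) {
  is_frequent <- function(s) support_of(s, transactions) >= min_supp
  items <- sort(unique(unlist(transactions)))
  current <- Filter(is_frequent, as.list(items))
  result <- list()
  k <- 1
  while (length(current) > 0) {
    result <- c(result, current)
    k <- k + 1
    cand <- list()
    n <- length(current)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          u <- sort(union(current[[i]], current[[j]]))
          # keep unions that grow by exactly one item; the name removes duplicates
          if (length(u) == k) cand[[paste(u, collapse = ",")]] <- u
        }
      }
    }
    current <- Filter(is_frequent, unname(cand))
  }
  result
}

freq <- frequent_itemsets(transactions, min_supp = 0.3)
print(vapply(freq, paste, character(1), collapse = ","))
```

Because an item set can only be frequent if all of its subsets are, each level only needs to combine the survivors of the previous level, which is what keeps the search away from all $2^N$ item sets.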

Section 4. Reading external transaction data

4.1 Generating random transactions: basket format

RandomTransactions <- function(file, nItem = 200, nTrans = 100)
{
  ## start from an empty file
  cat("", file = file)
  i <- 1
  while (i <= nTrans)
  {
    ## each item gets its own small purchase probability (at most 0.05)
    iProb <- runif(nItem) / 20
    sel <- seq(1, nItem)[runif(nItem) < iProb]
    ## an empty draw is not written; draw again
    if (length(sel) < 1) next
    ## write the selected items as one comma-separated line
    items <- paste("I-", sel, sep = "")
    cat(items, file = file, sep = ",", append = TRUE)
    cat("\n", file = file, append = TRUE)
    i <- i + 1
  }
}

4.2 Reading the transaction data: basket and single format

Create a demo file in basket format for the example:

data <- paste("item1,item2", "item1", "item2,item3", sep = "\n")
cat(data)
write(data, file = "demo_basket")

Read the demo data:

tr <- read.transactions("demo_basket", format = "basket", sep = ",")
inspect(tr)

Create a demo file in single format for the example. Column 1 contains the transaction ID and column 2 contains one item:

data <- paste("trans1 item1", "trans2 item1", "trans2 item2",
  sep = "\n")
cat(data)
write(data, file = "demo_single")

Read the demo data:

## read the same file name that write() created above
tr <- read.transactions("demo_single",
  format = "single", cols = c(1,2))
inspect(tr)
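The relation between the two file formats can also be seen without arules. The base R sketch below, added here as an illustration, parses the same single-format text and regroups it into one basket per transaction ID, which is the content a basket-format file would hold. The column names tid and item are assumptions made for this example.

```r
# Same demo content as above: one "transactionID item" pair per line.
single <- paste("trans1 item1", "trans2 item1", "trans2 item2", sep = "\n")

# Parse the two whitespace-separated columns.
rows <- read.table(text = single, col.names = c("tid", "item"),
                   stringsAsFactors = FALSE)

# Group the items by transaction ID: one basket per transaction.
baskets <- split(rows$item, rows$tid)

# One comma-separated line per basket, as in the basket format.
basket_lines <- vapply(baskets, paste, character(1), collapse = ",")
print(basket_lines)
```

Single format is convenient when the data come from a database table with one purchased item per row, while basket format matches one receipt per line.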

4.3 Real data sets for association rule mining

KDD-CUP 2000: http://cobweb.ecn.purdue.edu/KDDCUP/

To download the data sets, you must register to obtain an ID and password.